Search This Blog

Saturday, 9 July 2016

SQL Server architectural diagram with all the components

SQL Server architectural diagram with all the components



SQL Server Architecture is a very deep subject. Covering it in a single post is an almost impossible task. However, this subject is very popular topic among beginners and advanced users.  I have requested my friend Anil Kumar, who is expert in SQL Domain to help me write  a simple post about Beginning SQL Server Architecture. As stated earlier, this subject is very deep subject and in this first article series he has covered basic terminologies. In future article he will explore the subject further down.
In this Article we will discuss about MS SQL Server architecture.
The major components of SQL Server are:
  1. Relational Engine
  2. Storage Engine
  3. SQL OS
Now we will discuss and understand each one of them.
1) Relational Engine: Also called as the query processor, Relational Engine includes the components of SQL Server that determine what your query exactly needs to do and the best way to do it. It manages the execution of queries as it requests data from the storage engine and processes the results returned.
Different Tasks of Relational Engine:
  1. Query Processing
  2. Memory Management
  3. Thread and Task Management
  4. Buffer Management
  5. Distributed Query Processing
2) Storage Engine: Storage Engine is responsible for storage and retrieval of the data on to the storage system (Disk, SAN etc.). to understand more, let’s focus on the concepts.
When we talk about any database in SQL server, there are 2 types of files that are created at the disk level – Data file and Log file. Data file physically stores the data in data pages. Log files that are also known as write ahead logs, are used for storing transactions performed on the database.
Let’s understand data file and log file in more details:
Data FileData File stores data in the form of Data Page (8KB) and these data pages are logically organized in extents.
Extents: Extents are logical units in the database. They are a combination of 8 data pages i.e. 64 KB forms an extent. Extents can be of two types, Mixed and Uniform. Mixed extents hold different types of pages like index, system, data etc (multiple objects). On the other hand, Uniform extents are dedicated to only one type (object).
Pages: As we should know what type of data pages can be stored in SQL Server, below mentioned are some of them:
  • Data Page: It holds the data entered by the user but not the data which is of type text, ntext, nvarchar(max), varchar(max), varbinary(max), image and xml data.
  • Index: It stores the index entries.
  • Text/Image: It stores LOB ( Large Object data) like text, ntext, varchar(max), nvarchar(max),  varbinary(max), image and xml data.
  • GAM & SGAM (Global Allocation Map & Shared Global Allocation Map): They are used for saving information related to the allocation of extents.
  • PFS (Page Free Space): Information related to page allocation and unused space available on pages.
  • IAM (Index Allocation Map): Information pertaining to extents that are used by a table or index per allocation unit.
  • BCM (Bulk Changed Map): Keeps information about the extents changed in a Bulk Operation.
  • DCM (Differential Change Map): This is the information of extents that have modified since the last BACKUP DATABASE statement as per allocation unit.
Log File: It also known as write ahead log. It stores modification to the database (DML and DDL).
  • Sufficient information is logged to be able to:
    • Roll back transactions if requested
    • Recover the database in case of failure
    • Write Ahead Logging is used to create log entries
      • Transaction logs are written in chronological order in a circular way
      • Truncation policy for logs is based on the recovery model
SQL OS: This lies between the host machine (Windows OS) and SQL Server. All the activities performed on database engine are taken care of by SQL OS. It is a highly configurable operating system with powerful API (application programming interface), enabling automatic locality and advanced parallelism. SQL OS provides various operating system services, such as memory management deals with buffer pool, log buffer and deadlock detection using the blocking and locking structure. Other services include exception handling, hosting for external components like Common Language Runtime, CLR etc.




Data Pages & Extents

Data Pages & Extents
How Indexes Are Stored
All SQL Server data is physically organized and stored in data pages. Unlike other relational database management systems, SQL Server does not provide the option of choosing the page size for different data types or tables. All SQL Server pages are 8KB in size. When you create primary or secondary database files, SQL Server allocates pages and extents for data storage. Each page has a number, starting from 0; the number of the last page in the database is determined by the database file size. When the database file size is increased through the ALTER DATABASE statement, new data pages are appended to the end of the file. Similarly if you shrink a database, then its data pages are removed, starting from the end of the file and gradually moving to the beginning.
Eight data pages (8KB each) make up a single extent. Therefore, each extent is 64KB. SQL Server has two classes of extents: uniform and mixed. Uniform extents are dedicated to a single object. Normally SQL Server allocates multiple uniform extents for each data table. However, if a table is small, SQL Server won't allocate an entire extent for it. Instead it will allocate data pages from a mixed extent, which can be thought of as a pool of pages for small tables. Each mixed extent can be shared by multiple tables. Since each extent has eight data pages, up-to eight objects can share data pages from a single mixed extent. When you first create a table SQL Server starts by allocating a data page to it from a mixed extent. Once the table has enough data to warrant a full extent, SQL Server will allocate a uniform extent to it. Similarly if you build an index on a table that has at least eight pages SQL Server will dedicate a uniform extent for storing the index data.
A set of pages used to hold table or index data within a single partition is referred to as an allocation unit. An allocation unit can be one of three types:
·         In-row data. This type holds column data with most data types, except data types for large objects.
 
·         Large object (LOB) data. This type holds large data type columns, such as TEXT, NTEXT, IMAGE, VARHCAR(MAX), NVARHCAR(MAX), XML and VARBINARY(MAX).
 
·         Row-overflow data. This type holds data for variable length columns that exceed the row size limit of 8060 bytes.

The Relationship Between Index Size and Page Size
A data record (or data row) within any SQL Server table is always stored on a single page. However, if the row contains large data types and all of its data won't fit on an 8KB page, then portions of the row (variable length columns, for example those with VARCHAR, VARBINARY, SQL_VARIANT data types) can be moved to data pages in row overflow data allocation unit. SQL Server maintains such row overflow data dynamically by adding a pointer to the original data page. If data in variable length columns is subsequently trimmed by an UPDATE statement and the total row size becomes less than 8060 bytes, SQL Server moves the variable length columns back to their original location. Note that the length of each column must still be less than 8000 characters; their combined length can exceed this limit. Note also, that such dynamic allocation of row overflow data comes with the performance penalty.
Normally you should try to keep the row length small enough to fit on a single page. If you have several variable length columns whose combined length will frequently exceed the 8060 byte limit, split the existing table into multiple tables that have one-to-one relationship with each other to ensure that the whole row can fit on a single data page. You cannot build a clustered index on a varying length column whose data is on row-overflow pages; however, you can include such columns as the key or non-key columns of non-clustered indexes.
Your table rows could contain large data, stored in TEXT, NTEXT, VARCHAR(MAX), VARBINARY(MAX), XML and IMAGE data types. Data rows with TEXT, NTEXT and IMAGE data type are stored outside of the normal data pages in special storage structures, allowing up to 2GB of storage. The data page contains a 16-byte pointer to the TEXT and IMAGE pages. Note that these data types are deprecated and will be removed from future versions of SQL Server. You should favor VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) columns for all new development. If you anticipate columns with TEXT, NTEXT or IMAGE data types to contain relatively small amount of data, you can advise SQL Server to attempt storing such content in row; as opposed to creating new TEXT and IMAGE pages. To do so you can use system procedure sp_tableoption with the 'text in row' option. If the data is indeed relatively small, then the storage will work the same way as it does with VARCHAR, NVARCHAR and VARBINARY data types. If the data can fit on a data page it is stored there; otherwise it is stored in row-overflow pages.
Columns built using large variable length data types (VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), XML) are normally stored on the data page and handled the same way as VARCHAR, NVARCHAR and VARBINARY data types. You could use sp_tableoption system procedure with 'large value types out of row' option and advise SQL Server to manage these data types same way as TEXT, NTEXT and IMAGE data types. If you do choose this option the data page will contain a 16-byte pointer to the TEXT and IMAGE pages.
Other Things Stored in the Index
In addition to storing data, all auxiliary information used for managing free space, tracking modified extents, allocating new pages and extents and so forth is also stored in pages. The following table summarizes different types of pages used by SQL Server:
Page Type
Description
Data
Stores all data except those data types stored by text and image pages.
Index
Stores index data.
Text and Image
Stores TEXT, NTEXT, IMAGE, VARCHAR(MAX), VARBINARY(MAX) and XML data types. In addition these pages also store variable length columns (VARCHAR, VARBINARY, SQL_VARIANT data types) once they exceed 8KB in size.
Global Allocation Map (GAM)
Records information about allocation of mixed and uniform extents. GAM pages contain bits that help SQL Server determine whether the extent is used or is available for use. When a new page needs to be allocated, SQL Server reads GAM pages to find whether any extents are available. When an extent is de-allocated, the bit for this extent is reset in GAM to make it available for future allocations.
Shared Global Allocation Map (SGAM)
Records information about mixed extents. SGAM pages record whether each mixed extent has any unused pages. If no mixed extent has any unused pages SQL Server attempts to find an unused extent in GAM pages. Once an extent is found it will be allocated as a mixed extent. If GAM has no free extents and SGAM shows that all extent pages have been allocated the data file is full and must be expanded before you can store additional data.
Page Free Space (PFS)
Keeps track of how full each data page is. In addition PFS also tracks the allocation status of each individual page. GAM bits are set so that SQL Server knows that the extent is allocated; SQL Server uses PFS pages to see which pages can be used for new allocation requests. The free space percentage is only maintained for tables without a clustered index (also called "heaps") and for TEXT / IMAGE pages. If the page has any free space it can be used for newly inserted rows.
Index Allocation Map (IAM)
Stores information about extents used by tables without a clustered index (also called "heaps") or indexes within a single allocation unit. IAM pages have a large bitmap, each bit representing an extent. When you execute an INSERT statement, SQL Server checks the extents currently allocated to the table which you wish to populate. SQL Server will use PFS pages to see if the row can be inserted into previously allocated page. It also uses IAM to determine which extents are allocated to the given partition (allocation unit). If no pages are found with enough room to store the row, SQL Server allocates a new extent.
Bulk Changed Map (BCM)
Stores information about extents altered by bulk operations such as Bulk Copy Program (BCP) or BULK INSERT statement, per allocation unit. BCM only tracks the extents modified since the last BACKUP LOG statement. BCM pages are only relevant for databases that use bulk-logged recovery model because with this model BACKUP LOG statement includes the modified extents in the log backup. Doing so allows the database to be recovered including the bulk operations. If the database uses the simple recovery model, then bulk operations aren't logged. The full recovery model treats bulk operations as fully logged commands. Bulk-logged recovery model logs only the fact that a bulk operation took place, but transaction log backup includes the copy of all extents that were modified through the bulk operation. The performance of bulk operations could be considerably faster if the database uses the bulk-logged model as compared to using the full recovery model, but the transaction log backup size can grow large.
So although BCM pages exist in every database they're only used with databases using the bulk-logged recovery model.
Differential Changed Map (DCM)
Stores information about all extents that have been modified since the last full database backup was taken, per allocation unit. Differential backups scan the DCM pages and copy only those extents that were altered since the last full backup. This makes differential database backups considerably faster than full database backups.
Retrieving Database Page Metadata
You can use DBCC PAGE command to retrieve meta-data about a database page. Normally you should only execute this (undocumented) DBCC command when troubleshooting any issues with Microsoft support engineers. Before executing DBCC PAGE you need to execute DBCC TRACEON (3604) in order to return the output to the screen.