Data Pages & Extents
How Indexes Are Stored
All SQL Server data is
physically organized and stored in data pages. Unlike other relational database
management systems, SQL Server does not provide the option of choosing the page
size for different data types or tables. All SQL Server pages are 8KB in size.
When you create primary or secondary database files, SQL Server allocates pages
and extents for data storage. Each page has a number, starting from 0; the
number of the last page in the database is determined by the database file
size. When the database file size is increased through the ALTER DATABASE statement, new data pages are appended to the end of
the file. Similarly if you shrink a database, then its data pages are removed,
starting from the end of the file and gradually moving to the beginning.
Eight data pages (8KB
each) make up a single extent. Therefore, each extent is 64KB. SQL Server has
two classes of extents: uniform and mixed. Uniform extents are dedicated to a
single object. Normally SQL Server allocates multiple uniform extents for each
data table. However, if a table is small, SQL Server won't allocate an entire
extent for it. Instead it will allocate data pages from a mixed extent, which
can be thought of as a pool of pages for small tables. Each mixed extent can be
shared by multiple tables. Since each extent has eight data pages, up-to eight
objects can share data pages from a single mixed extent. When you first create
a table SQL Server starts by allocating a data page to it from a mixed extent.
Once the table has enough data to warrant a full extent, SQL Server will
allocate a uniform extent to it. Similarly if you build an index on a table
that has at least eight pages SQL Server will dedicate a uniform extent for
storing the index data.
A set of pages used to
hold table or index data within a single partition is referred to as an
allocation unit. An allocation unit can be one of three types:
·
In-row data. This type
holds column data with most data types, except data types for large
objects.
·
Large object (LOB)
data. This type holds large data type columns, such as TEXT, NTEXT, IMAGE,
VARHCAR(MAX), NVARHCAR(MAX), XML and VARBINARY(MAX).
·
Row-overflow data.
This type holds data for variable length columns that exceed the row size limit
of 8060 bytes.
The Relationship
Between Index Size and Page Size
A data record (or data
row) within any SQL Server table is always stored on a single page. However, if
the row contains large data types and all of its data won't fit on an 8KB page,
then portions of the row (variable length columns, for example those with
VARCHAR, VARBINARY, SQL_VARIANT data types) can be moved to data pages in row
overflow data allocation unit. SQL Server maintains such row overflow data
dynamically by adding a pointer to the original data page. If data in variable
length columns is subsequently trimmed by an UPDATE statement and the total row
size becomes less than 8060 bytes, SQL Server moves the variable length columns
back to their original location. Note that the length of each column must still
be less than 8000 characters; their combined length can exceed this limit. Note
also, that such dynamic allocation of row overflow data comes with the
performance penalty.
Normally you should
try to keep the row length small enough to fit on a single page. If you have
several variable length columns whose combined length will frequently exceed
the 8060 byte limit, split the existing table into multiple tables that have
one-to-one relationship with each other to ensure that the whole row can fit on
a single data page. You cannot build a clustered index on a varying length
column whose data is on row-overflow pages; however, you can include such
columns as the key or non-key columns of non-clustered indexes.
Your table rows could
contain large data, stored in TEXT, NTEXT, VARCHAR(MAX), VARBINARY(MAX), XML
and IMAGE data types. Data rows with TEXT, NTEXT and IMAGE data type are stored
outside of the normal data pages in special storage structures, allowing up to
2GB of storage. The data page contains a 16-byte pointer to the TEXT and IMAGE
pages. Note that these data types are deprecated and will be removed from
future versions of SQL Server. You should favor VARCHAR(MAX), NVARCHAR(MAX) and
VARBINARY(MAX) columns for all new development. If you anticipate columns with
TEXT, NTEXT or IMAGE data types to contain relatively small amount of data, you
can advise SQL Server to attempt storing such content in row; as opposed to
creating new TEXT and IMAGE pages. To do so you can use system procedure
sp_tableoption with the 'text in row' option. If the data is indeed relatively
small, then the storage will work the same way as it does with VARCHAR,
NVARCHAR and VARBINARY data types. If the data can fit on a data page it is
stored there; otherwise it is stored in row-overflow pages.
Columns built using
large variable length data types (VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX),
XML) are normally stored on the data page and handled the same way as VARCHAR,
NVARCHAR and VARBINARY data types. You could use sp_tableoption system
procedure with 'large value types out of row' option and advise SQL Server to
manage these data types same way as TEXT, NTEXT and IMAGE data types. If you do
choose this option the data page will contain a 16-byte pointer to the TEXT and
IMAGE pages.
Other Things Stored in
the Index
In addition to storing
data, all auxiliary information used for managing free space, tracking modified
extents, allocating new pages and extents and so forth is also stored in pages.
The following table summarizes different types of pages used by SQL Server:
Page Type
|
Description
|
Data
|
Stores all data
except those data types stored by text and image pages.
|
Index
|
Stores index data.
|
Text and Image
|
Stores TEXT, NTEXT,
IMAGE, VARCHAR(MAX), VARBINARY(MAX) and XML data types. In addition these
pages also store variable length columns (VARCHAR, VARBINARY, SQL_VARIANT
data types) once they exceed 8KB in size.
|
Global Allocation
Map (GAM)
|
Records information
about allocation of mixed and uniform extents. GAM pages contain bits that
help SQL Server determine whether the extent is used or is available for use.
When a new page needs to be allocated, SQL Server reads GAM pages to find
whether any extents are available. When an extent is de-allocated, the bit
for this extent is reset in GAM to make it available for future allocations.
|
Shared Global
Allocation Map (SGAM)
|
Records information
about mixed extents. SGAM pages record whether each mixed extent has any
unused pages. If no mixed extent has any unused pages SQL Server attempts to
find an unused extent in GAM pages. Once an extent is found it will be
allocated as a mixed extent. If GAM has no free extents and SGAM shows that
all extent pages have been allocated the data file is full and must be
expanded before you can store additional data.
|
Page Free Space
(PFS)
|
Keeps track of how
full each data page is. In addition PFS also tracks the allocation status of
each individual page. GAM bits are set so that SQL Server knows that the
extent is allocated; SQL Server uses PFS pages to see which pages can be used
for new allocation requests. The free space percentage is only maintained for
tables without a clustered index (also called "heaps") and for TEXT
/ IMAGE pages. If the page has any free space it can be used for newly
inserted rows.
|
Index Allocation Map
(IAM)
|
Stores information
about extents used by tables without a clustered index (also called
"heaps") or indexes within a single allocation unit. IAM pages have
a large bitmap, each bit representing an extent. When you execute an INSERT statement, SQL Server checks the extents currently allocated to the
table which you wish to populate. SQL Server will use PFS pages to see if the
row can be inserted into previously allocated page. It also uses IAM to
determine which extents are allocated to the given partition (allocation
unit). If no pages are found with enough room to store the row, SQL Server
allocates a new extent.
|
Bulk Changed Map
(BCM)
|
Stores information about extents altered by
bulk operations such as Bulk Copy Program (BCP) or BULK INSERT statement, per
allocation unit. BCM only tracks the extents modified since the last BACKUP
LOG statement. BCM pages are only relevant for databases that use bulk-logged
recovery model because with this model BACKUP LOG statement includes the
modified extents in the log backup. Doing so allows the database to be
recovered including the bulk operations. If the database uses the simple
recovery model, then bulk operations aren't logged. The full recovery model
treats bulk operations as fully logged commands. Bulk-logged recovery model
logs only the fact that a bulk operation took place, but transaction log
backup includes the copy of all extents that were modified through the bulk
operation. The performance of bulk operations could be considerably faster if
the database uses the bulk-logged model as compared to using the full
recovery model, but the transaction log backup size can grow large.
So although BCM pages exist in every
database they're only used with databases using the bulk-logged recovery
model.
|
Differential Changed
Map (DCM)
|
Stores information
about all extents that have been modified since the last full database backup
was taken, per allocation unit. Differential backups scan the DCM pages and
copy only those extents that were altered since the last full backup. This
makes differential database backups considerably faster than full database
backups.
|
Retrieving
Database Page Metadata
You can use DBCC PAGE
command to retrieve meta-data about a database page. Normally you should only
execute this (undocumented) DBCC command when troubleshooting any issues with
Microsoft support engineers. Before executing DBCC PAGE you need to execute
DBCC TRACEON (3604) in order to return the output to the screen.
No comments:
Post a Comment