SQL Server 101 How Does It Work
SQL Server 101 What is SQL Server and how does it work?
The purpose of this paper is to give you background on SQL Server. We will explain what SQL Server
is, where it came from, and its architecture and building blocks.
SQL was originally a de facto industry standard, but it was soon standardized formally by bodies such as
ANSI and ISO. The standard is revised now and then: new language elements can be added and old
elements can even be removed. Each product has its own dialect of the SQL language, in which it implements
parts of the SQL standard (which is huge) and adds its own language elements. Additions might be
made in order to be competitive, or even just usable. For instance, the standard doesn't say a word
about indexes, which are an important performance structure.
There is a fair amount of criticism of the SQL language; some people argue it was a quick and
dirty implementation of a language for a relational database system. Many feel that SQL isn't even a
relational language. It is safe to say that the SQL language, in some respects, doesn't do the relational
model justice, as it doesn't allow us to explore the full potential of a relational database system. But SQL,
flawed or not, has certainly proven itself to be useful.
Instances
For some of the above components (services), you can have more than one installed in an operating
system. We call each installed component an instance. The components in question are the database
engine, Analysis Services and Reporting Services. You can see from the service name whether a
component allows several instances: there is an instance name in parentheses
after the friendly service name. One instance can be a default instance (where you see MSSQLSERVER in
parentheses), and you can have several named instances, for which you specify the instance name when
you install that instance.
Instances are isolated from each other, except that they of course live in the same operating system.
Each instance also has its own folder structure, for example where the exe file is stored. This allows them
to be different versions and build numbers. You can have totally different configurations at the instance
level and also different security settings (including who the super administrators (sysadmin) are, etc.).
model
When you create a database, SQL Server uses a template for that database. This template includes items
such as database option settings. And yes, you guessed it: this template database is of course
the model database.
tempdb
As the name implies, this database is used for temporary storage. A number of things
use tempdb, including:
Worktables used during execution of a SQL query
Explicitly created temporary tables (tables beginning with #) and table variables
Row versioning, where a transaction can get to the prior value for a row which is being modified by
somebody else, instead of being blocked or doing a dirty read
The inserted and deleted tables that you have available in a trigger that fires for INSERT, UPDATE or DELETE
The tempdb database is re-created every time you start your SQL Server. The database file structure
(number of database files, size and path) is not taken from the model database; instead, it is based
on a system table visible through master.sys.master_files.
The resource database
The resource database is in fact hidden from us. We cannot see this database in the Object Explorer
in SQL Server Management Studio, nor does it show in sys.databases. It is hidden because it is not
supposed to be altered. I once heard someone from Microsoft suggest thinking of it as a DLL file.
It only contains code, not data. But what code, you may ask? The code for the system stored
procedures (like sp_detach_db, sp_help, etc.) and also the system views (sys.databases, sys.tables, etc.)
is included in the resource database. This allows Microsoft to replace the resource database files when you patch your
SQL Server instead of running script files to re-create these objects in the master database (like we had
in SQL Server 2000 and earlier).
Database files
Each database is stored over a set of files. Files are internally numbered from 1 and up (see for instance
sys.database_files). We have at least two files for each database: one data file (.mdf) and one transaction log
file (.ldf). We can have additional data files (.ndf) and also more than one transaction log file (.ldf).
A data file belongs to a filegroup. We have a minimum of one filegroup for each database: the
PRIMARY filegroup. When you create a table or an index, you can specify in the CREATE command
which filegroup to use. There's always a default filegroup, which will be PRIMARY unless you change the
default filegroup. If you do not specify what filegroup a table or index should be created on, then it will
be created on the default filegroup.
You can have more than one data file for each database. The primary data file (.mdf) always belongs to the
PRIMARY filegroup. Each filegroup can have more than one data file. If you have more than one file in
the same filegroup, then the files will be filled proportionally to the free space in each file.
In the end, the filegroup concept allows you to specify on what file, or files, a table will be stored.
Say you have one table for which you know you will have heavy access. The table is of moderate
size, large enough that it won't fit in RAM through the standard caching mechanism in SQL Server. But you also
have lots of other data in the database. Also, let's say you have a fast disk (SSD) of moderate size that
you can utilize on this server. Create a filegroup, add one file on this SSD and create your heavily
accessed table on that filegroup (and therefore on the SSD). This is only one
example of what you can use several filegroups for. Others are backup at the filegroup level and
database consistency checks at the filegroup level (DBCC CHECKFILEGROUP instead of
DBCC CHECKDB).
However, the vast majority of databases have only one filegroup and even only one data file (the .mdf
file). And, in most cases, this is perfectly sufficient. I'm a strong believer in the KISS concept (keep it
simple): if more than one filegroup gives you no advantages, then why should you have it?
Data file architecture
Each data file is divided into a set of pages, 8 KB in size. Pages are internally numbered from page 0 and
up. A page can be unused, i.e. a free page. Eight consecutive pages are grouped into what is referred
to as an extent. An extent can also be unused, if none of the pages in the extent are in use. We
sometimes refer to this type of extent as unallocated (as opposed to allocated).
An extent can be allocated in two ways. The first is as a shared (or mixed) extent. On a shared extent you
find pages from different tables and indexes, hence the name shared (or mixed). Alternatively, an extent can
be owned by a single table or index, which is what we call a uniform extent. The first eight pages for each table or
index come from shared extents and subsequent pages come from uniform extents.
We have assumed two types of allocations in this discussion, data and indexes. There are other types as
well, such as LOB allocations for instance, but data and index pages are enough to illustrate the page
and extent allocation principle in SQL Server.
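The page and extent numbering above comes down to simple arithmetic. Here is a minimal Python sketch of it, assuming only the figures from the text (8 KB pages, eight pages per extent, pages numbered from 0). The function name and the returned field names are made up for this illustration and are not anything SQL Server exposes.

```python
PAGE_SIZE = 8 * 1024      # bytes per page (8 KB, as described above)
PAGES_PER_EXTENT = 8      # eight consecutive pages make up one extent


def page_location(page_number):
    """Return the extent and byte offset of a page within a single data file."""
    if page_number < 0:
        raise ValueError("page numbers start at 0")
    extent = page_number // PAGES_PER_EXTENT
    return {
        "extent": extent,
        "first_page_in_extent": extent * PAGES_PER_EXTENT,
        "byte_offset": page_number * PAGE_SIZE,
    }


if __name__ == "__main__":
    for page in (0, 7, 8, 19):
        print(page, page_location(page))
```

Pages 0 to 7 land in extent 0, page 8 starts extent 1, and so on. This is why an extent always covers 64 KB of the data file.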
Table and index architecture
Logically, a table is defined over a number of columns. Each column has a name and is a certain data type.
From a physical perspective, a table has rows stored in pages. However, the table (the data) can be
structured in two different ways:
A Heap table
A heap table is a table for which we have not created a clustered index. SQL Server doesn't store the
rows in any particular order. Rows and pages are essentially stored where there is free space. Needless
to say, SQL Server tries to be conservative and not use more pages than necessary, and will try to fit
as many rows as possible on each page, within reasonable limits.
A clustered table
A clustered table is a table on which you have created a clustered index. This index might have been
created automatically, since SQL Server by default will create a clustered index on the Primary Key
column(s), although you can override this. Or, you can create the clustered index explicitly using the
CREATE CLUSTERED INDEX command.
Now, this raises the question: What is an index?
Let's first look at this from the clustered index perspective. An index sorts the data. For example, let's
say you represent people in a table, with columns such as first name, last name, city, etc. Also, let's say
you define the clustered index over the last name column (last name being the index key). The rows
are now stored in order of last name. Imagine a set of pages (for this index), and in the header of the
first page you have the address (page number) of the next page. This then repeats until the last page.
A page also points back to the previous page. This is what we call a doubly linked list. The row with the
lowest last name will be the first row on the first page, and the row with the highest last name will be
the last row on the last page. In other words, you can follow
the linked list from the beginning to the end and you will have read the people in order of last
name. What we have described here is the leaf level of the index.
We also have a tree structure above this level. Take the first value of the index key (last name) for the
first page, and the same for the second page, etc. Store these on another set of pages, along with the
page number they point to. You now have the level above the leaf level. If one page isn't sufficient for
this level, then you keep building higher levels until you have exactly one root page. This is how SQL
Server builds the index tree.
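To make the level-by-level construction concrete, here is a toy Python sketch. It is a simplified model, not SQL Server's actual on-disk format: a "page" is just a Python list holding a fixed number of entries, the page size of three entries is an arbitrary assumption, and the names build_index and seek are invented for this example.

```python
from bisect import bisect_right


def chunk(entries, size):
    """Split a list of entries into pages holding at most `size` entries each."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def build_index(keys, entries_per_page=3):
    """Return a list of levels: levels[0] is the leaf level, levels[-1] holds the root page."""
    if not keys:
        raise ValueError("need at least one key")
    # Leaf level: the keys in index-key order (the pointer slot is unused in this toy model).
    level = chunk([(key, None) for key in sorted(keys)], entries_per_page)
    levels = [level]
    # Take the first key of every page, store it with the page number, and repeat
    # until the newest level fits on exactly one page (the root).
    while len(level) > 1:
        parent_entries = [(page[0][0], page_no) for page_no, page in enumerate(level)]
        level = chunk(parent_entries, entries_per_page)
        levels.append(level)
    return levels


def seek(levels, key):
    """Walk from the root down and return the number of the leaf page that would hold `key`."""
    page_no = 0  # the top level has exactly one page
    for level in reversed(levels[1:]):
        page = level[page_no]
        first_keys = [entry[0] for entry in page]
        # Follow the last entry whose first key is <= the search key.
        slot = max(bisect_right(first_keys, key) - 1, 0)
        page_no = page[slot][1]
    return page_no


if __name__ == "__main__":
    names = ["Hill", "Adams", "Jones", "Clark", "Evans", "Baker", "Green", "Davis", "Irwin", "Ford"]
    levels = build_index(names)
    print("pages per level, leaf first:", [len(level) for level in levels])
    leaf_page = seek(levels, "Hill")
    print("leaf page for Hill:", leaf_page, levels[0][leaf_page])
```

The point of the tree is visible in seek: finding one last name touches one page per level instead of every leaf page.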
Non-clustered indexes
The description above describes a clustered index. As we know, a table can have a clustered index, or
it might not have a clustered index (in which case it is a heap table). Regardless of which, we can also
have a number of non-clustered indexes for the table. A non-clustered index is also defined over one
or more columns (say first name, for our previous example). SQL Server builds the non-clustered index
pretty much the same way as the clustered index, but in the leaf level of the index we don't have all
columns for the row (the data). Instead, we only have the index key column(s), which in our example
is the first name column, along with a pointer to the data row. If the table is a heap table, the pointer is
the file, page and row number for the row. If the table is a clustered table, it is instead the
clustering key column values.
You might realize that there is plenty more we could say about these things, digging deeper into
index structures and index options. However, the above description is sufficient to
understand how the data in a table is stored and the fact that we can create an index that SQL Server can
use when finding data. Imagine if SQL Server had to read all the pages that the table uses, just to find
the rows your query is looking for!
As mentioned, the GAM and SGAM pages represent approximately 4 GB in the database file, and then
we have another set of GAM and SGAM pages, and so on. The GAM and SGAM pages are always at
fixed positions in the database files. How many pairs of GAMs and SGAMs there are depends on the size
of the database file.
But how can we know what pages a table or index is using, or more precisely, a heap or an index?
Every heap and every index has at least one IAM (Index Allocation Map) page. An IAM page also maps
approximately 4 GB in the database file, with one bit for each extent showing whether this heap or index is
using that extent. IAM pages are of course not at fixed positions, so SQL Server keeps track of the address (page
number) of the first IAM page at the heap or index level. If the heap or index uses extents across a
larger area than 4 GB in the database files, then we have more IAM pages and pointers from the first
IAM page to the next one.
Finally, we have PFS (Page Free Space) pages. A PFS page keeps track of approximately how full each page is.
The first PFS page is the second page (page number 1) in each data file. A PFS page maps approximately 64 MB
of the data file, and PFS pages are repeated for each additional 64 MB portion. Each byte (not bit) in the PFS page
represents one page in that 64 MB area, recording roughly how much free space there is on that page.
The free space is not maintained for index pages, since that information is not of interest in the first place:
when you insert a new row in an index, we always know where to put this row, in the right position
according to the index key.
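Because these allocation pages sit at fixed positions, you can work out which one covers any given page. The Python sketch below illustrates the idea. Note the assumptions: the paper only says "approximately 64 MB" and "approximately 4 GB", so the exact interval constants (8,088 pages per PFS page, 511,232 pages per GAM/SGAM pair), and the first GAM and SGAM being pages 2 and 3, are the commonly documented values rather than figures taken from this text. The function names are invented for the example.

```python
PAGE_SIZE_KB = 8
PFS_INTERVAL = 8088       # assumption: pages covered by one PFS page (roughly 64 MB)
GAM_INTERVAL = 511232     # assumption: pages covered by one GAM/SGAM pair (roughly 4 GB)


def pfs_page_for(page_number):
    """Page number of the PFS page that tracks free space for `page_number`."""
    interval = page_number // PFS_INTERVAL
    return 1 if interval == 0 else interval * PFS_INTERVAL


def gam_page_for(page_number):
    """Page number of the GAM page covering the extent that holds `page_number`."""
    interval = page_number // GAM_INTERVAL
    return 2 if interval == 0 else interval * GAM_INTERVAL


def sgam_page_for(page_number):
    """Page number of the SGAM page covering the extent that holds `page_number`."""
    interval = page_number // GAM_INTERVAL
    return 3 if interval == 0 else interval * GAM_INTERVAL + 1


if __name__ == "__main__":
    print("PFS interval in MB:", PFS_INTERVAL * PAGE_SIZE_KB / 1024)
    print("GAM interval in GB:", GAM_INTERVAL * PAGE_SIZE_KB / 1024 / 1024)
    for page in (100, 20000, 600000):
        print(page, pfs_page_for(page), gam_page_for(page), sgam_page_for(page))
```

The printed interval sizes come out a little under 64 MB and 4 GB, which is where the "approximately" in the text comes from.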
So, there you have it: this is a look at the allocation architecture for SQL Server. Now, you might
wonder whether or not we have to know or care about these things. No, in most cases you can
happily administer and program your SQL Server with no knowledge of GAM, SGAM, IAM and PFS pages.
However, an understanding can help you make sense of some error messages and of index
fragmentation, or just demystify the engine that is under the hood of SQL Server.
If you were to dig further, you would discover more details. For example:
A heap or index can be partitioned (having more than one partition) and you will see that IAMs are
actually at the partition level.
There are more allocation types (again, IAMs) than heap and index. We have LOB pages, which are pages for
data types such as varchar(max), nvarchar(max), varbinary(max), text, ntext and image. And there are
also allocations for row-overflow pages: we can have a combination of variable-length data types
for columns so that a row no longer fits on one page, and this is when SQL Server stores some of
the column values on such row-overflow pages.
If you want to dig deeper into this area, I recommend the SQL Server documentation (SQL Server Books
Online, BOL) as a good starting point. Unfortunately, Microsoft has decided to no longer maintain some
of the architectural sections of the product documentation, but we can use the SQL Server 2008 R2
version of BOL which is still accurate:
https://technet.microsoft.com/en-us/library/cc280361(v=sql.105).aspx
Transaction logging
As you know, each database has at least one file for its transaction log, the .ldf file. You can have more
than one, but this won't give you any performance benefit since they will be used one after the other,
serially. Here's a brief description of what happens when you modify data in SQL Server:
Every modification is always done in a transaction. Among other things, a transaction is defined as a
number of modifications that should either all be performed or none at all: an atomic operation. By default, a
single modification statement, such as an INSERT, UPDATE or DELETE, will be performed within its own
transaction. If anything fails while the modification command is being executed, then
everything performed until that point, for that modification command, will be rolled back.
You can also group several modification commands inside the same transaction, using commands
such as BEGIN TRANSACTION, COMMIT TRANSACTION and ROLLBACK TRANSACTION.
When a transaction is started, either implicitly using a modification command, or explicitly by a
BEGIN TRANSACTION command, SQL Server will record in the transaction log that this session has
started a transaction.
For each modification (a row is inserted, updated or deleted, and similarly for index modifications, etc.), SQL
Server will make sure the modified page is in cache. Every read and modification is served from cache.
If the page isn't in cache already, then it will be read from disk and brought into cache. A log record
is constructed to reflect the modification, and written to the transaction log (not necessarily physically
to the file yet). Now, the page can be modified in cache. This happens for each modification within
this transaction. At the end of the transaction, such as a commit, SQL Server will reflect the
commit in the transaction log and also make sure all log records produced up until that point in
time are physically written to disk (force log write at commit). This is why you want to have good
write performance where the .ldf file is located: the writes to the .ldf file are synchronous, so
the application will wait until the operating system and SQL Server have acknowledged that the write
operation has been performed.
The pages that have been modified are dirty at this point. They have been modified since they were brought
into cache and don't look the same in cache as on disk. SQL Server performs checkpoints now and
then, where it writes all dirty pages (for the database) to disk and also reflects that in the transaction
log. This gives SQL Server a starting point for recovery, for instance when you start SQL Server. It will
find the most recent checkpoint in the transaction log and use the transaction log to make sure that
all modifications recorded in the log have actually been made to the data pages, but also roll back
any transactions that were incomplete when you stopped your SQL Server. We sometimes refer to this
as REDO and UNDO. You can see information about this recovery process reflected in the SQL Server
errorlog file, from when you started your SQL Server.
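The sequence described in this section (log record first, then the page in cache, forced log write at commit, checkpoint, then REDO and UNDO at startup) can be illustrated with a small toy model in Python. This is a sketch under heavy simplifications, not how SQL Server is implemented: a page holds a single value, the class and method names are invented, and recovery replays the whole log instead of starting from the most recent checkpoint.

```python
class ToyDatabase:
    """Toy write-ahead logging model: one value per page, one log for all transactions."""

    def __init__(self):
        self.disk_pages = {}    # page id -> value as stored in the data file
        self.cache = {}         # page id -> value in memory (may be dirty)
        self.dirty = set()      # pages modified in cache since they were last written
        self.log_buffer = []    # log records not yet physically written
        self.log_on_disk = []   # log records that survive a crash

    def begin(self, txn):
        self.log_buffer.append(("BEGIN", txn))

    def update(self, txn, page, new_value):
        if page not in self.cache:                       # bring the page into cache first
            self.cache[page] = self.disk_pages.get(page)
        old_value = self.cache[page]
        self.log_buffer.append(("UPDATE", txn, page, old_value, new_value))  # log record first
        self.cache[page] = new_value                     # then modify the page in cache
        self.dirty.add(page)

    def flush_log(self):
        self.log_on_disk.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self, txn):
        self.log_buffer.append(("COMMIT", txn))
        self.flush_log()                                 # force log write at commit

    def checkpoint(self):
        self.flush_log()                                 # log records reach disk before data pages
        for page in self.dirty:
            self.disk_pages[page] = self.cache[page]     # write all dirty pages
        self.dirty.clear()
        self.log_on_disk.append(("CHECKPOINT",))

    def crash_and_recover(self):
        self.cache, self.dirty, self.log_buffer = {}, set(), []   # memory is lost
        committed = {rec[1] for rec in self.log_on_disk if rec[0] == "COMMIT"}
        for rec in self.log_on_disk:                     # REDO: reapply every logged change
            if rec[0] == "UPDATE":
                self.disk_pages[rec[2]] = rec[4]
        for rec in reversed(self.log_on_disk):           # UNDO: roll back incomplete transactions
            if rec[0] == "UPDATE" and rec[1] not in committed:
                self.disk_pages[rec[2]] = rec[3]


if __name__ == "__main__":
    db = ToyDatabase()
    db.begin("T1"); db.update("T1", "A", 10); db.commit("T1")
    db.begin("T2"); db.update("T2", "B", 20)             # T2 never commits
    db.checkpoint()                                      # writes A and the uncommitted B
    db.begin("T3"); db.update("T3", "A", 11); db.commit("T3")
    print("data file before recovery:", db.disk_pages)
    db.crash_and_recover()
    print("data file after recovery: ", db.disk_pages)
```

In the example, the committed change by T3 only exists in the log at the time of the crash and is brought back by REDO, while the uncommitted change by T2 had already reached the data file through the checkpoint and is removed by UNDO.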
Tools
You get several tools to help you manage SQL Server. The following is a list of the most commonly used
and most important tools included.
SQL Server Management Studio (SSMS)
SSMS is quite simply a graphical user interface allowing you to manage your SQL Server. You can
connect to a SQL Server (or some of the other components) and use Object Explorer to view your
databases, tables, views etc. You can right-click on an object and get a context menu allowing you to
perform various tasks against that object. And you can open query windows, allowing you to type SQL
queries and commands, execute them and of course also save them as script files.
SQL Server Configuration Manager
This tool allows you to configure your server and perform changes that in general are stored outside of
SQL Server (mostly in the Windows Registry) instead of inside SQL Server. These settings include items
such as what Windows account each service is started as, startup parameters and network settings for
your SQL Server.
SQLCMD.EXE
SQLCMD is a command-line tool for executing queries against a SQL Server. You can do this by
specifying the query as a command-line option, by reading queries from an input file, or by using the tool interactively,
where you get a prompt at which you can type queries and use GO to send them to your SQL Server.
SQL Server Documentation (also known as Books Online)
This is exactly what it sounds like: the documentation for SQL Server. By default, this will take you to a
website for the documentation. This has several limitations. The navigation is very slow compared to a
local help application and the search functionality is managed by Bing, not limited to only SQL Server.
You can change the documentation to a local application using the tool Manage Help Settings.
Change this to local and then download the parts you are most interested in using the same tool.
Port numbers
When I say "SQL Server" and "port number" to somebody, they either look at me like I'm from Mars or
they say "1433".
A default instance will, by default (it is changeable), listen on port 1433. And a client application, when you
address the SQL Server instance using only the machine name, host name or IP address, will try 1433. That
is how the client finds the right instance on your server machine. But what if you have several SQL Server instances?
Remember that we can have only one default instance, and so only one listening on 1433.
A named instance will pick a free port by asking the operating system when it first starts. This port
number is saved in the registry, and the next time you start the instance, it will try to use the same port number.
You can see what port an instance is using by looking at the startup information in the SQL Server
errorlog file, or by using the SQL Server Configuration Manager tool, where you can also change the
port number if you wish. "But we don't connect using the port number," you might say. "We connect
using the instance name, such as MachineName\InstanceName." Obviously, we have something that
translates the instance name to a port number. This is the SQL Server Browser service. When you have
the \InstanceName part in your connection, the client libraries send a request to the server machine
using port 1434 (UDP), and the SQL Server Browser service on the server machine replies to the client
with the port number that instance is listening on. We can also connect using the port number instead of
the instance name, using MachineName,PortNumber (or host name or IP address instead of machine name).
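The three ways of addressing an instance can be summarized as a small decision function. The Python sketch below only models the decision described above; it does not talk to any server. The function name, the returned field names and the example server names are invented, and the handling of a string that contains both an instance name and a port (the explicit port wins) is an assumption rather than something stated in this paper.

```python
DEFAULT_INSTANCE_PORT = 1433   # default instance, unless changed
BROWSER_UDP_PORT = 1434        # SQL Server Browser service


def resolve_target(server):
    """Describe how a client would locate the instance for a given server string."""
    if "," in server:
        # MachineName,PortNumber: connect straight to that port, no lookup needed.
        host, port = server.split(",", 1)
        host = host.split("\\", 1)[0]   # assumption: an explicit port overrides the instance name
        return {"host": host.strip(), "method": "explicit port", "port": int(port)}
    if "\\" in server:
        # MachineName\InstanceName: ask the Browser service which port the instance uses.
        host, instance = server.split("\\", 1)
        return {"host": host, "method": "ask browser", "instance": instance,
                "browser_udp_port": BROWSER_UDP_PORT}
    # Machine name, host name or IP address only: try the default instance port.
    return {"host": server, "method": "default port", "port": DEFAULT_INSTANCE_PORT}


if __name__ == "__main__":
    for name in ("SRV1", "SRV1\\SALES", "10.0.0.5,1500"):
        print(name, "->", resolve_target(name))
```

One practical consequence of this logic is that connecting by port number does not depend on the SQL Server Browser service being available.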
Windows Logins
When you create a Windows Login, you specify a user, or a group, in your Windows environment (in
your domain, most likely). SQL Server will grab the SID (Security Identifier) from Windows and store the
name and SID. When you connect using a Windows Login, through a trusted connection, SQL Server
verifies that the SID of the user, or of any of the groups of which
the user is a member, exists as a login in SQL Server. If you are allowed to connect to SQL Server through a Windows login which is a
group, we can still identify the Windows user name inside SQL Server. This is important because we
don't lose the ability to track who is connected or who did what.
Server roles
At the server level, there are roles you can add a login to as a member. There's the public role, of which every
login is a member. This can be used to grant permissions that should apply to everyone. Then there
are eight fixed server roles. You cannot change the permissions that come with a fixed server role, but
you can of course decide who (if anyone) should be a member of that role. Probably the best-known
server role is sysadmin. As a sysadmin, you can do anything, anywhere in your SQL Server instance.
This should of course be used very carefully. Think of it as SQL Server's equivalent to Domain Admin.
Examples of other fixed server roles are dbcreator and processadmin.
As of SQL Server 2012, we can also create our own server roles as well as add members and grant
permissions to the role instead of to individual logins.
Gaining access to a database: the user
A login only allows you to connect to the instance. This is meaningless unless you can also access the
database(s) you need. This is the user concept. Sometimes we think of this as mapping a login to a
database, or granting the login access to the database. But what we are really doing is creating a
user (with a name) in the database that points to the login. This allows the login to access the database.
You can then grant permissions to this user so that it can, for instance, SELECT from a table. In most cases,
we of course have the same name for the user as we have for the login. The connection from the user to
the login is made, though, using the SID of the login. A Windows login's SID comes from Windows and
a SQL Server login's SID is generated by SQL Server when you create that login. There is always a user
named dbo in each database. The login that owns a database is mapped to the dbo user in that database.
Database roles
Just like the server roles, we also have database roles. We can add a user to a database role, and the
permissions granted to this database role are then also available to that user. There's a public database
role and every user always has this role. You can grant permissions to it if you want the permissions
to apply to all users in the database. And there are also fixed database roles. db_owner gives the same
privileges as the dbo. Examples of other fixed database roles are db_datareader and db_backupoperator.
We can also create our own database roles, assign permissions to them and add users to such a role.
Permissions
Some permissions can be inherited from some of the fixed server or database roles. Other permissions
you will grant specifically, either to a role (server- or database-, depending on type of permission) or
directly to a login or user. In general, you can perform an operation if you have been granted privileges,
unless there is a deny for this operation. Privileges granted accumulate and the following paragraph is
an example of this:
Sue connects to SQL Server using Windows authentication. The Sue Windows account exists as a login
in SQL Server. Sue is also a member of the Accounting Windows group, and that group also exists as a
Windows login on your SQL Server. The Sue login has a user in database A and the Accounting login
has access to database B. Sue will be able to access both databases A and B. In database A, the Sue
user is also a member of a database role named X. In this database, Sue will be able to perform the
operations that have been granted to her, and also the operations that have been granted to the X
database role. The same principle goes for the B database, of course. The exception to this is DENY,
which overrides GRANT. With DENY, you know that a login or user cannot perform the operation in
question. But there is an exception to this as well: a sysadmin can do everything in the instance. SQL
Server doesn't even check DENY for somebody who is sysadmin.
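The three rules in this section (grants accumulate across the user and its roles, DENY overrides GRANT, and sysadmin bypasses the check altogether) fit in a few lines of Python. This is a toy model of the rules as described here, not SQL Server's actual permission check; the function name, the data layout and the specific grants and denies for Sue are hypothetical.

```python
def can_perform(operation, principals, grants, denies, sysadmins=()):
    """Decide whether an operation is allowed.

    principals: the login or user plus every group and role it is a member of.
    grants, denies: sets of (principal, operation) pairs.
    sysadmins: principals that are members of the sysadmin role.
    """
    principals = set(principals)
    if principals & set(sysadmins):
        return True                                    # DENY is not even checked for sysadmin
    if any((p, operation) in denies for p in principals):
        return False                                   # DENY overrides any GRANT
    return any((p, operation) in grants for p in principals)   # grants accumulate


if __name__ == "__main__":
    # Hypothetical permissions in database A for the Sue example.
    grants = {("Sue", "SELECT"), ("X", "INSERT"), ("X", "DELETE")}
    denies = {("Sue", "DELETE")}
    sue_in_a = ["Sue", "X"]
    for op in ("SELECT", "INSERT", "DELETE", "UPDATE"):
        print(op, can_perform(op, sue_in_a, grants, denies))
    print("DELETE as sysadmin", can_perform("DELETE", sue_in_a, grants, denies, sysadmins=["Sue"]))
```

Here Sue can SELECT through her own grant and INSERT through the X role, cannot DELETE because of the DENY even though X has been granted it, and cannot UPDATE because nothing grants it.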
SQL Server Integration Services (SSIS) is a more advanced tool for export and import. We sometimes
refer to these types of tools as ETL tools, as in Extract, Transform and Load. You define the transfer
using a design tool. This came with the product and was named SQL Server Business Intelligence
Development Studio in SQL Server 2008 and SQL Server Data Tools in SQL Server 2012. As of SQL
Server 2014, this no longer comes with the product and is a separate download. Regardless of version,
the design tool is in the end a plug-in to the Visual Studio development environment.
When you create an SSIS package, you add and configure Tasks. A task performs an action, such as
running an EXE file, creating a folder, executing a SQL command or sending an email. There are a
number of tasks available, and you connect these using Precedence Constraints, which basically define
the sequence to execute your tasks. A special task type is the Data Flow Task, where you add data
sources and destinations. In between, you can have transformations which can do things such
as lookups, aggregations and calculations.
The package can then be executed using various methods. You can use the Execute Package Utility
GUI program, DTEXEC.EXE command-line program or a SQL Server Agent job step. There is much more
functionality in SSIS. You can also create SSIS packages using the easier to use but less powerful Import
and Export Wizards.