SSAS Hardware Sizing Tabular Solutions
Copyright
This document is provided as-is. Information and views expressed in this
document, including URL and other Internet Web site references, may change
without notice. You bear the risk of using it.
Some examples depicted herein are provided for illustration only and are fictitious.
No real association or connection is intended or should be inferred.
This document does not provide you with any legal rights to any intellectual
property in any Microsoft product. You may copy and use this document for your
internal, reference purposes.
© 2013 Microsoft. All rights reserved.
Contents
Introduction
Hardware Considerations for a Development Environment
Memory
    Calculate Memory for Databases at Rest
    Memory Requirements for Disaster Recovery
    Memory Requirements for Program Execution
    Estimating Memory for Processing
        Use a formula to get an initial estimate
        Measure memory used by individual objects
        Refine the estimate by building a prototype that yields a better value for data compression
        Calculate actual compression rate
    Estimating Memory for Querying
    About Concurrency Testing
    Monitor memory usage during processing and querying
    Key Points
Memory Bandwidth and Speed
    Key Points
CPU
    Cores
        Other considerations
    Onboard Cache (L1, L2)
    NUMA
    Monitor CPU Usage
        CPU usage during query execution
    Key Points
Disk I/O
Hardware Configuration Examples
Conclusion
Introduction
This document provides hardware sizing guidance for in-memory Analysis Services
tabular databases so that you can determine the amount of memory and CPU
resources required for query and processing workloads in a production environment.
NOTE: This guide is focused exclusively on in-memory tabular solutions.
DirectQuery models, which execute queries against a backend relational database,
have different resource requirements and are out of scope for this guide.
For in-memory solutions, the best query performance is typically realized on hardware that maximizes memory capacity, memory bandwidth and speed, and CPU performance.
Notice that disk I/O is not a primary factor in sizing hardware for a tabular solution, as the model is optimized for in-memory storage and data access. When evaluating hardware for a tabular solution, your dollars are better spent on the memory subsystem rather than on high-performance disks.
When sizing hardware for an in-memory database, your focus should be on three operational objectives: the database at rest (steady state), the database under processing workloads, and the database under query workloads.
To help you evaluate the resources needed to support each objective, we start each section with a brief description of how a resource is used, provide estimation techniques you can apply to your own solution, and conclude with key takeaways that summarize important points.
This guide also includes a summary of existing hardware configurations to give you
an idea of the range of hardware currently supporting production workloads.
Sometimes knowing what works for other people is the most valuable information
you can have.
The advantage of using SQL Server Management Studio to estimate database size is that the database is loaded into memory when you access its properties. Using other approaches, for example using Performance Monitor to get memory usage for the msmdsrv process at startup, might initially under-report memory usage because you'll get only the memory used by Analysis Services plus the metadata of any databases that it's hosting. Actual data won't be read into memory until the first query is issued to the database.
NOTE: Although reading estimated size from SSMS is the easiest approach, it's not the most accurate approach because the value is estimated at a specific point in time, and then retained for the duration of the connection. Later, we'll present alternative approaches that provide more accuracy.
About the Workspace Database
When creating a tabular model in SQL Server Data Tools, a workspace database
takes up memory on your development machine. As you monitor system resource
usage on your development machine, remember that artifacts of the development
phase, like the workspace database, are not part of production environments.
For this reason, when using Performance Monitor or other tools that report memory
usage at the instance level, remember to unload the workspace database so that
you get a more accurate assessment of memory usage.
To unload the database, simply close the project in SQL Server Data Tools (SSDT).
The workspace database is immediately removed from the server instance you are
using for development purposes.
NOTE: A workspace database will not unload if you set the Workspace retention
option to Keep in memory. If this is the case, right-click the database and choose
Detach to unload the database.
Finally, consider the effects of having multiple workspace databases on a single development server. If multiple developers use the same workspace database server, you might have multiple copies of the database in memory and on disk.
1. In SSDT, go to Tools | Options | Analysis Services.
2. Check the workspace retention setting. If "Keep workspace in memory" is set on multiple development machines, those workspaces will collectively consume memory on the database server. "Keep workspace databases on disk but unload from memory" is the better choice if multiple developers share the same server.
Memory
As you can see, there are variable and fixed components to a tabular instance
deployment. If you never load a single database, you will still need memory for the
Formula Engine and msmdsrv process. Each database will place additional demands
on the system in the form of data dictionaries, column segments, and query caches
(not shared across databases).
For projects at an earlier stage, including projects with data sizes that
exceed the capacity of development machines, you will need to base your estimate
on the uncompressed dataset that you are modeling, or build a prototype that uses
a subset of your data.
Many customers routinely use PowerPivot for Excel as a tool for testing how much compression they'll get for their raw data. Although PowerPivot uses a slightly different compression algorithm, the compression engine is the same, allowing you to arrive at a reasonable estimate when prototyping with the PowerPivot add-in. The PowerPivot add-in is available in 32-bit and 64-bit versions. The 64-bit version supports much larger models, so use that version if you can.
Other customers who work with large datasets usually apply filters during import to
select a subset of data (for example, filtering on one day's worth of transactions).
This approach gives you a smaller and more manageable dataset, while retaining
the ability to extrapolate a realistic estimate of the final dataset (all things being
equal, if one day's dataset is 20 MB, a month is roughly 600 MB). When choosing a
filter, however, it is important to make sure that the subset of the data is still
representative of the overall data. For example, it might make more sense to filter
by date than by city or state.
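As an illustration, a filtered import query for this kind of prototyping might look like the following. It assumes the FactOnlineSales table from the ContosoRetailDW sample used elsewhere in this guide and its DateKey column; substitute your own fact table, date column, and date value.
Use ContosoRetailDW;
Go
-- Illustrative only: pull a single day of fact rows to prototype compression.
Select *
From dbo.FactOnlineSales
Where DateKey = '2009-06-15';
Go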
If you are contending with tabular model deployment on a system that has little
RAM to spare, you can optimize your model to reduce its memory footprint.
Common techniques include omitting high-cardinality columns that are not
necessary in the model. Another tradeoff that a solution architect will consider is
using a measure in lieu of a calculated column. A calculated column is evaluated
during processing and persisted to memory. As a result, it performs well during
query execution. Contrast that with a measure that provides equivalent data, but is
generated during query execution and exists only until evicted from cache. The
query runs slower due to extra computations, but the benefit is a reduction in
persistent storage used on an ongoing basis.
Alternatively, import a subset of rows from each table and then multiply to get an
estimated size.
For a SQL Server database, you can run sp_spaceused to return size information for each table. Running it against the FactOnlineSales table in the ContosoRetailDW sample database, for example, reports a size of around 362 megabytes.
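A per-table call looks like this (table name shown for illustration):
Use ContosoRetailDW;
Go
Exec sp_spaceused N'dbo.FactOnlineSales';
Go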
Alternatively, to get overall database size, omit the table name and execute
sp_spaceused as follows:
Use ContosoRetailDW;
Go
Exec sp_spaceused;
Go
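In the output of the database-level call, note that database_size includes both the data and log files; the reserved and data values in the second result set are closer to the raw data volume you want for estimation purposes.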
Memory used to store the database copy is immediately released after processing concludes, but that doesn't mean memory usage diminishes to just the RAM needed to store the database. Queries and Storage Engine caches also consume memory. On the server, a query breaks down into table scans that select and aggregate data, calculations, and other operations.
As you investigate memory usage by tabular solutions, you
might come across other formulas that are more robust
mathematically, yet harder to use if you are not deeply familiar
with the data. In the end, you might find it more productive to
use a simplistic formula for an initial estimate, and then move
on to prototype a solution using a subset of your own data.
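For example, under the simple divide-by-10 starting assumption used later in this guide, 500 GB of uncompressed source data would be estimated at roughly 50 GB once loaded into memory. Treat that only as a starting point until you can verify the actual compression rate for your own data.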
While running a DMV is acceptable, we recommend that you download and use a
workbook created and published by Kasper De Jonge, an Analysis Services program
manager. His workbook uses DMV queries to report memory usage by object, but
improves upon the raw DMV by organizing and presenting the results in a hierarchy
that lets you drill down into the details.
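If you want to run the underlying DMV yourself, the following query, executed in an MDX query window in Management Studio while connected to the tabular instance, returns memory usage for every object on the instance (the workbook is built on this DMV, or one very much like it):
SELECT * FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE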
Refine the estimate by building a prototype that yields a better value for
data compression
As we work our way up the continuum of estimation techniques, we arrive at one of
the more robust approaches: prototyping using your own data.
The best way to determine how well your data compresses in a tabular solution is
to start with some initial data imports. Because the objective is to understand
compression behavior, treat this as a prototyping exercise. If you were building a
model you planned to keep, you would spend time thinking about model design. For
our purposes, you can set those problems aside and focus simply on choosing which
tables and columns to import for the purpose of estimating database size.
The following steps approach prototyping from the standpoint of large datasets. If
your dataset is not large, you can just run Process Full. Otherwise, process just
one table at a time, and then run Process Recalc at the end to process table
dependencies.
1. Create a new tabular project using SQL Server Data Tools.
2. Import the largest table from your external data source into a tabular model.
If it's a fact table, exclude any columns that are not needed in the model.
Usually, the primary key of a fact table is a good candidate for exclusion, as
are columns that are only required for ETL processing.
3. Process the table and deploy it to a development server.
4. If deployment succeeds, measure the size of the compressed table in
memory. If deployment fails, apply a filter to get a smaller rowset, while
ensuring that the filtered rowset is still representative of the overall table.
5. Import and process a second table, deploy the solution, measure memory
usage, and then repeat with additional tables.
6. Stop when you have sufficient dataset representation in your model.
7. As a final processing step, run Process Recalc to process relationships.
8. Measure the memory used by the database by viewing database properties in
SQL Server Management Studio, or by using DMVs if you want to drill into the
details. At this point, you have a solid foundation for projecting how much memory you'll need for the rest of the data.
Calculate actual compression rate
Once you've processed and deployed a solution, you have compressed database
files on disk that you can use to calculate a more realistic compression rate.
Comparing the file size of compressed data against uncompressed data gives you
the actual compression rate for your solution.
After you get an actual compression rate, you can replace the denominator (10) in
the simple formula with a more realistic value.
1. Get the file size of the original uncompressed data.
2. Find the \Program Files\Microsoft SQL Server\MSAS11.<instance>\OLAP\DATA\<db folder> folder, and note its total size.
3. Divide the result from step 1 by the result in step 2 to get the compression
ratio.
4. Re-compute the simple formula, replacing 10 with your actual compression
rate, to get estimated memory requirements for processing.
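For example (illustrative numbers only): if the uncompressed source data measures 300 GB and the database folder on disk measures 25 GB, the compression ratio is 12, and re-running the simple formula gives an estimated in-memory size of roughly 300 GB / 12 = 25 GB.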
NOTE: While this approach is generally reliable, in-memory databases tend to be
somewhat larger than database files on disk. In particular, having highly unique
columns will cause memory usage to exceed the size of data in the data folder. As a
redundant measure, use alternative methods such as the DMV, database property
page in Management Studio, or Performance Monitor to further check database size.
DAX query optimization is beyond the scope of this guide, but other sources are
available that cover this material. See the links at the end of this document for
more information.
Next, issue queries against the database to understand the query profile of your
client application.
If the query can be executed against the compressed data, you won't see a noticeable difference in memory usage on the server. Otherwise, you'll see a
transient surge in memory usage as temporary tables are created to store and scan
uncompressed data.
Although memory spikes during query execution are sporadic, the same cannot be
said for client applications. Client applications will most certainly consume memory
as data is retrieved from the tabular model. As part of your investigation, consider
adding client processes to the trace to monitor memory usage.
1. On the client computer, start the client applications used to query the model.
2. In Performance Monitor start a new trace.
3. Add counters for client processes. Client processes are listed under the
Process object in Performance Monitor.
a. For Management Studio, select ssms (not to be confused with smss).
b. For Excel, select excel.
c. For Power View in SharePoint, hosting is in one the SharePoint Service
Application AppPool processes (usually w3wp.exe).
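For each of these processes, the Private Bytes and Working Set counters under the Process object give a reasonable picture of how much memory the client consumes as it retrieves data from the model.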
Key Points
A tabular solution uses memory during processing, when loading metadata after a
service restart, and when loading remaining data on the first query issued against
the model. Once a tabular database is loaded, memory usage remains relatively
stable unless SSAS needs to build temporary tables during query execution.
Relative to processing and basic storage requirements, memory used for queries
tends to be minimal. If you're monitoring a query workload in Performance Monitor, you'll notice that memory usage is often flat for many queries.
A query can result in a temporary but radical uptick in memory usage if a column or
table needs to be decompressed during query execution. Certain functions (such as
EARLIEST, SUMX, and FILTER) are known to have a performance impact. Always test
the queries and reports you plan to run in a production environment to understand
their performance profile.
When calculating memory usage on a development machine, any workspace
databases loaded in memory will skew your measurements. Be sure to unload the
workspace database when collecting metrics about memory usage.
Finally, remember to monitor memory used by client applications. A query might be
trivial for the Storage Engine, yet bring a client workstation to its knees if a massive
amount of data is read into its memory.
Memory Bandwidth and Speed
Memory bandwidth and speed are also important to the performance of an in-memory solution. As you evaluate different hardware systems, look for systems that offer better-than-average memory performance and integrated memory controllers.
Key Points
Tabular solution architectures are optimized for query performance, predicated on
RAM storage being much faster to read from than disk. When evaluating RAM,
consider memory designs that balance throughput and speed.
CPU
Selecting a fast CPU with a sufficient number of cores is also a top consideration. In
a tabular solution, CPU utilization is greatest when query evaluation is pushed to the
Storage Engine. In contrast, CPU bottlenecks are more likely to occur when a query
or calculation is pushed to the single-threaded Formula Engine. Because each query
is single-threaded in the Formula Engine, on a multi-core system, you might see one
processor at maximum utilization while others remain idle.
Constructing queries that only run in the Storage Engine might sound tempting, but it's unrealistic as a design goal. If the point of your model is to provide insights that solve business problems, you'll need to provide queries and expressions that meet business goals, irrespective of query execution mechanics. Furthermore, if you're provisioning a server that hosts self-service BI solutions built and published by other people, query syntax construction is probably beyond your control.
A CPU with clock speeds of at least 2.8 to 3 GHz is your best insurance against queries that execute single-threaded in the Formula Engine, slow queries that are difficult to optimize, or suboptimal query syntax created by novice model designers.
Queries that run only in the Formula Engine, such as an evaluation of a SUMX or FILTER operation, are single-threaded and a common query performance bottleneck.
FILTER iterates over the entire table to determine which rows to return.
Effect of Concurrent Queries on CPU
The number of clients requesting data from the model will also factor heavily into
how much CPU resource you'll need. As previously noted, each query moves through the Formula Engine as a single-threaded operation. If you have 100 unique queries running simultaneously, you'll want significantly more cores to handle the
load.
Processing Load on CPU
Processing can take a long time to complete, but is typically not considered to be
CPU intensive. Each processing job uses one to two cores; one core to read the
data, and another core for encoding. Given that each partition within one table must
be processed sequentially, the pattern of a processing operation tends to be a small
number of cores sustained over a longer period of time. However, if you process many tables in parallel, CPU usage can rise.
Cores
Now that you have a basic understanding of how CPU resources are used, let's
move on to specific CPU designs most often used to support medium to large
solutions.
For tabular solutions, the most frequently cited CPU designs range from 8 to 16
cores. Performance appears to be better on systems that have fewer sockets: for example, 2 sockets with 8 fast cores each, as opposed to 4 sockets with 4 cores each. Recall
from the previous section the importance of memory bandwidth and speed in data
access. If each socket has its own memory controller, then in theory, we should
expect that using more cores per socket offers better performance than more
sockets with fewer cores.
Equally important, performance gains tend to level off when you exceed 16 cores.
Performance doesn't degrade as you add more cores; it just fails to produce the same percentage increase that you achieved previously. This behavior is not specific to tabular solutions; similar outcomes will be encountered when deploying any memory-intensive application on a large multi-core system.
The problem is that memory allocations fall behind relative to the threads making memory requests. Contention arises as all cores read and write to the same shared resource. Operations become serialized, effectively slowing down server performance. The end result is that a CPU might be at 30-40% utilization yet unable to take on additional work.
NUMA
Unlike its multidimensional (MOLAP) counterpart, a tabular solution is not NUMA
aware. Neither the Formula Engine nor the Storage Engine will modify execution
when running on NUMA machines.
This means that if you have a NUMA machine, you might run into worse
performance than if you used a non-NUMA machine with the same number of cores.
Typically, this only happens on systems having more than 4 NUMA nodes.
Performance degradation occurs when memory access has to traverse NUMA nodes
(i.e., a thread or instruction executing on one node needs something that is
executing on another node). When choosing between systems that have the same
number of cores and RAM, pick a non-NUMA system if you can.
To offset performance degradation, consider setting processor affinity on Hyper-V
VMs, and then installing Analysis Services tabular instances on each VM. For more
information about this technique, see Using Tabular Models in a Large-scale
Commercial Solution and Forcing NUMA Node affinity for Analysis Services Tabular
databases.
When you run the trace, you can monitor query execution to determine query duration and where the work is performed. Queries pushed to the Storage Engine are indicated through the event name: a line that has VertiPaq in the name tells you that part of the expression has been pushed down to the xVelocity Storage Engine.
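In SQL Server Profiler, for example, Storage Engine activity typically surfaces through the VertiPaq SE Query Begin and VertiPaq SE Query End events.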
Key Points
CPU resources are heavily used for queries and some calculations. Simple
mathematical computations based on a single column (for example, summing or
averaging a numeric column) are pushed to the Storage Engine and executed as a
multi-threaded operation on multiple cores. In contrast, a calculation that ranks or
sorts values requires the single-threaded Formula Engine, using just one core and
possibly lots of memory depending on the size of the temporary table.
Incremental performance gains tend to level off after 16 logical cores. Although a greater number of cores (32 or 64) will definitely increase capacity, you won't see the same percentage gain in performance when going beyond 16 cores.
As SSAS Tabular is not NUMA aware, avoid NUMA unless you need it for other
applications that run on the same hardware. There will be longer wait times if a
request has to traverse NUMA nodes during a read operation.
Finally, when you've narrowed your server selection to a few choices, take a look at
the onboard cache and choose the system that offers the larger onboard cache.
Query cache optimizations in the tabular engine target the L1 and L2 cache. You
gain the most benefit from those optimizations on a system that offers more
onboard cache.
Disk I/O
Disk I/O, which normally looms large in any hardware sizing exercise, is less of a concern when sizing hardware for a tabular solution because, given sufficient RAM, tabular solutions do not read from or write to disk during query execution. Solid performance of table scans, aggregations, and most calculations is predicated on having an ample supply of RAM. If the operating system has to page memory to disk, performance degrades dramatically.
On a properly provisioned server, disk IO occurs infrequently, but at predictable
intervals. You'll always see disk I/O activity during processing when reading from a
relational database (in comparison, saving the tabular database files to disk is
relatively quick). You will also see some I/O after system restart when metadata and
data dictionaries are loaded into memory. Data dictionaries are loaded sequentially
so this step can take some time if you have a large solution or lots of smaller
solutions. You will see I/O activity again when the rest of the data is loaded, typically
when the first query is executed. For query workloads, the ideal system should have
sufficient memory so that paging to disk does not occur at all.
Although disk I/O is not a hardware investment to maximize, don't discount it
entirely. A system that loads many gigabytes of data from disk to memory will
perform better if the disk is fast.
NOTE: Paging to disk will only occur if you set the VertiPaqPagingPolicy property to 1. The default setting is 0, which disallows paging to disk. For more details, see Memory Settings in Tabular Instances of Analysis Services.
Hardware Configuration Examples

System Information: Dell PowerEdge R810, dual 8-core CPU
RAM: 256 GB
Database Size: 40 GB
Other Details: Server runs other SQL Server features as well, including the relational engine and Analysis Services in multidimensional mode. Processing for the tabular solution runs on the same server: ProcessFull on a weekly basis, and ProcessUpdate nightly.

System Information: Hewlett-Packard ProLiant DL580 (2)
RAM: 1 TB
Database Size: 4 GB
Other Details: Multi-tenant architecture supporting at least 4 virtual machines will run on the two systems, hosting Analysis Services in tabular mode, Analysis Services in multidimensional mode, SharePoint with Reporting Services Power View, and a SQL Server relational database engine. Decisions about how to allocate memory across all VMs are still pending. Solution design consists of several smaller tabular models, about 10 total, consuming around 40 GB of memory all together.

System Information: Hewlett-Packard ProLiant BL460 G7 (commodity blade servers), 2 x Intel X5675 3.07 GHz processors
RAM: 96 GB
Database Size: 6 GB
Conclusion
In this guide, we reviewed a methodology for estimating memory requirements for a
database at steady state, under processing workloads, and under query workloads.
We also covered hardware configurations and tradeoffs to get the best price-to-performance ratio.
In simplest terms, when budgeting hardware for a tabular database, you should maximize these system resources: memory (capacity, bandwidth, and speed) and CPU (clock speed, cores, and onboard cache).
If you purchase a high-end machine, you can get immediate use out of excess
RAM by distributing available memory across multiple VMs dedicated to
different applications and workloads, and then reconfigure memory and cores
as capacity requirements increase.
Remember that compression and query performance are variable, depending
on data cardinality, data density, and the types of queries that run on the
server. Two different models that both measure 120 GB in size will use
resources differently depending on the types of queries submitted to each
one. You will need to approach each solution as a unique project and do
thorough testing to determine the hardware requirements for each one.
Continuous monitoring is essential to anticipating future capacity needs. Pay
close attention to memory usage over time, especially if subsequent
processing is resulting in larger and larger models, to ensure that query
performance stays robust.
Create a memory-efficient Data Model using Excel 2013 and the PowerPivot add-in
http://blogs.msdn.com/b/sqlsakthi/archive/2012/05/19/cool-now-we-have-acalculator-for-finding-out-a-max-server-memory-value.aspx
http://blogs.msdn.com/b/sqlsakthi/p/max-server-memory-calculator.aspx
Load Testing
(video) Optimizing Your BI Semantic Model for Performance and Scale
(video) Load Testing Analysis Services
Can your BI solution scale?
http://www.tomshardware.com/reviews/ram-speed-tests,1807-2.html
DAX Optimizations
http://mdxdax.blogspot.com/2011/12/dax-query-plan-part-1-introduction.html
http://mdxdax.blogspot.com/2012/01/dax-query-plan-part-2-operator.html
http://mdxdax.blogspot.com/2012/03/dax-query-plan-part-3-vertipaq.html
http://www.powerpivotblog.nl/tune-your-powerpivot-dax-query-dont-use-the-entiretable-in-a-filter-and-replace-sumx-if-possible
http://sqlblog.com/blogs/marco_russo/archive/2011/02/07/powerpivot-filtercondition-optimizations.aspx
http://www.sqlbi.com/articles/optimize-many-to-many-calculation-in-dax-withsummarize-and-cross-table-filtering/
http://sqlblog.com/blogs/marco_russo/archive/2012/09/04/optimize-summarize-withaddcolumns-in-dax-ssas-tabular-dax-powerpivot.aspx
Did this paper help you? Please give us your feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why? For example:
Are you rating it high due to having good examples, excellent screen shots, clear writing,
or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.
Send feedback.