VSAN Design and Sizing Guide
Contents
INTRODUCTION
  Health Services
VIRTUAL SAN READY NODES
VMWARE EVO:RAIL
VIRTUAL SAN DESIGN OVERVIEW
  Follow the Compatibility Guide (VCG) Precisely
    Hardware, drivers, firmware
  Use Supported vSphere Software Versions
  Balanced Configurations
  Lifecycle of the Virtual SAN Cluster
  Sizing for Capacity, Maintenance and Availability
  Summary of Design Overview Considerations
HYBRID AND ALL-FLASH DIFFERENCES
ALL-FLASH CONSIDERATIONS
VIRTUAL SAN LIMITS
  Minimum Number of ESXi Hosts Required
  Maximum Number of ESXi Hosts Allowed
  Maximum Number of Virtual Machines Allowed
  Maximum Number of Virtual Machines Protected by vSphere HA
  Disks, Disk Group and Flash Device Maximums
  Components Maximums
  VM Storage Policy Maximums
  Maximum VMDK Size
  Summary of Design Considerations Around Limits
NETWORK DESIGN CONSIDERATIONS
  Network Interconnect - 1Gb/10Gb
  All-Flash Bandwidth Requirements
  NIC Teaming for Redundancy
  MTU and Jumbo Frames Considerations
  Multicast Considerations
  Network QoS via Network I/O Control
  Summary of Network Design Considerations
  Virtual SAN Network Design Guide
STORAGE DESIGN CONSIDERATIONS
  Disk Groups
  Cache Sizing Overview
  Flash Devices in Virtual SAN
    Purpose of read cache
    Purpose of write cache
  PCIe Flash Devices Versus Solid State Drives (SSDs)
  Flash Endurance Considerations
  Flash Capacity Sizing for All-Flash Configurations
Introduction
VMware Virtual SAN is a hypervisor-converged, software-defined storage
platform that is fully integrated with VMware vSphere. Virtual SAN aggregates
locally attached disks of hosts that are members of a vSphere cluster, to create a
distributed shared storage solution. Virtual SAN enables the rapid provisioning of
storage within VMware vCenter as part of virtual machine creation and
deployment operations. Virtual SAN is the first policy-driven storage product
designed for vSphere environments that simplifies and streamlines storage
provisioning and management. Using VM-level storage policies, Virtual SAN
automatically and dynamically matches requirements with underlying storage
resources. With Virtual SAN, many manual storage tasks are automated - delivering
a more efficient and cost-effective operational model.
Virtual SAN 6.0 provides two different configuration options, a hybrid configuration
that leverages both flash-based devices and magnetic disks, and an all-flash
configuration. The hybrid configuration uses server-based flash devices to provide a
cache layer for optimal performance while using magnetic disks to provide capacity
and persistent data storage. This delivers enterprise performance and a resilient
storage platform. The all-flash configuration uses flash for both the caching layer
and capacity layer.
There are a wide range of options for selecting a host model, storage controller as
well as flash devices and magnetic disks. It is therefore extremely important that the
VMware Compatibility Guide (VCG) is followed rigorously when selecting hardware
components for a Virtual SAN design.
This document focuses on helping administrators correctly design and size a
Virtual SAN cluster, and answers common questions around the number of hosts,
number of flash devices, number of magnetic disks, as well as detailed
configuration questions, to help deploy Virtual SAN correctly and successfully.
Health Services
Virtual SAN 6.0 comes with a Health Services plugin. This feature checks a range of
different health aspects of Virtual SAN, and provides insight into the root cause of
many potential Virtual SAN issues. The recommendation when deploying Virtual
SAN is to also deploy the Virtual SAN Health Services at the same time. Once an issue
is detected, the Health Services highlights the problem and directs administrators to
the appropriate VMware knowledgebase article to begin problem solving.
Please refer to the Virtual SAN Health Services Guide for further details on how to get
the Health Services components, how to install them and how to use the feature for
validating a Virtual SAN deployment and troubleshooting common Virtual SAN
issues.
VMware EVO:RAIL
Another option available to customers is VMware EVO:RAIL. EVO:RAIL combines
VMware compute, networking, and storage resources into a hyper-converged
infrastructure appliance to create a simple, easy-to-deploy, all-in-one solution
offered by our partners. EVO:RAIL software is fully loaded onto a partner's
hardware appliance and includes VMware Virtual SAN. Further details on EVO:RAIL
can be found here:
http://www.vmware.com/products/evorail
Balanced configurations
As a best practice, VMware recommends deploying ESXi hosts with similar or
identical configurations across all cluster members, including similar or
identical storage configurations.
Summary of design overview considerations
- Ensure that all the hardware used in the design is supported by checking the
  VMware Compatibility Guide (VCG)
- Ensure that all software, driver and firmware versions used in the design are
  supported by checking the VCG
- Ensure that the latest patch/update level of vSphere is used when doing a
  new deployment, and consider updating existing deployments to the latest
  patch versions to address known issues that have been fixed
- Design for availability. Consider designing with more than three hosts and
  additional capacity that enables the cluster to automatically remediate in the
  event of a failure
- Design for growth. Consider an initial deployment with spare capacity in the
  cluster for future virtual machine deployments, as well as enough flash cache
  to accommodate future capacity growth
All-flash considerations
Components maximums
Virtual machines deployed on Virtual SAN are made up of a set of objects. For
example, a VMDK is an object, a snapshot is an object, VM swap space is an object,
and the VM home namespace (where the .vmx file, log files, etc. are stored) is also an
object. Each of these objects is comprised of a set of components, determined by
capabilities placed in the VM Storage Policy. For example, if the virtual machine is
deployed with a policy to tolerate one failure, then objects will be made up of two
replica components. If the policy contains a stripe width, the object will be striped
across multiple devices in the capacity layer. Each of the stripes is a component of
the object. The concepts of objects and components will be discussed in greater
detail later in this guide; suffice it to say that there is a maximum of 3,000
components per ESXi host in Virtual SAN version 5.5, and with Virtual SAN 6.0
(with on-disk format v2), the limit is 9,000 components per host. When upgrading
from 5.5 to 6.0, the on-disk format must also be upgraded from v1 to v2 to obtain
the 9,000-component maximum. The upgrade procedure is documented in the Virtual
SAN Administrators Guide.
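To make the component arithmetic concrete, below is a minimal sketch of a component-count estimator. It assumes the simplified model described above (FTT + 1 replicas, at least FTT witnesses per object); actual witness counts vary with placement, so treat it as a rough planning aid rather than an exact model.

# Rough component-count estimator for Virtual SAN capacity planning.
# Simplified model: each object has (FTT + 1) replicas, each replica is
# split into stripe_width stripe components, plus at least FTT witness
# components. Real witness counts vary with placement decisions.

def components_per_object(ftt: int, stripe_width: int = 1) -> int:
    replicas = ftt + 1
    witnesses = ftt  # simplified; actual witness counts can differ
    return replicas * stripe_width + witnesses

def components_per_vm(objects_per_vm: int, ftt: int, stripe_width: int = 1) -> int:
    return objects_per_vm * components_per_object(ftt, stripe_width)

# Example: a VM with 3 objects (home namespace, one VMDK, swap) and FTT=1
# yields 3 components per object, or 9 per VM; 100 such VMs need 900
# components. Comparing a cluster total against the per-host limit is
# conservative, since components are spread across hosts.
per_vm = components_per_vm(objects_per_vm=3, ftt=1)
print(per_vm)                # 9
print(100 * per_vm)          # 900
print(100 * per_vm <= 3000)  # True: fits the Virtual SAN 5.5 limit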
- Consider enabling vSphere HA on the Virtual SAN cluster for the highest level
  of availability. vSphere HA in version 6.0 can protect up to 6,400 virtual
  machines.
- Consider the number of hosts (and fault domains) needed to tolerate failures.
- Consider the number of devices needed in the capacity layer to implement a
  stripe width.
- Consider component count when deploying very large virtual machines. It is
  unlikely that many customers will have requirements for deploying multiple
  62TB VMDKs per host. Realistically, component count should not be a concern
  in Virtual SAN 6.0.
- Keep in mind that VMDKs, even 62TB VMDKs, will initially be thinly
  provisioned by default, so customers should be prepared for future growth in
  capacity.
Multicast considerations
Multicast is a network requirement for Virtual SAN. Multicast is used to discover
ESXi hosts participating in the cluster as well as to keep track of changes within the
cluster. It is mandatory to ensure that multicast traffic is allowed between all the
nodes participating in a Virtual SAN cluster.
Multicast performance is also important, so one should ensure a high quality
enterprise switch is used. If a lower-end switch is used for Virtual SAN, it should be
explicitly tested for multicast performance, as unicast performance is not an
indicator of multicast performance.
Virtual SAN network design guide
A link to the Virtual SAN Network Design Guide can be found in the further
reading section of this guide; reviewing it is highly recommended.
Disk groups
Disk groups can be thought of as storage containers on Virtual SAN; they contain a
maximum of one flash cache device and up to seven capacity devices: either
magnetic disks or flash devices used as capacity in an all-flash configuration. To put
it simply, a disk group assigns a cache device to provide the cache for a given
capacity device. This gives a degree of control over performance as the cache to
capacity ratio is based on disk group configuration.
If the desired cache-to-capacity ratio is very high, multiple flash devices may
be required per host. In this case, multiple disk groups must be created, since
there is a limit of one flash device per disk group. However, there are
advantages to using multiple disk groups with smaller flash devices: they
typically provide more IOPS and also reduce the failure domain.
The higher the ratio of cache to capacity, the more cache is available to
virtual machines for accelerated performance, but at an additional cost.
Design decision: A single large disk group configuration or multiple smaller
disk group configurations.
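To illustrate the trade-off, the sketch below compares the cache-to-capacity ratio of one large disk group against two smaller ones. The device sizes are hypothetical examples, not recommendations.

# Compare cache-to-capacity ratios for two hypothetical disk group layouts.
# All sizes in GB; the values are illustrative only.

def cache_ratio(cache_gb: float, capacity_devices: int, device_gb: float) -> float:
    return cache_gb / (capacity_devices * device_gb)

# Option A: one disk group, 1 x 800GB cache device, 7 x 2TB capacity devices
print(f"{cache_ratio(800, 7, 2000):.1%}")  # ~5.7% cache-to-capacity

# Option B: two disk groups, each 1 x 400GB cache, 3 x 2TB capacity devices
print(f"{cache_ratio(400, 3, 2000):.1%}")  # ~6.7% per disk group, and a
                                           # smaller failure domain per cache device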
However, in all-flash configurations, having a high-endurance flash cache device
can extend the life of the flash capacity layer. If the working set of the
application running in the virtual machine fits mostly in the flash write cache,
then there is a reduction in the number of writes to the flash capacity tier.
Note: In version 6.0 of Virtual SAN, if the flash device used for the caching
layer in all-flash configurations is less than 600GB, then 100% of the flash
device is used for cache. However, if the flash cache device is larger than
600GB, then only 600GB of the device is used for caching. This limit applies on
a per-disk group basis.
Design consideration: For all flash configurations, ensure that flash endurance is
included as a consideration when choosing devices for the cache layer. Endurance
figures are included on the VCG.
Design consideration: When sizing disk groups in all-flash configurations, consider
using flash devices that are no larger than 600GB per disk group for best
optimization.
Measurement                           Value
Projected VM space consumption        20GB
Number of VMs                         1,000
Total projected space consumption     20GB x 1,000 = 20,000GB = 20TB
Target flash cache percentage         10%
Total flash cache capacity required   20TB x .10 = 2TB
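The same arithmetic can be expressed as a small helper. The 10% figure is this guide's rule of thumb, and the VM size and count are the example values from the table above.

# Flash cache sizing using the 10% rule of thumb from this guide.

def flash_cache_required_tb(vm_size_gb: float, vm_count: int,
                            cache_pct: float = 0.10) -> float:
    total_consumption_tb = vm_size_gb * vm_count / 1000.0
    return total_consumption_tb * cache_pct

# Example from the table above: 1,000 VMs at 20GB each -> 20TB -> 2TB of cache
print(flash_cache_required_tb(20, 1000))  # 2.0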
SSD Tier                              TB Writes Per Day   TB Writes in 5 Years
All-Flash Capacity                    0.2                 365
Hybrid Caching                        1                   1825
All-Flash Caching (medium workload)   2                   3650
All-Flash Caching (high workload)     4                   7300
If a vendor uses full Drive Writes Per Day (DWPD) in their specification, by doing
the conversion shown here, one can obtain the endurance in Terabytes Written
(TBW). For Virtual SAN, what matters from an endurance perspective is how much
data can be written to an SSD over the warranty period of the drive (in this example,
it is a five-year period).
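The conversion itself is straightforward, as the sketch below shows. The 400GB drive rated at 10 DWPD is a hypothetical example, chosen so the result lines up with the high-workload row in the table above.

# Convert Drive Writes Per Day (DWPD) to Terabytes Written (TBW)
# over the drive's warranty period.

def dwpd_to_tbw(dwpd: float, drive_capacity_tb: float,
                warranty_years: int = 5) -> float:
    return dwpd * drive_capacity_tb * 365 * warranty_years

# Hypothetical example: a 400GB (0.4TB) drive rated at 10 DWPD sustains
# 4TB of writes per day, or 7300 TBW over five years, which matches the
# high-workload all-flash caching tier in the table above.
print(dwpd_to_tbw(10, 0.4))  # 7300.0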
Magnetic Disks
Magnetic disks have two roles in hybrid Virtual SAN configurations: they
provide the capacity of the Virtual SAN datastore, and the number of them
affects performance.
The number of magnetic disks is also a factor for stripe width. When stripe width is
specified in the VM Storage policy, components making up the stripe will be placed
on separate disks. If a particular stripe width is required, then there must be the
required number of disks available across hosts in the cluster to meet the
requirement. If the virtual machine also has a failure to tolerate requirement in its
policy, then additional disks will be required on separate hosts, as each of the stripe
components will need to be replicated.
In the screenshot below, we can see such a configuration. There is a stripe width
requirement of two (RAID 0) and a failure to tolerate of one (RAID 1). Note that all
components are placed on unique disks by observing the HDD Disk Uuid column:
Note that HDD refers to the capacity device. In hybrid configurations, this is a
magnetic disk. In all-flash configurations, this is a flash device.
Magnetic disk performance: NL-SAS, SAS or SATA
When configuring Virtual SAN in hybrid mode, the capacity layer is made up of
magnetic disks. A number of options are available to Virtual SAN designers, and
one needs to consider reliability, performance, capacity and price. There are
three magnetic disk types supported for Virtual SAN: SAS, NL-SAS and SATA.
NL-SAS can be thought of as enterprise SATA drives but with a SAS interface. The
best results can be obtained with SAS and NL-SAS. SATA magnetic disks should only
be used in capacity-centric environments where performance is not prioritized.
Magnetic disk capacity: NL-SAS, SAS or SATA
SATA drives provide greater capacity than SAS drives for hybrid Virtual SAN
configurations. On the VCG for Virtual SAN currently, there are 4TB SATA drives
available. The maximum size of a SAS drive at the time of writing is 1.2TB.
There is definitely a trade-off between the number of magnetic disks required
for the capacity layer and how well the capacity layer will perform. Although
SATA drives provide more capacity per drive, SAS magnetic disks should be chosen
over SATA magnetic disks in environments where performance is desired. SATA
drives tend to be less expensive, but do not offer the performance of SAS. SATA
drives typically run at 7200 RPM or slower.
Magnetic disk performance: RPM
SAS disks tend to be more reliable and offer more performance, but at a cost. These
are usually available at speeds up to 15K RPM (revolutions per minute). The VCG
lists the RPM (drive speeds) of supported drives. This allows the designer to choose
the level of performance required at the capacity layer when configuring a hybrid
Virtual SAN. While there is no need to check drivers/firmware of the magnetic disks,
the SAS or SATA drives must be checked to ensure that they are supported.
Since SAS drives can perform much better than SATA, for performance at the
magnetic disk layer in hybrid configurations, serious consideration should be given
to the faster SAS drives.
Cache-friendly workloads are less sensitive to disk performance than
cache-unfriendly workloads. However, since application performance profiles may
change over time, it is usually a good practice to be conservative on required
disk drive performance, with 10K RPM drives being a generally accepted standard
for most workload mixes.
Number of magnetic disks matter in hybrid configurations
While having adequate amounts of flash cache is important, so is having enough
magnetic disk spindles. In hybrid configurations, all virtual machine write
operations go to flash first, and at some point later these blocks are destaged
to a spinning magnetic disk. Having multiple magnetic disk spindles can speed up
the destaging process.
Virtual SAN version   Format Type   On-disk version   Overhead
5.5                   VMFS-L        v1                750MB per disk
6.0 (not upgraded)    VMFS-L        v1                750MB per disk
6.0                   VirstoFS      v2                1% of physical disk capacity
There is no support for the v2 on-disk format with Virtual SAN version 5.5. The v2
format is only supported on Virtual SAN version 6.0. This overhead for v2 is very
much dependent on how fragmented the user data is on the filesystem. In practice
what has been observed is that the metadata overhead is typically less than 1% of
the physical disk capacity.
Design decision: Include formatting overhead in capacity calculations.
Design consideration: There are other considerations to take into account apart
from NumberOfFailuresToTolerate and formatting overhead. These include whether
or not virtual machine snapshots are planned. We will visit these when we look
at some design examples. As a rule of thumb, VMware recommends leaving
approximately 30% free space available in the cluster capacity.
Queue depth is extremely important, as issues have been observed with controllers
that have very small queue depths. In particular, controllers with small queue
depths (less than 256) can impact virtual machine I/O performance when Virtual
SAN is rebuilding components, either due to a failure or when requested to do so
when entering maintenance mode.
Design decision: Choose storage I/O controllers that have as large a queue depth
as possible. While 256 is the minimum, the recommendation would be to choose a
controller with a much larger queue depth where possible.
RAID-0 versus pass-through
The second important item is the feature column, which displays how the
controller supports physical disk presentation to Virtual SAN. There are entries
referring to
RAID 0 and pass-through. Pass-through means that this controller can work in a
mode that will present the magnetic disks directly to the ESXi host. RAID 0 implies
that each of the magnetic disks will have to be configured as a RAID 0 volume before
the ESXi host can see them. There are additional considerations with RAID 0. For
example, an administrator may have to take additional manual steps replacing a
failed drive. These steps include rebuilding a new RAID 0 volume rather than simply
plugging in a replacement empty disk into the host and allowing Virtual SAN to
claim it.
Design decision: From an operations perspective, drives behind storage I/O
controllers running in RAID-0 mode typically take longer to install and replace
than drives presented via pass-through.
Storage controller cache considerations
VMware's recommendation is to disable the cache on the controller if possible.
Virtual SAN is already caching data at the storage layer; there is no need to do
this again at the controller layer. If this cannot be done due to restrictions
on the storage controller, the recommendation is to set the cache to 100% read.
Advanced controller features
Some controller vendors provide third-party features for acceleration. For
example, HP has a feature called Smart Path and LSI has a feature called Fast
Path. VMware recommends disabling all advanced features when controllers are
used in Virtual SAN environments.
Design decision: When choosing a storage I/O controller, verify that it is on
the VCG, ensure cache is disabled, and ensure any third-party acceleration
features are disabled. If the controller offers both RAID 0 and pass-through
support, consider using pass-through mode.
This next screenshot is taken from the VMDK Hard disk 1. It implements both the
stripe width (RAID 0) and the failures to tolerate (RAID 1) requirements. There
are a total of 5 components making up this object: two components are striped
and mirrored by two more striped components, with a witness as the fifth
component for tiebreaking.
Note: The location of the Physical Disk Placement view has changed between
versions 5.5 and 6.0. In 5.5, it is located under the Manage tab. In 6.0, it is
under the Monitor tab.
However, for the most part, VMware recommends leaving striping at the default
value of 1 unless performance issues that might be alleviated by striping are
observed. The default value for the stripe width is 1 whereas the maximum value is
12.
Stripe Width Sizing Consideration
There are two main sizing considerations when it comes to stripe width. The first of
these considerations is if there are enough physical devices in the various hosts and
across the cluster to accommodate the requested stripe width, especially when
there is also a NumberOfFailuresToTolerate value to accommodate.
The second consideration is whether the value chosen for stripe width is going to
require a significant number of components and consume the host component count.
Both of these should be considered as part of any Virtual SAN design, although
considering the increase in the maximum component count in 6.0 with on-disk
format v2, this realistically isn't a major concern anymore. Later, some working
examples will show how to take these factors into consideration when designing a
Virtual SAN cluster.
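To see how these two checks interact, here is a rough feasibility test for a requested stripe width. The placement model is deliberately simplified (it only counts hosts and devices), so it is a planning aid under those assumptions rather than a reproduction of Virtual SAN's actual placement logic.

# Rough feasibility check for a requested stripe width combined with
# NumberOfFailuresToTolerate (FTT). Simplified model: 2*FTT + 1 hosts
# are needed overall, and each replica is conservatively required to
# fit its stripes on the capacity devices of a single host (Virtual SAN
# itself may also spread stripes across hosts).

def stripe_width_feasible(hosts: int, capacity_devices_per_host: int,
                          ftt: int, stripe_width: int) -> bool:
    if hosts < 2 * ftt + 1:
        return False
    return capacity_devices_per_host >= stripe_width

# Example: 4 hosts with 5 capacity devices each, FTT=1, stripe width 2
print(stripe_width_feasible(4, 5, ftt=1, stripe_width=2))   # True
# The maximum stripe width of 12 would need 12 devices per host here
print(stripe_width_feasible(4, 5, ftt=1, stripe_width=12))  # False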
Flash Read Cache Reservation
Previously we mentioned the 10% rule for flash cache sizing. Flash cache is used
as a read cache and write buffer in hybrid configurations, and as a write buffer
only in all-flash configurations, and is distributed fairly amongst all virtual
machines. However, through the use of the VM Storage Policy setting
FlashReadCacheReservation, it is possible to dedicate a portion of the read
cache to one or more virtual machines.
Note: This policy setting is only relevant to hybrid configurations. It is not
supported or relevant in all-flash configurations due to changes in the caching
mechanisms and the fact that there is no read cache in an all-flash
configuration.
For hybrid configurations, this setting defines how much read flash capacity should
be reserved for a storage object. It is specified as a percentage of the logical size of
the virtual machine disk object. It should only be used for addressing specifically
identified read performance issues. Other virtual machine objects do not use this
reserved flash cache capacity.
Unreserved flash is shared fairly between all objects, so for this reason VMware
recommends not changing the flash reservation unless a specific performance issue
is observed. The default value is 0%, implying the object has no read cache reserved,
but shares it with other virtual machines. The maximum value is 100%, meaning
that the amount of reserved read cache is the same size as the storage object
(VMDK).
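As a hypothetical illustration of how the percentage translates into capacity, a FlashReadCacheReservation of 5% on a 100GB VMDK reserves 5GB of flash read cache for that object:

# FlashReadCacheReservation is a percentage of the logical size of the
# VMDK object. Hypothetical example values, for illustration only.
vmdk_logical_size_gb = 100
flash_read_cache_reservation = 0.05  # 5%

reserved_read_cache_gb = vmdk_logical_size_gb * flash_read_cache_reservation
print(reserved_read_cache_gb)  # 5.0GB of read cache reserved for this object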
The Monitor > Virtual SAN > Physical Disks view will display the amount of used
capacity in the cluster. This screen shot is taken from a 5.5 configuration. Similar
views are available on 6.0.
Design consideration: While the creation of replicas is taken into account when
the capacity of the Virtual SAN datastore is calculated, thin provisioning
over-commitment is something that should be considered in the sizing
calculations when provisioning virtual machines on a Virtual SAN.
The RAID 1 is the availability aspect. There is a mirror copy of the VM home object
which is comprised of two replica components, implying that this virtual machine
was deployed with a NumberOfFailuresToTolerate = 1. The VM home inherits this
policy setting. The components are located on different hosts. The witness serves as
tiebreaker when availability decisions are made in the Virtual SAN cluster in the
event of, for example, a network partition. The witness resides on a completely
separate host from the replicas. This is why a minimum of three hosts with local
storage is required for Virtual SAN.
The VM Home Namespace inherits the policy setting NumberOfFailuresToTolerate.
This means that if a policy is created which includes a NumberOfFailuresToTolerate
= 2 policy setting, the VM home namespace object will use this policy setting. It
ignores most of the other policy settings and overrides those with its default values.
VM Swap
The virtual machine swap object also has its own default policy, which is to tolerate
a single failure. It has a default stripe width value, is thinly provisioned, and has no
read cache reservation.
However, swap does not reside in the VM home namespace; it is an object in its
own right, and so it is not limited in the way the VM home namespace is limited
to a 255GB thin object.
The VM Swap object does not inherit any of the settings in the VM Storage
Policy. It always uses the following settings:
- NumberOfFailuresToTolerate = 1
- Stripe width = 1 (the default)
- No flash read cache reservation
- Thinly provisioned
Note that the VM Swap object is not visible in the UI when VM Storage Policies
are examined. Ruby vSphere Console (RVC) commands are required to display policy
and capacity information for this object.
Deltas disks created for snapshots
Delta disks, which are created when a snapshot is taken of the VMDK object, inherit
the same policy settings as the base disk VMDK.
Note that delta disks are also not visible in the UI when VM Storage Policies
are examined. However, the VMDK base disk is visible, and one can deduce the
policy settings of the delta disks from the policy of the base disk.
CPU considerations
Memory considerations
- Number of VMs, associated VMDKs, size of each virtual machine, and thus how
  much capacity is needed for virtual machine storage
- Memory consumed by each VM, as swap objects will be created on the Virtual
  SAN datastore when the virtual machine is powered on
- Desired NumberOfFailuresToTolerate setting, as this directly impacts the
  amount of space required for virtual machine disks
- Snapshots per VM, and how long they are maintained
- Estimated space consumption per snapshot
- Physical space for storage devices in the host
- Virtual SAN 5.5 supports both USB and SD devices for an ESXi boot device,
  but does not support SATADOM
- Virtual SAN 6.0 introduces SATADOM as a supported ESXi boot device
- When these devices are used as boot devices, the logs and traces reside in
  RAM disks, which are not persisted during reboots
  o Consider redirecting logging and traces to persistent storage when
    these devices are used as boot devices
  o VMware does not recommend storing logs and traces on the Virtual SAN
    datastore. These logs may not be retrievable if Virtual SAN has an
    issue, which would hamper any troubleshooting effort.
3-node configurations
While Virtual SAN fully supports 3-node configurations, these configurations
behave differently than configurations with 4 or more nodes. In particular, in
the event of a failure, there are no resources to rebuild components on another
host in the cluster to tolerate another failure. Also, with 3-node
configurations, there is no way to migrate all data from a node during
maintenance.
In 3-node configurations, there are 2 replicas of the data and a witness, and these
must all reside on different hosts. A 3-node configuration can only tolerate 1 failure.
The implications of this are that if a node fails, Virtual SAN cannot rebuild
components, nor can it provision new VMs that tolerate failures. It cannot
re-protect virtual machine objects after a failure until the failed components
are restored. Note also that in 3-node clusters, maintenance mode can't perform
a full data migration off of the server that needs maintenance.
Design decision: Consider 4 or more nodes for the Virtual SAN cluster design for
maximum availability
vSphere HA considerations
Virtual SAN, in conjunction with vSphere HA, provides a highly available
solution for virtual machine workloads. If a failed host was not running any
virtual machine compute, there is no impact to the virtual machine workloads. If
the failed host was running virtual machine compute, vSphere HA will restart
those VMs on the remaining hosts in the cluster.
In the case of network partitioning, vSphere HA has been extended to understand
Virtual SAN objects. That means that vSphere HA would restart a virtual machine on
a partition that still has access to a quorum of the VMs components, if the virtual
machine previously ran on a partition that lost access due to the partition.
There are a number of requirements for Virtual SAN to interoperate with vSphere
HA.
1. vSphere HA must use the Virtual SAN network for communication
2. vSphere HA does not use the Virtual SAN datastore as a "datastore
heartbeating" location
3. vSphere HA needs to be disabled before configuring Virtual SAN on a cluster;
vSphere HA may only be enabled after the Virtual SAN cluster is configured.
Fault Domains
The idea behind fault domains is that we want to be able to tolerate groups of hosts
(chassis or racks) failing without requiring additional data copies. The
implementation allows Virtual SAN to save replica copies of the virtual machine data
in different domains, for example, different racks of compute.
In Virtual SAN 5.5, when deploying a virtual machine with a
NumberOfFailuresToTolerate = 1, there are 2n + 1 hosts required (where n =
NumberOfFailuresToTolerate). This means that to tolerate 1 failure, 3 ESXi hosts
are required; to tolerate 2 failures, 5 hosts are required; and if the virtual
machines are to tolerate 3 failures (the maximum), then 7 hosts are required.
NumberOfFailuresToTolerate   Hosts required
1                            3
2                            5
3                            7
The same holds true in Virtual SAN 6.0 when fault domains are not enabled.
However, if fault domains are enabled, hosts can be grouped together to form a
fault domain. This means that no two copies/replicas of the virtual machine's
data will be placed in the same fault domain. To calculate the number of fault
domains required to tolerate failures, use the same equation as before; when
deploying a virtual machine with a NumberOfFailuresToTolerate = 1 on a cluster
with fault domains, 2n + 1 fault domains (each containing 1 or more hosts
contributing storage) are required.
NumberOfFailuresToTolerate   Fault domains required
1                            3
2                            5
3                            7
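The 2n + 1 rule is simple to encode; a minimal sketch:

# Hosts (or fault domains, when enabled) required for a given
# NumberOfFailuresToTolerate, per the 2n + 1 rule above.

def placement_domains_required(failures_to_tolerate: int) -> int:
    return 2 * failures_to_tolerate + 1

for n in (1, 2, 3):
    print(n, placement_domains_required(n))  # 1 -> 3, 2 -> 5, 3 -> 7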
Let's consider the previous example, but now with 4 fault domains configured.
In this example, there are two virtual machines running on the ESXi host, each
with a single VMDK. The information needed here is the worldGroupID and the
Virtual SCSI Disk handleID; both are required to start the data collection.
This next command starts the data collection.
~ # vscsiStats -s -w 76634 -i 8192
vscsiStats: Starting Vscsi stats collection for worldGroup 76634,
handleID 8192 (scsi0:0)
Success.
With the statistics now being collected, the type of histogram to display can be
decided upon. The type of histogram can be one of: 'all, ioLength, seekDistance,
outstandingIOs, latency, interarrival'.
This first histogram is based on 'ioLength'. The command is as follows (-c is for
comma delimited output):
~ # vscsiStats -p ioLength -c -w 76634 -i 8192
This command will display a comma-separated list of I/O lengths for commands,
reads and writes issued by the virtual machine. The output can easily be
imported into a spreadsheet for creating graphs.
This particular output has been captured after vscsiStats was run against an idle
Windows VM. A number of writes can be observed taking place during the period of
observation, mostly 4KB writes.
This next histogram displays the I/O latency on the same virtual machine and disk.
~ # vscsiStats -p latency -c -w 76634 -i 8192
Once again, this information may be imported into a spreadsheet, and a graph of
latency values for this virtual machine disk can be displayed. Again, the
majority of the latency values are around the 5,000 microsecond (5 millisecond)
mark.
Since all VMs are thinly provisioned on the VSAN datastore, the estimated storage
consumption should take into account the thin provisioning aspect before the flash
requirement can be calculated:
9.5TB Magnetic Disk required => 11 x 900GB SAS 10K RPM per host
200GB Flash required => 2 x 100GB SAS SSD per host
Why did we choose 2 x 100GB flash devices rather than 1 x 200GB flash device? The
reason is that we can only have a maximum of seven capacity devices in a disk group.
In this configuration, we have more than seven capacity devices, thus we need two
disk groups. Each disk group must contain a flash device, thus we choose two
smaller devices.
Since the hosts are booting from an SD card, we do not need an additional disk for
the ESXi boot image. With this configuration, a single disk group per host will suffice.
1 x VM Home Namespace
1 x VMDK
1 x VM Swap
0 x Snapshot deltas
This implies that there are 3 objects per VM. Now we need to work out how many
components per object, considering that we are using a VM Storage Policy setting
that contains Number of Host Failures to Tolerate = 1 (FTT). It should be noted
that only the VM Home Namespace and the VMDK inherit the FTT setting; the VM
Swap object ignores this setting but still uses FTT=1. Therefore when we look at
the number of components per object on each VM, we get the following:
Now we have a total of 9 components per VM. If we plan to deploy 100 VMs, then
we will have a maximum of 900 components. This is well within our limits of
3,000 components per host in Virtual SAN 5.5 and 9,000 per host in 6.0.
The estimation is that the Guest OS and application will consume 75% of the storage.
The VM Storage Policy setting is HostFailuresToTolerate (FTT) set to 1 and
StripeWidth set to 2. All other policy settings are left at the defaults. The ESXi hosts
will boot from disk.
Note that we are not including the capacity consumption from component metadata
or witnesses. Both of these are negligible.
Taking into account the considerations above, the calculation for a valid
configuration would be as follows:
- Host Requirements: 3 hosts minimum for Virtual SAN, but might need more
- Total CPU Requirements: 400 vCPUs
- vCPU-to-core ratio: 4:1
- Total CPU Core Requirements: 400 / 4 = 100 cores required
- How many cores per socket? 12
- Total Memory Requirements: 400 x 12GB = 4.8TB
- Total Storage Requirements (without FTT): *
  o (400 x 100GB) + (400 x 200GB)
  o 40TB + 80TB
  o = 120TB
- Total Storage Requirements (with FTT): *
  o = 120TB x 2
  o = 240TB
- Total Storage Requirements (with FTT) + VM Swap (with FTT): *
  o = (120TB + 4.8TB) x 2
  o = 240TB + 9.6TB
  o = 249.6TB
- Estimated Storage Consumption (without FTT) for cache sizing:
  o 75% of total storage
  o = 75% of 120TB
  o = 90TB
- Cache Required (10% of Estimated Storage Consumption): 9TB
- Estimated Snapshot Storage Consumption: 2 snapshots per VM
  o It is estimated that both snapshot images together will never grow larger
    than 5% of the base VMDK
  o Storage Requirements (with FTT) = 240TB
  o There is no requirement to capture virtual machine memory when a snapshot
    is taken
  o Estimated Snapshot Requirements (with FTT) = 5% = 12TB
- Total Storage Requirements (VMs + Snapshots):
  o = 249.6TB + 12TB
  o = 261.6TB
- Required capacity slack space: 30%
- 343.4TB Magnetic Disk required across the cluster implies ~60TB Magnetic Disk
  required per host (6 hosts in the cluster)
  o One option is to consider using 15 x 4TB SATA 7200 RPM drives per host
    (although slightly below the target, this should meet our needs). This may
    be the cheapest option, but performance may not be acceptable.
  o 15 disks may entail the purchase of additional controllers or SAS
    extenders. Multiple controllers will offer superior performance, but at a
    cost.
  o Another design consideration is to use an external storage enclosure to
    accommodate this many disks. Support for this is introduced in version 6.0.
  o Since a disk group can contain at most 7 capacity disks, a minimum of 3
    disk groups is required.
- 9TB cache required per cluster implies 1.5TB cache required per host
  o Need 3 flash devices, one for each of the above disk groups
  o 3 x 500GB SSDs per host implies 16 disk slots are now needed per host
  o For future growth, consideration could be given to using larger flash
    devices.
- ESXi hosts boot from disk
  o 17 disk slots now required per host
In this example, the customer now needs to source a server that contains 17 disk
slots for this rather large configuration. The design is achievable with a
6-node cluster. However, if the customer also had a requirement to rebuild
components in the event of a failure, one additional fully populated server
would need to be added to the configuration, bringing the number of hosts
required up to 7.
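The storage arithmetic in this example is mechanical enough to script. The following minimal sketch reproduces the storage side of the calculation under the assumptions stated above (CPU and memory sizing are analogous); note it computes the pre-rounding slack figure, while the example above rounds up.

# Storage sizing sketch for the worked example above: 400 VMs, each with
# 100GB + 200GB of VMDK capacity and 12GB of memory (swap object), FTT=1
# (capacity doubled), 2 snapshots per VM capped at 5% of the base VMDKs,
# and 30% slack space per this guide's rule of thumb.

vms = 400
vmdk_tb = (100 + 200) / 1000.0  # per-VM disk capacity in TB
swap_tb = 12 / 1000.0           # per-VM swap object in TB
ftt_factor = 2                  # FTT=1 -> two replicas

base_tb = vms * vmdk_tb                               # 120TB without FTT
with_ftt_tb = vms * (vmdk_tb + swap_tb) * ftt_factor  # 249.6TB
snapshots_tb = base_tb * ftt_factor * 0.05            # 12TB with FTT
subtotal_tb = with_ftt_tb + snapshots_tb              # 261.6TB
with_slack_tb = subtotal_tb * 1.3  # ~340TB before rounding; the example
                                   # above rounds up to ~343TB
cache_tb = base_tb * 0.75 * 0.10   # 9TB: 10% of estimated consumption

print(round(base_tb, 1), round(with_ftt_tb, 1), round(subtotal_tb, 1),
      round(with_slack_tb, 1), round(cache_tb, 1))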
1 x VM Home Namespace
2 x VMDK
1 x VM Swap
2 x Snapshot deltas
This implies that there are 6 objects per VM. Now we need to work out how many
components per object, considering that we are using a VM Storage Policy setting
that contains Number of Host Failures to Tolerate = 1 (FTT) and Stripe Width = 2.
It should be noted that only the VM Home Namespace, the VMDK and the snapshot
deltas inherit the FTT setting; the VM Swap object ignores this setting. Only
the VMDK and snapshot deltas inherit the stripe width setting.
The next step is to look at the number of components per object:
Server choice
For option (1), the customer's server of choice is an HP DL380p, which needs to
be checked to see if it can indeed be configured with up to 25 disk slots. If
this is not possible, then the customer may have to look at purchasing
additional servers to meet the storage requirement, and the calculations would
have to be revisited. This host can also meet the 12 cores-per-socket
requirement and has 24 DIMM slots to meet the memory requirement.
For option (2), the customer may need to look at external storage enclosures if
a host that supports 41 disk slots cannot be found, which is likely. The
alternative is to look at purchasing additional servers to meet the storage
requirement, and the calculations would once again have to be revisited. If the
customer has a server of choice, such as the HP DL380p, then a new round of
calculations would be needed using the formulas discussed here.
Conclusion
Although most Virtual SAN design and sizing exercises are straightforward, careful
planning at the outset can avoid problems later.
Based on observed experiences to date, the most frequent design issues are:
- Not properly sizing cache for capacity growth (e.g. thin volumes
  progressively getting fatter), resulting in declining performance over time.
- Using very large, slow disks for capacity, resulting in poor performance if
  an application is not cache-friendly.
Further Information
VMware Compatibility Guide
http://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan
https://communities.vmware.com/community/vmtn/vsan
Key bloggers
http://cormachogan.com/vsan/
http://www.yellow-bricks.com/virtual-san/
http://www.virtuallyghetto.com/category/vsan
http://www.punchingclouds.com/tag/vsan/
http://blogs.vmware.com/vsphere/storage
http://www.vmware.com/products/virtual-san/resources.html
https://www.vmware.com/support/virtual-san
VMware support
https://my.vmware.com/web/vmware/login
http://kb.vmware.com/kb/2006985 - How to file a Support Request
http://kb.vmware.com/kb/1021806 - Location of VMware Product log files
http://kb.vmware.com/kb/2032076 - Location of ESXi 5.x log file
http://kb.vmware.com/kb/2072796 - Collecting Virtual SAN support logs
Additional Reading
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright 2012 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.