Docu59923 - VMAX3 TimeFinder SnapVX and Microsoft SQL Server White Paper
ABSTRACT
With the introduction of the VMAX3 disk arrays and new local and remote replication capabilities, administrators can protect their applications effectively and efficiently, with unprecedented ease of use and management. This white paper discusses EMC VMAX3 TimeFinder SnapVX functionality in the context of planning, deploying, and protecting Microsoft SQL Server.
July 2015
The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
Part Number H14273
TABLE OF CONTENTS
Use case 1B - Impact of taking 256 SnapVX snapshots on production database workload with a varying SLO
Use case 2A - Impact of No-Copy vs. Copy mode linked target snapshots with workload on Production
Use case 2B - Impact of No-Copy vs. Copy mode linked target snapshots with workload on both Production and Mount hosts
CONCLUSION
APPENDIXES
Appendix I - Configuring SQL Server database storage groups for replication
Appendix II - SRDF modes and topologies
Appendix III - Solutions Enabler CLI commands for TimeFinder SnapVX management
Appendix IV - Solutions Enabler CLI commands for SRDF management
Appendix V - Symntctl VMAX integration utility for Windows disk management
Appendix VI - Scripting attach or detach for a SQL Server database using Windows PowerShell
Appendix VII - Example outputs
EXECUTIVE SUMMARY
Many applications are required to be fully operational 24x7x365, and the data for these applications continues to grow. At the same time, their RPO and RTO requirements are becoming more stringent. As a result, there is a large gap between the requirements for fast and efficient protection and replication, and the ability to meet these requirements without overhead or operational disruption. These requirements include the ability to create local and remote database replicas in seconds, without disrupting Production host CPU or I/O activity, for purposes such as patch testing, running reports, creating development sandbox environments, publishing data to analytic systems, offloading backups from Production, Disaster Recovery (DR) strategy, and more.
Traditional solutions rely on host-based replication. The disadvantages of these solutions are the additional host I/O and CPU cycles
consumed by the need to create such replicas, the complexity of monitoring and maintaining them across multiple servers, and the
elongated time and complexity associated with their recovery.
TimeFinder local replication values to Microsoft SQL Server include:
• The ability to create instant and consistent database replicas for repurposing, across a single database or multiple databases, including external data or message queues, and across multiple VMAX3 storage systems.
• TimeFinder replica creation or restore takes seconds, regardless of the database size. The target devices (in the case of a replica) or source devices (in the case of a restore) are available immediately with their data, even as incremental data changes are copied in the background.
• VMAX3 TimeFinder SnapVX snapshots are consistent by default. Each source device can have up to 256 space-efficient snapshots that can be created or restored at any time. Snapshots can be further linked to up to 1024 target devices, maintaining incremental refresh relationships. The linked targets can remain space-efficient, or a background copy of all the data can take place, making each a full copy. In this way, SnapVX allows an unlimited number of cascaded snapshots.
SRDF remote replication values to SQL Server include:
• Synchronous and asynchronous consistent replication of a single database or multiple databases, including external data or message queues, and across multiple VMAX3 storage arrays if necessary. The point of consistency is created before a disaster strikes, rather than taking hours to achieve afterwards, as happens with replications that are not consistent across applications and databases.
• Disaster Recovery (DR) protection for two or three sites, including cascaded or triangular relationships, where SRDF always maintains incremental updates between source and target devices.
• SRDF and TimeFinder are integrated. While SRDF replicates the data remotely, TimeFinder can be used on the remote site to create writable snapshots or backup images of the database. This allows DBAs to perform remote backup operations or create remote database copies.
• SRDF and TimeFinder can work in parallel to restore remote backups. While a remote TimeFinder backup is being restored to the remote SRDF devices, SRDF copies the restored data to the local site in parallel. This parallel restore capability gives DBAs faster access to remote backups and shortens recovery times.
AUDIENCE
This white paper is intended for database administrators, storage administrators, and system architects who are responsible for
implementing, managing, and maintaining SQL Server databases and EMC VMAX3 storage systems. It is assumed that readers have
some familiarity with Microsoft SQL Server and the VMAX3 family of storage arrays, and are interested in achieving higher database
availability, performance, and ease of storage management.
Note: EMC AppSync has been updated to include support for SnapVX and allow for SQL Server integration through the use of VDI at the application and operating system level. Check the latest release notes and support matrix for AppSync to ensure support for SnapVX. One benefit of TimeFinder SnapVX in the context of EMC AppSync is that application administrators can manage application-consistent copies of SQL Server databases.
VMAX3 PRODUCT OVERVIEW
TERMINOLOGY
The following table explains important terms used in this paper.
Restartable vs. Recoverable database: SQL Server distinguishes between a restartable and a recoverable database. A restartable state requires all data and log to be consistent (see Storage consistent replication below); SQL Server can simply be started and performs automatic crash/instance recovery without user intervention. A recoverable state requires database media recovery, rolling the transaction log forward to achieve data consistency before the database can be opened.

RTO and RPO: Recovery Time Objective (RTO) refers to the time it takes to recover a database after a failure. Recovery Point Objective (RPO) refers to the amount of data loss after the recovery completes, where RPO=0 means no loss of committed transactions.

Storage consistent replication: Storage consistent replication refers to storage replication operations (local or remote) that maintain write-order fidelity at the target devices, even while the application is running. To the SQL Server database, the snapshot data looks as it does after a host reboot, allowing the database to perform crash/instance recovery when starting.

VMAX3 HYPERMAX OS: HYPERMAX OS is the industry's first open converged storage hypervisor and operating system. It enables VMAX3 to embed storage infrastructure services like cloud access, data mobility, and data protection directly on the array. This delivers new levels of data center efficiency and consolidation by reducing footprint and energy requirements. In addition, HYPERMAX OS delivers the ability to perform real-time and non-disruptive data services.

VMAX3 Storage Group: A collection of host-addressable VMAX3 devices. A Storage Group can be used to (a) present devices to a host (LUN masking), (b) specify FAST Service Level Objectives (SLOs) for a group of devices, and (c) manage grouping of devices for replication software such as SnapVX and SRDF. Storage Groups can be cascaded; for example, the child storage groups can be used for setting FAST SLOs and the parent used for LUN masking of all the database devices to the host.

VMAX3 TimeFinder Snapshot vs. Clone: Previous generations of TimeFinder referred to a snapshot as a space-saving copy of the source device, where capacity was consumed only for data changed after the snapshot time. Clones, on the other hand, referred to a full copy of the source device. With VMAX3, TimeFinder SnapVX snapshots are always space-efficient. When they are linked to host-addressable target devices, the user can choose whether to keep the target devices space-efficient or to perform a full copy.

VMAX3 TimeFinder SnapVX: TimeFinder SnapVX is the latest development in TimeFinder local replication software, offering higher scale and a wider feature set, yet maintaining the ability to emulate legacy behavior.
The newest additions to the VMAX3 family, the VMAX 100K, 200K, and 400K, deliver the latest in Tier-1 scale-out multi-controller architecture with consolidation and efficiency for the enterprise. They offer dramatic increases in floor tile density, high-capacity flash and hard disk drives in dense enclosures for both 2.5" and 3.5" drives, and support for both block and file (eNAS).
The VMAX3 family of storage arrays comes pre-configured from the factory to simplify deployment at customer sites and minimize time to first I/O. Each array uses Virtual Provisioning to allow the user easy and quick storage provisioning. While VMAX3 can ship as an all-flash array, where the combination of EFDs (Enterprise Flash Drives) and a large persistent cache accelerates both writes and reads even further, it can also ship as hybrid, multi-tier storage that excels in providing FAST (Fully Automated Storage Tiering) performance management based on Service Level Objectives (SLOs). The new VMAX3 hardware architecture comes with more CPU power, a larger persistent cache, and a new Dynamic Virtual Matrix dual InfiniBand fabric interconnect that creates an extremely fast internal memory-to-memory and data-copy fabric.
Figure 1 shows possible VMAX3 components. Refer to EMC documentation and release notes for the most up-to-date supported components.
The replicated devices can contain the database data, SQL Server home directories, data that is external to the database (for
example, image files), message queues, and so on.
VMAX3 TimeFinder SnapVX combines the best aspects of previous TimeFinder offerings and adds new functionality, scalability, and
ease-of-use features.
Some of the main SnapVX capabilities related to native snapshots (emulation mode for legacy behavior is not covered) include:
• With SnapVX, snapshots are natively targetless. They relate only to their source devices and cannot otherwise be accessed directly. Instead, snapshots can be restored back to the source devices or linked to another set of target devices, which can be made host-accessible.
• Each source device can have up to 256 snapshots and can be linked to up to 1024 targets.
• Snapshot operations are performed on a group of devices. This group can be defined using a text file specifying the list of devices, a device group (DG), a composite group (CG), or a storage group (SG). The recommended way is to use a storage group.
• Snapshots are taken using the establish command. When a snapshot is established, a snapshot name is provided, with an optional expiration date. The snapshot time is saved with the snapshot and can be listed. Snapshots also get a generation number (starting with 0). The snapshot generation is incremented with each new snapshot, even if the snapshot name remains the same (see the example after this list).
Note: Fully Automated Storage Tiering (FAST) allows VMAX3 storage to automatically and dynamically manage performance service level goals across the available storage resources to meet the application I/O demand, even as new data is added and access patterns continue to change over time.
Note: Additional drive types and capacities may be available. Contact your EMC representative for more details.
• SnapVX provides the ability to create either space-efficient or full-copy replicas when linking snapshots to target devices. Use the -copy option to copy the full snapshot point-in-time data to the target devices during link, making the target devices a stand-alone copy. If the -copy option is not used, the target devices provide the exact snapshot point-in-time data only until the link relationship is terminated, saving capacity and resources by providing space-efficient replicas.
• SnapVX snapshots themselves are always space-efficient, as they are simply a set of pointers pointing to the original data when it is unmodified, or to the original version of the data when it is modified. Multiple snapshots of the same data achieve both storage and memory savings by pointing to the same location and consuming very little metadata.
• SnapVX snapshots are always consistent; that is, snapshot creation always maintains write-order fidelity. This allows easy creation of restartable database copies. Snapshot operations such as establish and restore are also consistent: the operation either succeeds or fails for all the devices as a unit.
• Linked-target devices cannot restore any changes directly to the source devices. Instead, a new snapshot can be taken from the target devices and linked back to the original source devices. In this way, SnapVX allows an unlimited number of cascaded snapshots.
• FAST Service Levels apply to either the source devices or to snapshot linked targets, but not to the snapshots themselves. SnapVX snapshot data resides in the same Storage Resource Pool (SRP) as the source devices, and acquires the Optimized FAST Service Level Objective (SLO) by default.
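As a minimal illustration of snapshot names and generations (the storage group and snapshot names here are illustrative, not from a specific configuration):
1. Establish a snapshot named Hourly with a one-day time-to-live:
# symsnapvx -sid 536 -sg PROD_SQL_SG -name Hourly establish -ttl -delta 1 -nop
2. Establish again with the same name; the new snapshot becomes generation 0 and the earlier one becomes generation 1:
# symsnapvx -sid 536 -sg PROD_SQL_SG -name Hourly establish -ttl -delta 1 -nop
3. List the snapshots with their generations and timestamps:
# symsnapvx -sid 536 -sg PROD_SQL_SG list -detail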
See Appendix III for a list of basic TimeFinder SnapVX operations.
For more information on SnapVX refer to the EMC VMAX3 Local Replication Technical Note.
Note: Even when consistency is enabled, the remote devices may not yet be consistent while the SRDF state is SyncInProg. This happens when SRDF initial synchronization is taking place, before it enters a consistent replication state.
o SRDF consistency also implies that if a single device in a consistency group cannot replicate, then the whole group stops replicating to preserve target device consistency.
o Multiple SRDF groups set in SRDF/A mode can be combined within a single array, or across arrays. Such a grouping of consistency groups is called multi-session consistency (MSC). MSC maintains dependent-write-consistent replication across all the participating SRDF groups.
SRDF sessions:
o An SRDF session is created when replication starts between R1 and R2 devices in an SRDF group.
o An SRDF session can establish replication between R1 and R2 devices. R1 and R2 devices need a full copy for the first establish only. Any subsequent establish (for example, after an SRDF split or suspend) is incremental, passing only changed data.
o An SRDF session can restore the content of R2 devices back to R1. Restores are incremental, moving only changed
data across the links. TimeFinder and SRDF can restore in parallel (for example, bring back a remote backup image).
o During replication, the devices to which data is replicated are write-disabled (read-only).
o An SRDF session can be suspended, temporarily halting replication until a resume command is issued.
o An SRDF session can be split, which not only suspends the replication but also makes the R2 devices read-writable.
o An SRDF checkpoint command does not return the prompt until the content of the R1 devices has reached the R2 devices. This option helps in creating remote database backups when SRDF/A is used.
o An SRDF swap changes the R1 and R2 personalities and the replication direction for the session.
o An SRDF failover makes the R2 devices writable. The R1 devices, if still accessible, change to Write_Disabled (read-only). The SRDF session is suspended, and application operations proceed on the R2 devices.
o An SRDF failback copies changed data from the R2 devices back to R1 and makes the R1 devices writable. The R2 devices are made Write_Disabled (read-only).
o SRDF replication sessions can go in either direction (bi-directional) between the two arrays, where different SRDF
groups can replicate in different directions.
For more information, see SRDF Modes and Topologies and SRDF CLI commands.
SQL SERVER AND SNAPVX CONSIDERATIONS
NUMBER OF SNAPSHOTS, FREQUENCY AND RETENTION
VMAX3 TimeFinder SnapVX allows up to 256 snapshots per source device with minimal cache and capacity impact. SnapVX minimizes the impact of Production host writes by using intelligent Redirect-on-Write and Asynchronous Copy-on-First-Write. Both methods allow Production host writes to complete without delay. Data is copied in the background while Production data is modified, and the snapshot data preserves its point-in-time consistency.
If snapshots are used as part of a disaster protection strategy then the frequency of creating snapshots can be determined based on
the RTO and RPO needs.
• For a restart solution, where no roll-forward is planned, snapshots taken at very short intervals (seconds or minutes) ensure that the RPO is limited to that interval. For example, if a snapshot is taken every 30 seconds and the database must be restored without recovery, there will be no more than 30 seconds of data loss.
• For a recovery solution, frequent snapshots keep the RTO short, as less data needs recovery during roll-forward of logs to the current time. For example, if snapshots are taken every 30 seconds, rolling the data forward from the last snapshot is much faster than rolling forward from a nightly backup or hourly snapshots.
Because snapshots consume storage capacity based on the database change rate, old snapshots that are no longer needed should be terminated to release their consumed storage capacity.
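For example, when rotating a fixed set of snapshots, the oldest generation can be terminated explicitly (names are illustrative; the newest snapshot is generation 0):
1. List the snapshots to identify the oldest generation (the highest generation number):
# symsnapvx -sid 536 -sg PROD_SQL_SG list
2. Terminate the oldest of the 256 rotating snapshots (here, generation 255):
# symsnapvx -sid 536 -sg PROD_SQL_SG -snapshot_name PROD_SQL_SnapVX -generation 255 terminate -nop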
During the restart of a SQL Server database from such copies, the following occurs:
1. All transactions that were recorded as committed and written to the transaction log, but which may not have had corresponding data pages written to the data files, are rolled forward. This is the redo phase.
2. When the redo phase is complete, SQL Server enters the undo phase, where it looks for database changes that were recorded (for example, a dirty page flushed by a lazy write) but which were never actually committed by a transaction. These changes are rolled back, or undone. The state attained is often referred to as a transactionally consistent point in time. It is essentially the same process that SQL Server would undergo if the server had suffered an unanticipated interruption, such as a power failure.
Roll-forward recovery using incremental transaction log backups to a point in time after the database copy was created is not supported on a Microsoft SQL Server restartable database copy. Hence, VMAX consistent split creates crash-consistent, write-order-consistent, point-in-time copies of the database.
EMC AppSync can create and manage application-consistent copies of Microsoft SQL Server databases, including support for advanced SQL features such as AlwaysOn Availability Groups, protection for standalone and clustered production SQL Server instances, and support for databases on physical hosts, RDMs, and virtual disks on virtual hosts. It uses Microsoft SQL Server's VDI snapshot feature to create Full and Copy SQL Server backup types. The Full backup type protects the database and the active part of the transaction log. This copy type is typically used when the copy will be considered a backup of the database, or when the copy will be mounted in order to use a third-party product to create a backup of the database. This type of copy allows you to restore transaction logs to bring the database forward to a point in time that is newer than the copy, assuming you have backed up those transaction logs. The Copy backup type protects the database and the active part of the transaction log without affecting the sequence of backups. This provides SQL Server DBAs with a way to create a copy without interfering with third-party backup applications that may be creating full and/or differential backups of the SQL Server databases.
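The distinction mirrors SQL Server's own full versus copy-only backups. As a rough native illustration of the two types (server, database, and path names are hypothetical; this is not the AppSync mechanism itself, which uses VDI):
# Full backup: becomes part of the backup sequence and serves as a differential base
Invoke-Sqlcmd -ServerInstance "PRODSQL01" -Query "BACKUP DATABASE [OLTP1] TO DISK = N'B:\OLTP1_full.bak' WITH INIT;"
# Copy-only backup: protects the database without affecting the backup sequence
Invoke-Sqlcmd -ServerInstance "PRODSQL01" -Query "BACKUP DATABASE [OLTP1] TO DISK = N'B:\OLTP1_copy.bak' WITH COPY_ONLY, INIT;"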
Note: The target storage groups contain the linked-target devices of Production's snapshots. They should be added to a masking view to make the target devices accessible to the mount host.
A linked target presents the point-in-time of the snapshot. If the data on the linked target needs to be reset to the original point-in-time snapshot, it can be relinked to that snapshot. Solutions Enabler CLI commands showing how to create a SnapVX snapshot (Figure 2), link it (Figure 3), and relink to it and restore from it (Figure 4) are provided in Appendix III.
Figure 2 shows SnapVX sessions being created for a SQL Server Production OLTP database at the parent storage group level. It
shows various snapshots being created at certain intervals for protection and future use. These snapshots include both data and log,
and by default inherit the SLO set on the production storage group.
To refresh a mounted linked target, reverse the steps that were executed when mounting it (a scripted sketch follows this list):
1. Detach or drop the SQL Server database on the mount host.
2. Unmount the volumes from the mount server using the appropriate symntctl commands (see Appendix V).
3. Relink the target storage group to the same or a different SnapVX snapshot.
4. Remount the volumes and attach the database, following the steps for mounting a SnapVX replica.
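A scripted sketch of this refresh cycle might look like the following (all names, drive letters, and file paths are hypothetical, the symntctl flags are illustrative, and Invoke-Sqlcmd requires the SQL Server PowerShell module; see Appendix V and Appendix VI for the underlying commands):
# Hypothetical database name and mount-host SQL Server instance
$db = "OLTP1"; $inst = "MOUNTSQL01"
# 1. Detach the database on the mount host
Invoke-Sqlcmd -ServerInstance $inst -Query "ALTER DATABASE [$db] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; EXEC sp_detach_db @dbname = N'$db';"
# 2. Unmount the volumes on the mount host (flags illustrative)
symntctl umount -drive S
# 3. Relink the mount storage group to the desired snapshot
symsnapvx -sid 536 -sg PROD_SQL_SG -lnsg MOUNT_SQL_SG -snapshot_name PROD_SQL_SnapVX relink -nop
# 4. Remount the volumes and attach the database (paths hypothetical)
symntctl mount -drive S
Invoke-Sqlcmd -ServerInstance $inst -Query "CREATE DATABASE [$db] ON (FILENAME = N'S:\OLTP1.mdf'), (FILENAME = N'S:\OLTP1_log.ldf') FOR ATTACH;"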
The symsnapvx steps necessary to link or relink a SnapVX session to the target (mount) SQL Server storage group are shown in the following example:
1. Establish the snapshot:
symsnapvx -sid 536 -sg PROD_SQL_SG -name PROD_SQL_SnapVX establish -ttl -delta 2 -nop
2. Link, or relink, the snapshot to the mount SQL database storage group in the default no-copy mode:
symsnapvx -sid 536 -sg PROD_SQL_SG -lnsg MOUNT_SQL_SG -snapshot_name PROD_SQL_SnapVX link -nop
(or)
symsnapvx -sid 536 -sg PROD_SQL_SG -lnsg MOUNT_SQL_SG -snapshot_name PROD_SQL_SnapVX relink -nop
3. Verify the link:
symsnapvx -sid 536 -sg PROD_SQL_SG -snapshot_name PROD_SQL_SnapVX verify -summary
Sometimes changes performed on a linked target need to be restored to Production; more of these restore modes are discussed in later sections. Figure 4 shows an example of a direct SnapVX restore. The SnapVX steps necessary to do a direct restore are shown in the following example:
Direct SnapVX restore to the Production SQL database (the establish takes the protective snapshot; the restore returns the source devices to the snapshot's point in time):
symsnapvx -sid 536 -sg PROD_SQL_SG -name PROD_SQL_SnapVX establish -ttl -delta 2 -nop
symsnapvx -sid 536 -sg PROD_SQL_SG -snapshot_name PROD_SQL_SnapVX restore -nop
Figure 5. Indirect restore to the production database using establish and link
The step-by-step process of executing an indirect SnapVX restore is:
1. Execute a SnapVX establish (see Part A, Figure 6).
2. Link to the mount SQL database storage group (see Part B, Figure 7).
3. Make repairs on the mounted database, then re-snap: create a SnapVX replica of the surgically repaired mount SQL database storage group (see Part C, Figure 8).
4. Detach the production SQL Server database, then relink to Production (see Part D, Figure 9).
5. Execute cleanup for participating SnapVX sessions that are no longer needed.
Figure 6. Indirect SnapVX restore (Part A), SnapVX creation
Figure 7. Indirect SnapVX restore (Part B), linking the SnapVX replica to the target SQL storage group
Figure 8. Indirect SnapVX restore (Part C), establishing a SnapVX replica of the mount host storage group
Figure 9. Indirect SnapVX restore (Part D), linking the SnapVX replica of the mount SQL database storage group
The equivalent symsnapvx steps for Figure 6 (Part A) and Figure 7 (Part B) are listed below.
Indirect SnapVX restore of a modified linked target back to the Production SQL database (Parts A and B):
1. Establish the snapshot and link it to the mount storage group in copy mode:
symsnapvx -sid 536 -sg PROD_SQL_SG -name PROD_SQL_SnapVX establish -ttl -delta 2 -nop
symsnapvx -sid 536 -sg PROD_SQL_SG -lnsg MOUNT_SQL_SG -snapshot_name PROD_SQL_SnapVX link -copy -nop
2. Verify the link:
symsnapvx -sid 536 -sg PROD_SQL_SG -snapshot_name PROD_SQL_SnapVX verify -summary
The equivalent symsnapvx steps for Figure 8 (Part C) and Figure 9 (Part D) are listed below.
Indirect SnapVX restore of a modified linked target back to the Production SQL database (Parts C and D):
1. Make surgical repairs or modifications on the mounted SQL database, and then create a SnapVX replica of this mount SQL database storage group (cascaded):
symsnapvx -sid 536 -sg MOUNT_SQL_SG -name MOUNT_SQL_SnapVX establish -ttl -delta 2 -nop
symsnapvx -sid 536 -sg MOUNT_SQL_SG -snapshot_name MOUNT_SQL_SnapVX verify -summary
2. Link the SnapVX replica of the mount SQL database storage group back to the Production SQL database storage group in copy mode:
symsnapvx -sid 536 -sg MOUNT_SQL_SG -lnsg PROD_SQL_SG -snapshot_name MOUNT_SQL_SnapVX link -copy -nop
3. Verify the link copy progress. Unlink all targets from both the Production and Mount SQL storage groups before terminating:
symsnapvx -sid 536 -sg MOUNT_SQL_SG -lnsg PROD_SQL_SG -snapshot_name MOUNT_SQL_SnapVX unlink -nop
symsnapvx -sid 536 -sg PROD_SQL_SG -lnsg MOUNT_SQL_SG -snapshot_name PROD_SQL_SnapVX unlink -nop
LEVERAGING VMAX3 REMOTE SNAPS FOR DISASTER RECOVERY
VMAX3 SRDF allows both synchronous and asynchronous replication of Production databases to multiple target sites for disaster
recovery (DR). The remote copies can be used to restore a production database in the event of disaster. Refer to Appendix IV for
specific steps on how to set up SRDF between source and target site. Periodic point-in-time remote snapshots on the R2 site can be
used for DR testing, TEST/DEV, and for restoring back to the R1 site. Figure 10 illustrates a sample configuration of SRDF and
remote snapshots.
• If DR from a point-in-time snapshot is desired, identify the R2 snapshot and restore it to the R2 devices.
• As soon as the restore from the snapshot is initiated, the SRDF restore can be started. SRDF performs an incremental restore from R2 to R1. The devices show a state of SyncInProg while the restore is in progress, and a state of Synchronized when the restore is complete.
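A sketch of this combined snapshot and SRDF restore follows (the array IDs, group names, and RDF group number are illustrative, and storage-group-based symrdf operations assume a Solutions Enabler release that supports them):
# On the remote array, restore the chosen point-in-time snapshot to the R2 devices
symsnapvx -sid 537 -sg R2_SQL_SG -snapshot_name R2_SQL_Snap restore -nop
# Start the incremental SRDF restore from R2 back to R1; the two copies proceed in parallel
symrdf -sid 536 -sg PROD_SQL_SG -rdfg 20 restore -nop
# Wait until the pair state moves from SyncInProg to Synchronized
symrdf -sid 536 -sg PROD_SQL_SG -rdfg 20 verify -synchronized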
SQL SERVER OPERATIONAL DETAILS
MICROSOFT SQL SERVER AlwaysOn WITH SnapVX
SQL Server high availability and native continuous data protection with AlwaysOn, combined with TimeFinder SnapVX restartable snapshots, provide better protection levels and reduced outages. The SQL Server AlwaysOn Availability Group (AAG) feature is a disaster-recovery solution that improves database availability and reduces downtime with multiple database copies. An Availability Group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Secondary databases can be made available for read-only access, backup, reporting, and database consistency checking operations. The primary and secondary copies of databases do not share storage in an AAG; each node in the cluster needs its own separate copy of storage configured and zoned to it. The database copy on each secondary node of an AAG is independent of the primary copy. If a logical corruption replicates across the AAG databases, TimeFinder SnapVX can create a lagged copy to help return to a previous point in time. Figure 11 illustrates an example of a SQL Server AlwaysOn deployment.
The SQL Server OLTP database storage group was on the Platinum SLO. Different results may be observed, depending on factors like retention time, number of snapshots, change rate, competing workloads run on the linked targets, and so forth.
Figure 13. Pool allocation (GB) during the snapshot life cycle (shared vs. non-shared tracks)
Windows Server 2012 supports the ability to detect thinly provisioned storage and issue T10-standard UNMAP or TRIM based reclaim commands against the storage. Reclaim operations are performed in the following situations:
• When the target linked volumes are no longer in use and the volumes are formatted with the quick option, which requests that the entire size of the volume be reclaimed in real time. Figure 14 shows an example track allocation layout before and after a quick format of Windows volumes 00020 through 00024. The unlinked target volumes after quick format are reduced to 702 total written tracks.
• When the optimize option is selected for a volume as part of a regularly scheduled operation, when the Optimize-Volume Windows PowerShell cmdlet is used with the -ReTrim option, or when it is selected from the Defragment and Optimize Drives GUI. Figure 15 shows an example.
• When a group of SQL Server database files is deleted from the target file system, Windows automatically issues reclaim commands for the area of the file system that was freed by the file deletion. Figure 16 shows the effect of reclamation on device ID 00023.
Figure 16. Reclaimed tracks after SQL Server database files were deleted on the target file system
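For example, an on-demand TRIM/UNMAP pass can be issued from PowerShell against an already-formatted thin volume (the drive letter is illustrative):
# Issue a TRIM/UNMAP pass so freed space is reclaimed by the array
Optimize-Volume -DriveLetter F -ReTrim -Verbose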
SnapVX PERFORMANCE USE CASES WITH MICROSOFT SQL SERVER
TEST BED CONFIGURATION
Figure 17 shows the use case test environment. It consists of a test production server running an OLTP workload and a target mount server for linked-target replicas used for test/development/reporting or other lightweight OLTP workloads.
HYPERMAX OS 5977.596
Table 2. Test host environment (excerpt)
Configuration aspect | Description
C:\OLTP1_Logs | OLTP1_log.ldf
TEST OVERVIEW
General test notes:
• OLTP1 was configured to run a 90/10 read/write-ratio OLTP workload derived from an industry standard. No special database tuning was done, as the focus of the test was not on achieving maximum performance but rather on comparative differences for a standard database workload.
• All the tests maintained a steady OLTP workload on the production SQL Server database, with a variable OLTP change rate in the range of 30-60 MB/sec on a 1.1 TB SQL dataset, even when running the SnapVX replications.
• Most of the SnapVX snapshots were taken at an aggressive 30-second interval, to minimize RTO and RPO and to show the role SnapVX replications play in continuous data protection (CDP).
• DATA and LOG storage groups were cascaded into a common parent storage group for ease of provisioning, replication, and performance management.
• Data collection included storage performance metrics using Solutions Enabler and Unisphere for VMAX, host performance statistics using Windows Perfmon, and EMC PowerPath statistics.
Test results:
1. Table 5 and Figure 18 show the observed pattern for SQL Batch Requests/sec and SQL response times (ms). Note that while the workload is being run, there is no significant impact of creating or terminating snapshots on the production workload. The Create SnapVX phase shows a minimal decrease in Batch Requests/sec from 2441 to 2366, and a slight increase in SQL Server response times from 2.2 ms to 2.28 ms, but well within the boundaries of the Gold SLO compliance set on the SQL storage group.
The increase in SQL Batch Requests/sec after termination of the SnapVX snapshots is likely due to the Redirect-on-Write (ROW) technology in VMAX3, where new writes are asynchronously written to a new location while snapshots and their deltas point to the original location. These new locations could be located in the flash tier, based on availability, capacity, compliance, and SLO track movement. Refer to the EMC VMAX3 Local Replication technical notes for further details on ROW.
Table 5. SQL Server database performance statistics showing the effect of SnapVX with the Gold SLO
Figure 18. SQL Server statistics for Use Case 1A, taking 256 SnapVX snapshots
1. Run the OLTP workload at steady state on the production SQL Server storage group set to the Gold SLO.
2. Create 256 SnapVX snapshots on the production SQL Server storage group at 30-second intervals and record the performance statistics for the SQL Server database.
3. Continue to run the OLTP workload on Gold for the next 3 hours.
4. Terminate the previously created SnapVX snapshots at 30-second intervals.
5. Repeat the above steps; however, during the SnapVX creation phase, transition the SQL storage group to the Platinum SLO, and create 256 SnapVX snapshots at 30-second intervals.
6. Continue to run the OLTP workload at steady state, with the SQL storage group set to the Platinum SLO, for the next 3 hours.
Test results:
Table 6 and Figure 19 for Use Case 1B show the database transaction rate in SQL Batch Requests/sec and the SQL response time (ms) for both the Steady State and SnapVX phases for each SLO. With the Gold SLO, SnapVX showed a SQL transaction rate of 2366 Batch Requests/sec, while a similar run with Platinum improved to 3260 Batch Requests/sec. The SQL response times remained in the range of 1.44 ms to 2.28 ms for Platinum and Gold, respectively. SLO compliance rules and objectives were met.
Figure 19. Use Case 1B: SQL Batch Request and SQL response time changes with Gold to Platinum SLO changes
USE CASE 2A - IMPACT OF NO-COPY VS. COPY MODE LINKED TARGET SNAPSHOTS WITH WORKLOAD ON PRODUCTION
Objectives:
This use case shows the impact of SnapVX linked targets in No-Copy mode and in Copy mode. In the No-Copy mode test, 256 snapshots were created and linked to targets in No-Copy mode while an OLTP workload ran on the production SQL Server for 3 hours. In the Copy mode test, 256 SnapVX snapshots were created and linked to a target storage group in Copy mode every 30 seconds. The Copy mode tracks were allocated and copied asynchronously in the background as new changes were generated by the OLTP workload. This test case used the Gold SLO for the SQL Server data files and the Bronze SLO for the SQL Server transaction log storage group. Note: No workload was run on the target mount SQL Server.
Test results:
Table 7 and Figure 20 show that the average Batch Requests/sec for SQL Server is slightly higher for the Copy mode test, at 1710, compared to the No-Copy link at 1684. The response times were slightly lower, at 3.23 ms for the Copy link compared to 3.62 ms for the No-Copy link. As the Figure 20 graphs show, No-Copy linked snapshots and Copy linked snapshots are almost identical in behavior when there is no intensive workload run on the mount SQL storage group. The very slight differences between the Copy and No-Copy SQL statistics can be attributed to the Redirect-on-Write (ROW) technology in VMAX3, aligning with the Gold SLO compliance latency range. Refer to the EMC VMAX3 Local Replication technical notes for further details on ROW.
Figure 20. Use Case 2A results, for No-Copy versus Copy linked SQL target storage groups
USE CASE 2B - IMPACT OF NO-COPY VS. COPY MODE LINKED TARGET SNAPSHOTS WITH WORKLOAD ON BOTH PRODUCTION AND MOUNT HOSTS
Objectives:
This use case differs from Use Case 2A in that only 30 SnapVX replicas were created, instead of 256, and an OLTP workload was run on the linked storage group that belongs to the target host, set with the Bronze SLO. The workload was kicked off on the 30th relinked snapshot in both cases. Also note that the 30 snapshots with Copy mode linked targets were created after at least the first SnapVX replica was linked, fully defined, and copied over to the target storage group. All subsequent relinks were to the newly created snapshots. The mount host ran a similar OLTP workload with the Bronze SLO, compared to the similar OLTP workload on the production host running with the Gold SLO.
Hence, this test shows the impact of running a workload on the linked target replica in No-Copy mode and in Copy mode, with different SLOs set on the storage groups. In both the No-Copy mode and Copy mode tests, the 30 snapshots were created and linked to the target storage group every 30 seconds, while an OLTP workload ran on the production SQL Server and the target mount SQL Server for 3 hours. The Copy mode tracks were allocated and copied asynchronously in the background as new changes were generated by the OLTP workload. This test case used the Bronze SLO for the SQL Server transaction log storage group.
Test case execution steps:
1. Run the OLTP workload on the Production SQL database for 3 hours, with 30 SnapVX snapshots created every 30 seconds.
2. Run a parallel OLTP workload on the Mount SQL database on the linked target storage group in No-Copy mode.
3. Gather SQL performance statistics.
4. Repeat the above steps, but this time with a new set of 30 SnapVX snapshots created and linked to the target storage group in Copy mode.
Note: Before creating the 30 SnapVX snapshots in Copy mode, one initial full copy of the production volumes was defined and copied over to the target volumes. As a result, the subsequent Copy mode SnapVX snapshots, when re-created and relinked, had fewer tracks to copy asynchronously.
5. Gather SQL performance statistics for this 3-hour run as well.
Test results:
Table 8 and Figure 21 show SQL Server Batch Requests/sec and SQL response time for this test. As seen in Figure 21, the SQL database on the mount host met the expectations of the Bronze SLO. Note that the average SQL Batch Requests/sec for the target (No-Copy) with the Bronze SLO is 1976, while that of the target (Copy) with the Bronze SLO is 1453. The reason is that the mount volumes in the No-Copy state still share tracks with the production SnapVX snapshots and production volumes in the Gold SLO.
Figure 21. SQL Server stats for Production and Mount for Copy and No-Copy (OLTP on Mount Host)
CONCLUSION
VMAX3 SnapVX local replication technology enables SQL administrators to meet their protection and backup needs with scale, speed,
and ease of use. SnapVX capability reduces host I/O and CPU overhead, allowing the database host to focus on servicing database
transactions. It not only helps reduce RPO and RTO, but also enables multiple copies of the production database for
test/development/reporting purposes, with the added benefits of reduced array space usage.
REFERENCES
EMC VMAX3 Family with HYPERMAX OS Product Guide
Unisphere for VMAX Documentation set
EMC Unisphere for VMAX Database Storage Analyzer
EMC VMAX3 Local Replication Tech Note
Deployment Best Practice for Microsoft SQL Server with VMAX3 SLO Management
APPENDIXES
APPENDIX I - CONFIGURING SQL SERVER DATABASE STORAGE GROUPS FOR REPLICATION
VMAX3 TimeFinder SnapVX and SRDF allow the use of VMAX3 Auto-Provisioning Groups (Storage Groups) to provision storage for a SQL Server database and also to create Enginuity Consistency Assist (ECA)-based, write-order-consistent snapshots. Any changes to SQL Server database provisioning using these storage groups are also reflected in any new snapshots created afterward, making it very easy to manage database growth. This simplifies configuring and provisioning SQL Server database storage for data protection, availability, and recoverability.
Cascading DATA and LOG into a parent SG allows the creation of restartable copies of the database. Separating the transaction logs from this group allows independent management of data protection for transaction logs, while providing the desired control over SLO management. Figure 22 shows how to provision storage for SQL Server data and transaction logs to ensure database recovery SLAs are achievable. Following this provisioning model, along with the use cases described earlier, provides database and storage administrators with proper deployment guidelines for SQL Server databases on VMAX3.
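A minimal provisioning sketch of this cascaded layout follows (the group names and SLO assignments are illustrative, and the exact symsg syntax may vary by Solutions Enabler release):
# Create child storage groups for DATA and LOG, each with its own SLO
symsg -sid 536 create PROD_SQL_DATA_SG -slo Gold
symsg -sid 536 create PROD_SQL_LOG_SG -slo Bronze
# Create the parent group and cascade the children under it;
# replication and LUN masking are then performed at the parent level
symsg -sid 536 create PROD_SQL_SG
symsg -sid 536 -sg PROD_SQL_SG add sg PROD_SQL_DATA_SG,PROD_SQL_LOG_SG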
Figure 23. Creating snapshots for SQL Server database storage groups
Linking SQL Server database snapshots for backup offload or repurposing
Figure 24 shows how to select an existing snapshot to link to a target storage group for backup offloading or repurposing. By default, snapshots are linked in space-saving No-Copy mode, where the copy operation is deferred until the source tracks are written. If a full copy is desired, select the Copy checkbox. One snapshot can be linked to multiple target storage groups. If a relink to the same target storage group is desired, select the existing target storage group option.
Figure 24. Linking SQL Server database snapshots for backup offload or repurposing
Creating a cascaded snapshot from an existing snapshot
TimeFinder SnapVX allows creating snapshots from an existing snapshot, repurposing the same point-in-time copy for other uses. Figure 26 shows how to do so.
o Writes to the R1 devices are grouped into cycles. The capture cycle is the cycle that accepts new writes to R1 devices
while it is open. The transmit cycle is a cycle that was closed for updates and its data is being sent from the local to
the remote array. The receive cycle on the remote array receives the data from the transmit cycle. The destaged
cycle on the remote array destages the data to the R2 devices. SRDF software makes sure to only destage full cycles to
the R2 devices.
- The default time for the capture cycle to remain open for writes is 15 seconds, though it can be set differently.
- In legacy mode (at least one of the arrays is not a VMAX3) cycle time can increase during peak workloads as
more data needs to be transferred over the links. After the peak, the cycle time will go back to its set time (default
of 15 seconds).
- In multi-cycle mode (both arrays are VMAX3) cycle time remains the same, though during peak workload more
than one cycle can be waiting on the R1 array to be transmitted.
- While the capture cycle is open, only the latest update to the same storage location will be sent to the R2, saving
bandwidth. This feature is called write-folding.
- Write-order fidelity is maintained between cycles. For example, two dependent I/Os will always be in the same
cycle, or the first of the I/Os in one cycle and the dependent I/O in the next.
- To limit VMAX3 cache usage by capture cycle during peak workload time and to avoid stopping replication due to
too many outstanding I/Os, VMAX3 offers a Delta Set Extension (DSE) pool which is local storage on the source
side that can help buffer outstanding data to target during peak times.
o The R2 target devices maintain a consistent replica of the R1 devices, though slightly behind, depending on how fast the links can transmit the cycles and on the cycle time. For example, when cycles are received every 15 seconds at the remote storage array, its data will be 15 seconds behind production (if the transmit cycle was fully received), or 30 seconds behind (if the transmit cycle was not fully received, it is discarded during failover to maintain R2 consistency).
o Consistency should always be enabled when protecting databases and applications with SRDF/A to make sure the R2
devices create a consistent restartable replica.
SRDF Adaptive Copy (SRDF/ACP) mode allows bulk transfers of data between source and target devices without maintaining
write-order fidelity and without write performance impact to source devices.
o While SRDF/ACP is not valid for ongoing consistent replications, it is a good way to transfer changed data in bulk
between source and target devices after replications were suspended for an elongated period of time, accumulating
many changes on the source. ACP mode can be maintained until a certain skew of leftover changes to transmit is
achieved. Once the amount of changed data has been reduced, the SRDF mode can be changed to Sync or Async as
appropriate.
o SRDF/ACP is also good for migrations (also referred to as SRDF Data Mobility) as it allows a point-in-time data push
between source and target devices.
SRDF Topologies
A two-site SRDF topology includes SRDF sessions in SRDF/S, SRDF/A, and/or SRDF/ACP between two storage arrays, where each
RDF group can be set in a different mode, and each array may contain R1 and R2 devices of different groups.
Three-site SRDF topologies include:
• Concurrent SRDF: a three-site topology in which replication takes place from site A simultaneously to site B and site C. Source R1 devices are replicated simultaneously to two different sets of R2 target devices on two different remote arrays. For example, one SRDF group can be set as SRDF/S replicating to a near site and the other as SRDF/A replicating to a far site.
• Cascaded SRDF: a three-site topology in which replication takes place from site A to site B, and from there to site C. R1 devices in site A replicate to a set of devices in site B called R21. R21 devices behave as R2 to site A, and as R1 to site C. Site C has the R2 devices. In this topology, site B holds the full capacity of the replicated data, and if site A fails and Production operations continue on site C, site B can turn into the DR site for site C.
• SRDF/EDP: the Extended Data Protection SRDF topology is similar to cascaded SRDF. Site A replicates to site B, and from there to site C. However, in EDP, site B does not hold R21 devices with real capacity. This topology offers capacity and cost savings, as site B only uses cache to receive the replicated data from site A and transfer it to site C.
• SRDF/Star: SRDF/Star offers an intelligent three-site topology similar to concurrent SRDF, where site A replicates simultaneously to site B and site C. However, if site A fails, site B and site C can communicate to merge the changes and resume DR. For example, SRDF/Star replication between sites A and B uses SRDF/S, and replication between sites A and C uses SRDF/A. If site A fails, site B can send the remaining changes to site C for a no-data-loss solution at any distance. Site B can then become a DR site for site C until site A comes back.
• SRDF/AR: SRDF Automated Replication can be set up as either a two-site or a three-site replication topology. It offers slower replication when network bandwidth is limited, without performance overhead. In a two-site topology, SRDF/AR uses TimeFinder to create a PiT replica of production on site A, then uses SRDF to replicate it to site B, where another TimeFinder replica is created as a gold copy. Then the process repeats. In a three-site topology, site A replicates to site B using SRDF/S. In site B, TimeFinder is used to create a replica, which is then replicated to site C. In site C, the gold copy replica is created, and the process repeats itself.
There are also four-site topologies, though they are beyond the scope of this paper. For full details on SRDF modes, topologies, and other details, refer to the VMAX3 Family with HYPERMAX OS Product Guide.
# symsnapvx -sid 536 -sg SQLDB_SG -name SQLDB_Snap_1 establish [-ttl -delta <# of days>]
Execute Establish operation for Storage Group SQLDB_SG (y/[n]) ? y
Establish operation execution is in progress for the storage group SQLDB_SG. Please wait...
Polling for Establish.............................................Started.
Polling for Establish.............................................Done.
Polling for Activate..............................................Started.
Polling for Activate..............................................Done.
Flgs:
(F)ailed : X = Failed, . = No Failure
(L)ink : X = Link Exists, . = No Link Exists
(R)estore : X = Restore Active, . = No Restore Active
(G)CM : X = GCM, . = Non-GCM
Linking the snapshot to a storage group
This command shows how to link a snapshot to a target storage group. By default, linking is done in no_copy mode.
# symsnapvx -sid 536 -sg SQLDB_SG -snapshot_name SQLDB_Snap_1 -lnsg SQLDB_MNT link [-copy]
Flgs:
(F)ailed : F = Force Failed, X = Failed, . = No Failure
(C)opy : I = CopyInProg, C = Copied, D = Copied/Destaged, . = NoCopy Link
(M)odified : X = Modified Target Data, . = Not Modified
(D)efined : X = All Tracks Defined, . = Define in progress
Restore from a snapshot
These commands show how to restore a storage group from a point-in-time snapshot. Once the restore operation completes, the restore session can be terminated while keeping the original point-in-time snapshot for subsequent use.
# symsnapvx -sid 536 -sg SQLDB_SG -snapshot_name SQLDB_Snap_1 restore
# symsnapvx -sid 536 -sg SQLDB_SG -snapshot_name SQLDB_Snap_1 verify -summary
# symsnapvx -sid 536 -sg SQLDB_SG -snapshot_name SQLDB_Snap_1 terminate -restored
Listing the status of SRDF groups
This command shows how to get information about the existing SRDF group.
# symrdf -sid 536 list -rdfg 20
Symmetrix ID: 000196700536
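Beyond listing, typical storage-group-based SRDF operations follow the same pattern. A sketch (the group names, RDF group number, and SG-based syntax are illustrative and assume a Solutions Enabler release that supports them):
# symrdf -sid 536 -sg PROD_SQL_SG -rdfg 20 establish -nop
# symrdf -sid 536 -sg PROD_SQL_SG -rdfg 20 suspend -nop
# symrdf -sid 536 -sg PROD_SQL_SG -rdfg 20 failover -nop
The establish command begins or resumes replication, suspend temporarily halts it, and failover makes the R2 devices writable at the DR site.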
• Set and clear volume flags
• Flush any pending cached file system data to disk
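Typical symntctl invocations for these tasks might look like the following (the drive letter is illustrative, and the exact flags should be confirmed against the Solutions Enabler for Windows documentation):
# symntctl flush -drive S
# symntctl umount -drive S
# symntctl mount -drive S
The flush command writes any pending cached file system data to disk, umount removes the volume from the mount host, and mount brings it back under the given drive letter.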
APPENDIX VI SCRIPTING ATTACH OR DETACH FOR A SQL SERVER DATABASE USING
WINDOWS POWERSHELL
Figure 28 and Figure 29 illustrate the steps for detaching and attaching a SQL Server database.
Figure 28. Detaching a SQL Server database using Windows PowerShell and sp_detach_db
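Because the figures are not reproduced here, a minimal PowerShell sketch of the detach and attach steps follows (the instance name, database name, and file paths are hypothetical; Invoke-Sqlcmd requires the SQL Server PowerShell module):
$inst = "MOUNTSQL01"; $db = "OLTP1"
# Detach: force single-user mode, then detach with sp_detach_db (as in Figure 28)
Invoke-Sqlcmd -ServerInstance $inst -Query "ALTER DATABASE [$db] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; EXEC sp_detach_db @dbname = N'$db';"
# Attach: re-attach the database files from the mounted volumes (as in Figure 29)
Invoke-Sqlcmd -ServerInstance $inst -Query "CREATE DATABASE [$db] ON (FILENAME = N'S:\OLTP1.mdf'), (FILENAME = N'S:\OLTP1_log.ldf') FOR ATTACH;"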
APPENDIX VII EXAMPLE OUTPUTS
Example 1 in Figure 30 shows a listing of an SQL database storage group and its storage device details.
Example 2 in Figure 31 shows a listing of allocations and tracks allocations for a range of devices (0020-0024) that are part of an
SQL database storage group.
Example 3 in Figure 32 shows how to create a TimeFinder SnapVX snapshot for an entire storage group. Where the SQL Server database occupies the entire storage group, SnapVX can perform its operations at the storage group level itself. This simplifies replicating a group of devices for a SQL Server admin.
Example 4 in Figure 33 shows SnapVX listing details for the storage group "snapsource".
Example 5 in Figure 34 illustrates linking a TimeFinder SnapVX snapshot in default no-copy mode to a target SQL storage group.
Example 6 in Figure 35 lists the output of a linked TimeFinder SnapVX snapshot in copy mode to a target SQL storage group.
Example 7 in Figure 36 illustrates re-linking a target SQL storage group to a TimeFinder SnapVX snapshot. Re-linking provides
incremental refreshes of a linked target storage group from a different SnapVX snapshot with a different PiT.
Example 8 in Figure 37 shows a TimeFinder SnapVX snapshot list of a linked no-copy storage group.
Example 9 in Figure 38 shows a copied and de-staged linked storage group output with -detail.
Example 10 in Figure 39 shows source volumes (500 GB each) that will participate in a TimeFinder SnapVX linked session in copy mode. The linked target volumes (1 TB each) are larger than the source volumes, as shown in Example 11. Together, Example 10 and Example 11 show how linked target volumes in copy mode that are larger than the source volumes can be mounted back to the original host.
Figure 39. Source volumes (500 GB each)
Example 11 in Figure 40 shows linked target Windows volumes of 1 TB that are targets of the smaller source volumes (500 GB each, see above). The targets can be extended using Windows disk management to realize their full capacity. This helps meet capacity growth needs with LUN expansion to the database, although not seamlessly: it might involve production I/O downtime during the switch-over to the larger volumes.
Figure 40. Linked targets can be extended using Windows disk management
Example 12 in Figure 41 demonstrates the symdev free -all command, which frees all allocations on unlinked target Windows volumes and so provides the ability to wipe volumes of all data. The sleep mentioned in the script is an arbitrary value. In this example, devices 00020 through 00024 are freed of all allocations, and the allocations are reclaimed into the Storage Resource Pool (SRP). The free -all command should be used with utmost caution in production environments and only with the knowledge of administrators. The example highlights the end result of free -all: the pool allocated tracks % for the range of devices is zero.