SQL Server to SQL Server PDW: Migration Guide (AU3)
Contents
Summary Statement
Introduction
SQL Server Family of Products
PDW Community
Migration Planning
Determine Candidates for Migration
Migration Considerations
Migration Approaches
Table Geometry
Syntax Differences
Conclusion
Feedback
Appendix
SQL Server Family of Products
Microsoft offers a wide range of SQL Server products to accommodate the
different business needs and technical requirements.
Compact Edition
Microsoft SQL Server Compact 4.0 is a compact database ideal for
embedding in desktop and web applications. SQL Server Compact 4.0
gives developers a common programming model with other SQL Server
editions for developing both native and managed applications. SQL
Server Compact provides relational database functionality in a small
footprint.
Express Edition
SQL Server 2014 Express Edition is available for free from Microsoft and
provides a powerful database engine ideal for embedded applications or
for redistribution with other solutions. Independent software vendors
use it to build desktop and data-driven applications. If you need more
advanced database features or support for databases greater than 10 GB,
SQL Server Express is fully compatible with other editions of SQL Server
and can be seamlessly upgraded to enterprise versions of SQL Server.
Standard Edition
SQL Server 2014 Standard edition is a robust data management and
business intelligence database for departments and small workgroups to
support a wide variety of applications. SQL Server Standard Edition also
supports common development tools for on-premises and cloud deployments,
enabling effective database management with minimal IT resources. SQL
Server Standard Edition is compatible with other editions.
Web Edition
SQL Server 2014 Web edition is a low total-cost-of-ownership option for
Web hosters and Web VAPs to provide scalability, affordability, and
manageability capabilities for small to large scale Web properties.
Enterprise Edition
SQL Server 2014 Enterprise edition delivers comprehensive high-end
datacenter capabilities with blazing-fast performance, unlimited
virtualization, and end-to-end business intelligence, enabling high
service levels for mission-critical workloads and end-user access to data
insights.
SMP and MPP
While the other SQL Server editions follow a Symmetric Multi-Processing
(SMP) deployment, Parallel Data Warehouse brings Massively Parallel
Processing (MPP) to the world of SQL Server, essentially parallelizing and
distributing the processing across multiple SMP compute nodes. SQL
Server Parallel Data Warehouse is only available as part of Microsoft's
Analytics Platform System (APS) appliance.
Before diving into the PDW architecture, let us first understand the
differences between the architectures mentioned above.
In addition, every node maintains its own lock table and buffer pool,
eliminating the need for complicated locking and software or hardware
consistency mechanisms. Because a shared nothing architecture does not
typically suffer from severe bus or resource contention, it can be made to
scale to hundreds or even thousands of machines.
Admin Console
The Admin Console is a web application that visualises the
appliance state, health, and performance information.
MPP Engine
The MPP Engine is the brain of the SQL Server Parallel Data Warehouse
(PDW) and delivers the Massively Parallel Processing (MPP) capabilities
by doing the following:
The Shell database manages the metadata for all distributed user
databases.
TempDB contains the metadata for all user temporary tables across
the appliance.
Master is the master database for SQL Server on the Control node.
Configuration Tool
The Configuration Tool (dwconfig.exe) is used by appliance
administrators to configure the Analytics Platform System.
Domain Controller
Performs authentication among the Analytics Platform System
nodes, and manages the authentication of SQL Server PDW
Windows Authentication logins.
The DHCP service assigns IP addresses so that the hosts within the
appliance domain can join the appliance network without having a
pre-configured IP address.
http://www.microsoft.com/en-us/download/details.aspx?id=45294
http://www.microsoftvirtualacademy.com/training-courses/big-data-with-the-microsoft-analytics-platform-system
Determine Candidates for Migration
The first step for migration planning is to identify and determine which
objects and processes you wish to migrate.
Staging Databases
Archive Databases
Tables
Views
Stored Procedures
Migration Considerations
After determining candidates for migration, you should also consider
the following areas and determine how you wish to manage them as
part of the migration. For example, you may be planning to migrate
multiple legacy SQL Server data marts onto a single SQL Server Parallel
Data Warehouse; doing so may increase the importance of the APS
appliance and therefore require an increase in availability and disaster
recovery.
Mission Critical
Database Design
SQL Server to PDW Migration
There are many different options for migrating data and applications
from an existing SQL Server system to a SQL Server Parallel Data
Warehouse. This section discusses the main migration steps, the PDW
Migration Advisor, table geometry, and optimization within PDW.
Main Migration Steps
This white paper provides guidance on migrating objects and
highlights areas which are not supported within SQL Server PDW. To aid
with identifying the migration areas to focus on, you can make use of
the PDW Migration Advisor.
Once all objects have been migrated, the next step would be to migrate
any integration or ETL jobs.
PDW Migration Advisor
The PDW Migration Advisor (PDWMA) is a tool that can be used to
inspect your Microsoft SQL Server database schemas and display the
changes that need to be made in order to migrate the database to
Microsoft SQL Server PDW.
APS Migration Utility
The APS Migration Utility is a tool that can be used to automatically
migrate the Microsoft SQL Server database schemas and data, whilst
providing the ability to specify the table geometry during the migration
process.
Table Geometry
As we learnt within the previous sections, SQL Server Parallel Data
Warehouse (PDW) is a Massively Parallel Processing (MPP) appliance
which follows the shared nothing architecture. This means that we need
to distribute or spread the data across the compute nodes in order to
benefit from the storage and query processing of the MPP architecture.
SQL Server PDW provides two options to define how the data can be
partitioned: distributed and replicated tables.
Distributed Tables
A distributed table is a table in which all rows have been spread across
the SQL Server PDW Appliance compute nodes based upon a row hash
function. Each row of the table is placed on a single distribution,
assigned by a deterministic hash algorithm taking as input the value
contained within the defined distribution column.
Distributed tables are what gives SQL Server PDW the ability to scale out
the processing of a query across multiple compute nodes.
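For example, the following sketch (table, column and distribution key names are illustrative and not taken from this guide) creates a distributed table by nominating a hash distribution column:

CREATE TABLE dbo.FactSales
(
    SaleKey      bigint NOT NULL
  , DateKey      int    NOT NULL
  , CustomerKey  int    NOT NULL
  , SaleAmount   money
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey)  -- each row is placed on a distribution
                                      -- based on the hash of CustomerKey
);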
Because you are duplicating all data for each replicated table on each
compute node, you will require extra storage, equivalent to the size of a
single table multiplied by the number of compute nodes. Therefore a
table containing 100MB of data on a PDW appliance with 10 compute
nodes will require 1GB of storage.
When loading new data or updating existing data, you will require far
more resources to complete the task than if it were executed
against a distributed table. This is due to the fact that each operation
will need to be executed on each compute node. Therefore it is essential
that you take into account the extra overhead when performing ETL/ELT
style operations against a replicated table.
DISTRIBUTION FIRST
What we mean by Distribution First is that you should always begin
by designing your data model to be distributed; if you do not, you
will not be able to realize the benefits of the Massively Parallel Processing
(MPP) architecture. Therefore always begin with a distributed table.
Access
Distribution
Volatility
The ideal distribution key is one that meets all three criteria. The reality
is that you are often faced with trading one off from the other.
Optimization within PDW
The appliance provides a significant amount of resources; however, these
are still finite, and overall query and concurrency performance will benefit
from optimization of your data model using one or more of the following
options.
Replicated Table
A replicated table is a table in which a complete duplicate of all data is
stored on each of the SQL Server PDW compute nodes. Replicating a
table can provide the ability for it to become distribution compatible
when joined to another table. This removes the need to perform data
movement via a SHUFFLE MOVE operation at the expense of data
storage and load performance.
The ideal candidate for replicating a table is one that is small in size, i.e.
less than 5GB, changes infrequently and has been proven to be
distribution incompatible. Before selecting to replicate a table, ensure
that you have exhausted all options for keeping the table distributed.
The dimensional model above depicts one fact table and four dimension
tables. Within this example the customer dimension is significantly larger
than the other dimension tables.
Clustered Indexes
A Clustered Index physically orders the pages of the data in the table. If
the table is distributed, then the physical ordering of the data pages is
applied to each of the distributions individually. If the table is replicated,
then the physical ordering of the pages is applied to each of replicate
tables on each of the compute nodes.
All data pages within a clustered index table are linked to the next and
previous data pages in a doubly linked list to provide ordered scanning.
In other words, the records in the physical structure are sorted according
to the fields that correspond to the columns used in the index.
You can only have one clustered index on a table, because the table
cannot be ordered in more than one direction.
The following diagram shows what a Clustered Index table might look
like for a table containing city names:
PREDICATE QUERIES
Data within the table is clustered on value and an index tree constructed
for direct access to these data pages, therefore clustered indexes will
provide the minimal amount of I/O required in which to satisfy the
needs of a predicate query. Consider using a clustered index on
column(s) commonly used as a valued predicate.
RANGE QUERIES
All data within the table is clustered and ordered on value, which
provides an efficient method to retrieve data based upon a range query.
Therefore consider using a clustered index on column(s) commonly used
within a range predicate, for example where a given date is between two
dates.
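As a sketch of both patterns (assuming a hypothetical fact table with an OrderDateKey column), a clustered index supporting range predicates might be declared and used as follows:

CREATE TABLE dbo.FactOrders
(
    OrderKey      bigint NOT NULL
  , OrderDateKey  int    NOT NULL
  , OrderAmount   money
)
WITH
(
    DISTRIBUTION = HASH(OrderKey)
  , CLUSTERED INDEX (OrderDateKey)   -- physically orders each distribution by the date key
);

-- A range query that benefits from the ordering:
SELECT SUM(OrderAmount)
FROM dbo.FactOrders
WHERE OrderDateKey BETWEEN 20140101 AND 20140131;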
Non-Clustered Indexes
Non-Clustered Indexes are fully independent of the underlying table and
up to 999 can be applied to both heap and clustered index tables. Unlike
Clustered Indexes, a Non-Clustered Index is completely separate from
the data, and on the index leaf page, there are pointers to the data
pages. The pointer from the index to a data row is known as a row
locator. The row locator is structured differently depending on whether
the underlying table is a heap table or a clustered index table.
The row locator for a heap table will contain a pointer to the actual data
page, plus the row number within the data page. If that row is updated
and a data page split is required (if the updated row requires additional
space), a forwarding pointer is left to point to the new data page and
row. Over time, frequent updates can cause poor performance of the
Non-Clustered Indexes, requiring maintenance via a rebuild of the
underlying table and indexes.
The row locator for a clustered index table is different as it will contain
the cluster key from the cluster index. To find a value, you first traverse
the non-clustered Index, returning the cluster key, which you then use to
traverse the clustered index to reach the actual data. The overhead to
performing this is minimal as long as you keep the clustering key
optimal. While scanning two indexes is more work than just having a
pointer to the physical data page location, overall it can be considered
better because minimal reorganization is required for any modification
of the values in the table. This benefit is only true if the cluster key rarely,
or never, changes.
The same reasons for selecting a clustered index are also true for a non-
clustered index, with the additional rule that they should only be used
to optimize queries which return small result sets, or when they can be
utilized as covering indexes, reducing the base table I/O requirements
(however, with the advent of Clustered ColumnStore Indexes this is less
relevant).
KEY CHARACTERISTICS
A clustered columnstore index in SQL Server Parallel Data Warehouse
(PDW) has the following characteristics:
Includes all columns in the table and is the method for storing the
entire table.
Fully updateable.
Can be partitioned.
Queries often select only a few columns from a table, requiring less
total I/O resources.
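A minimal sketch of creating a table stored as a clustered columnstore index (table and column names are illustrative):

CREATE TABLE dbo.FactPageViews
(
    PageViewKey  bigint NOT NULL
  , DateKey      int    NOT NULL
  , ViewCount    int
)
WITH
(
    DISTRIBUTION = HASH(PageViewKey)
  , CLUSTERED COLUMNSTORE INDEX      -- the entire table is stored as a columnstore
);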
KEY TERMS
The following are key terms and concepts that you will need to know in
order to better understand how to use clustered columnstore indexes.
Partitioning
Partitioning allows you to physically divide up tables and indexes
horizontally, so that groups of data are mapped into individual
partitions. Even though the table has been physically divided, the
partitioned table is treated as a single logical entity when queries or
updates are performed on the data.
BENEFITS
Partitioning large tables within SQL Server Parallel Data Warehouse
(PDW) can have the following manageability and performance benefits.
For distributed tables, table partitions determine how rows are grouped
and physically stored within each distribution. This means that data is
first moved to the correct distribution before determining the partition
in which the row will be physically stored.
For clustered columnstore indexes, every partition will contain its own
columnstore and deltastore. To achieve the best possible performance
and maximise compression, it is recommended that you ensure each
partition for each distribution is sized so that it contains more than 1
million rows.
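As a worked illustration of this recommendation, assume an appliance with 8 compute nodes, each with 8 distributions (64 distributions in total), and a fact table partitioned by month into 12 partitions: to keep every partition of every distribution above 1 million rows, the table would need roughly 64 x 12 x 1,000,000 rows, or around 768 million rows, per year of data.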
Within SQL Server you are able to create a table with a constraint on the
partitioning column and then switch that table into a partition. Since
PDW does not support constraints, this type of partition switching is not
supported; instead, the source table must be partitioned with matching
ranges.
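A sketch of switching a partition from a staging table into a partitioned fact table, assuming both tables share the same partition ranges and geometry (table names are illustrative):

-- Move partition 2 of the staging table into partition 2 of the fact table
ALTER TABLE dbo.FactSales_Staging
SWITCH PARTITION 2 TO dbo.FactSales PARTITION 2;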
Statistics Collection
SQL Server Parallel Data Warehouse (PDW) uses a cost based query
optimizer and statistics to generate query execution plans to improve
query performance. Up-to-date statistics ensures the most accurate
estimates when calculating the cost of data movement and query
operations.
SQL Server PDW stores two sets of statistics at different levels within the
appliance. One set exists at the control node and the other set exists on
each of the compute nodes.
When you create statistics on a replicated table, SQL Server PDW creates
a statistics object for each compute node, but since each compute node
will contain the same statistics, the control node will only copy one
statistics object from one compute node.
When you create statistics on an external table, SQL Server PDW will first
import the required data into PDW so that it can then compute the
statistics. The results are stored on the control node.
Statistics stored on the control node are not made available to client
applications. The DMVs report only statistics on the compute nodes and
so do not report statistics on the control node.
Partition Column
Join Columns
Aggregate Columns
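Assuming the list above names the column roles on which statistics should be created, a sketch against a hypothetical fact table would be:

-- Statistics on a partition column, a join column and an aggregate column
CREATE STATISTICS stat_FactSales_DateKey     ON dbo.FactSales (DateKey);
CREATE STATISTICS stat_FactSales_CustomerKey ON dbo.FactSales (CustomerKey);
CREATE STATISTICS stat_FactSales_SaleAmount  ON dbo.FactSales (SaleAmount);

-- Refresh after significant data changes
UPDATE STATISTICS dbo.FactSales;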
Database Schema Objects
Data Types
Migrating SQL Server data types is relatively easy, as most data types
are the same between SQL Server and PDW.
NUMERIC
Translates to: DECIMAL
However it's possible that issues may arise when loading data. In many
countries the date and time formats are different from the default of
ymd. In those cases the format needs to be handled in order to avoid
loading errors via DWloader. DWloader is one of the ways to load data
into PDW from text files. Text files however do not contain any type
information, so it's possible that the format may be misinterpreted
and cause errors, forcing rows to be skipped (rejected) and logged
during data loads. Dwloader.exe has numerous parameters in order to
handle the type conversion into the appropriate format.
Example:
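The following is a sketch of a dwloader invocation for this scenario; the appliance name, credentials, file path, target table and the -D date-format switch shown here are assumptions rather than values taken from this guide:

dwloader.exe -S MyAppliance -U loaduser -P ********** -i C:\loads\PurchaseOrders.txt -T AdventureWorksPDW.dbo.PurchaseOrders -t "|" -r "\r\n" -D "dmy"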
The date format in this case assumes that all the columns have the date
values ordered as day (d), month (m), year (y). It's also possible that
formats differ between columns. In that case a format file will have to
be used that specifies the input format on a per-column basis. The
format file can be specified with the -dt parameter.
LastReceiptDate=ymd
ModifiedDate=mdy
When importing text data it's also possible that, due to errors in the
source data, the date format for a specific column varies from row to row,
and this can either lead to errors or loading incorrect data. There may
also be corruption errors in the input data that need to be detected. In
such cases dwloader.exe offers some options to detect and log the rows
that caused a load error. This would then allow you to remediate the
data and reload.
-rv reject_value
Specifies the number or percentage of row rejections to allow
before halting the load. The -rt option determines if reject_value
refers to the number of rows or the percentage of rows. The
default reject_value is 0. When used with -rt value, the loader
stops the load when the rejected row count exceeds reject_value.
When used with -rt percentage, the loader computes the
percentage at intervals (-rs option). Therefore, the percentage of
failed rows can exceed reject_value.
-rs reject_sample_size
Used with the -rt percentage option to specify the incremental
percentage checks. For example, if reject_sample_size is 1000, the
Loader will calculate the percentage of failed rows after it has
attempted to load 1000 rows. It recalculates the percentage of
failed rows after it attempts to load each additional 1000 rows.
-R load_failure_file_name
If there are load failures, dwloader stores the row that failed to
load and the failure description in a file named
load_failure_file_name. If this file already exists, dwloader
overwrites the existing file.
-e character_encoding
Specifies a character-encoding type for the data to be loaded
from the data file. Options are ASCII (default), UTF8, UTF16, or
UTF16BE, where UTF16 is little endian and UTF16BE is big endian.
These options are case insensitive.
-m
Use multi-transaction mode for the second phase of loading;
when loading data from the staging table into a distributed
table.
TIMESTAMP
Timestamp in PDW is treated differently to SQL Server. In SQL server the
current practice is to use the rowversion keyword instead of the
timestamp keyword however the two commands are functionally
equivalent in SQL Server. In PDW there is no direct functional equivalent
to the rowversion that auto updates if the row is changed. In PDW
partially similar functionality can be achieved by using the datetime2
The issue can sometimes arise when data needs to be migrated to the
data warehouse and a column is needed to contain the source
rowversion data. The binary data type can provide a solution here.
For example:
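A sketch of the SQL Server source table (table and column names are illustrative), where the rowversion column is auto-updated on every change:

CREATE TABLE dbo.Product_Source
(
    ProductKey   int NOT NULL
  , ProductName  varchar(100)
  , RowVer       rowversion           -- auto-updated by SQL Server on every change
);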
For example:
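And a sketch of the PDW target, where the source rowversion value is simply landed into a BINARY(8) column:

CREATE TABLE dbo.DimProduct
(
    ProductKey        int NOT NULL
  , ProductName       varchar(100)
  , SourceRowVersion  binary(8)       -- holds the 8-byte rowversion copied from the source
)
WITH
(
    DISTRIBUTION = REPLICATE
);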
TEXT
VARCHAR(MAX)
NVARCHAR(MAX)
VARBINARY(MAX)
XML
When dealing with these data types there are several options that need
to be evaluated.
2. Break up BLOB data into logical fields and then migrate. In many
cases VARCHAR(MAX) columns are being used to store data that
could be broken down further into different columns and different
data types. While this is not always possible, a simple investigation
of the actual data in the VARCHAR(MAX) fields can yield a lot of
information and provide more choices. When following this
approach you also need to consider the row size limitations.
For example:
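A sketch of this approach, assuming a hypothetical source column that stores a code, a description and a date inside a single VARCHAR(MAX) value; in PDW the data is broken into typed columns instead:

-- Source (SQL Server): Notes varchar(max) holding 'code|description|date'
-- Target (PDW): separate, appropriately typed columns
CREATE TABLE dbo.OrderNotes
(
    OrderKey  bigint NOT NULL
  , NoteCode  varchar(20)
  , NoteText  varchar(8000)       -- the landed portion of the original text
  , NoteDate  date
)
WITH
(
    DISTRIBUTION = HASH(OrderKey)
);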
If the size of the data is still too big to fit in the row, and including it
would exceed the 8KB per page limit, then another potential option is to
store the large text column in a separate table, related to the main
row by an ID.
Base raw data can be migrated from SQL Server and stored in existing
PDW data types with no loss. The data can be retrieved via normal T-
SQL queries and the spatial processing can be done via SQL Server.
Importing Data - Spatial data can be converted into data which can then
be stored within a PDW-compatible data type such as an NVARCHAR()
or VARCHAR(). Importing spatial types and converting them to
VARCHAR can be accomplished via SSIS, CLR or a custom data conversion
application/script.
Retrieving Data - SQL Server can connect to PDW via a linked server
connection, querying data directly or retrieving data via SSIS. Both
connection types operate over the fast InfiniBand network to support
fast data transfers.
Linked servers can be queried within SQL Server in much the same
fashion as tables. A view would then be created to convert back to a
spatial type to support geometric calculations within SQL Server. Overall
this solution would be cost effective to implement, since the heavy-duty
data retrieval and aggregation work is done by PDW in a scalable
fashion. Geospatial calculations and any post-processing can be
performed on less expensive SMP servers running SQL Server Standard
Edition.
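As a sketch of this retrieval pattern (the linked server name, table and the stored well-known-text column are assumptions), a view on the SMP SQL Server could rebuild the spatial type from the text held in PDW:

CREATE VIEW dbo.vStoreLocations
AS
SELECT StoreKey,
       -- Rebuild the spatial value from the well-known text stored in PDW
       geography::STGeomFromText(LocationWkt, 4326) AS StoreLocation
FROM   PDW_LINKED.StoreDB.dbo.StoreLocations;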
Syntax Differences
There are syntax differences to be aware of between SQL Server and
PDW. In most cases the extra complexity needed in the SQL Server
syntax has been removed as it is not applicable for PDW. This reduced
syntax, leveraging the appliance model, is both easier to learn as well as
less complex to support in the long term.
DISTRIBUTED_SIZE
The amount of space in GB allocated for distributed tables. This is
the amount allocated over the entire appliance, meaning that on a
per-compute-node basis the disk space allocated equals the
DISTRIBUTED_SIZE divided by the number of compute nodes in the
appliance.
LOG_SIZE
The amount of space in GB allocated for the transaction log. This
is the amount allocated over the entire appliance, meaning that on
a per-compute-node basis the disk space allocated equals the
LOG_SIZE divided by the number of compute nodes in the
appliance.
AUTOGROW
Governs whether the database size is fixed or whether the appliance will
automatically increase the disk allocation for the database as
needed, until the amount of physical disk space in the appliance
is consumed. It is recommended that the database be created
big enough before loading data in order to minimize auto-grow
events that can impact data loading performance.
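A sketch of a PDW CREATE DATABASE statement using these options; the database name and sizes are placeholders, and the REPLICATED_SIZE option for replicated table space is included for completeness:

CREATE DATABASE SalesDW
WITH
(
    AUTOGROW         = ON,
    REPLICATED_SIZE  = 50,      -- GB per compute node for replicated tables
    DISTRIBUTED_SIZE = 2000,    -- GB across the appliance for distributed tables
    LOG_SIZE         = 200      -- GB across the appliance for the transaction log
);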
Tables with less than 5GB of heap data are typically replicated.
Once the table sizes are known and categorized, then it is a simple matter
of adding a percentage for growth on a year-to-year basis. It is also
important to allow sufficient space to be able to make a copy of the
largest distributed table should the need arise. If partitioning is used on
the fact table(s), then sufficient space for at least a few extra partitions
should be allocated, since these tables are so large. These simple
guidelines, together with the categorization of tables between distributed
vs. replicated, will provide sufficient insight into sizing the database.
For example:
CREATE TABLE dbo.DimGeography     -- illustrative table and column names
(
    GeographyKey int NOT NULL
  , CityName     varchar(50)
)
WITH
(
    DISTRIBUTION = REPLICATE      -- <- Nominates the table as replicated
);
Specifying Partitions
Partitions must be specified as part of the create table statement. The
partition scheme and partition function syntax that is required for SQL
Server is not required for PDW. The partition functions and schemes are
setup automatically by PDW.
For example:
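Within SQL Server, a sketch of the partition function and scheme approach (object names and boundary values are illustrative):

CREATE PARTITION FUNCTION pfOrderDate (int)
    AS RANGE RIGHT FOR VALUES (20130101, 20140101, 20150101);

CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactOrders
(
    OrderKey      bigint NOT NULL
  , OrderDateKey  int    NOT NULL
)
ON psOrderDate (OrderDateKey);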
For example:
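The equivalent sketch within SQL Server PDW declares the ranges inline; no partition function or scheme is created:

CREATE TABLE dbo.FactOrders
(
    OrderKey      bigint NOT NULL
  , OrderDateKey  int    NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OrderKey),
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20130101, 20140101, 20150101))
);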
Migrating Triggers
PDW does not support Triggers functionality in the current release.
Triggers are mainly used in OLTP solutions and therefore have very
limited application in data warehouse solutions. Triggers in OLTP
solutions are used for implementing cascading update functionality or
other logic based on changed rows. In data warehouse solutions data
changes can be implemented via operational data stores (ODS) or in the
ETL/ELT processes that cleanse and layout data for maximum retrieval
query and processing performance.
Within SQL Server, the CREATE SCHEMA statement can contain the table creation and permission statements:
USE AdventureWorks2012;
GO
CREATE SCHEMA Sprockets AUTHORIZATION mwinter
CREATE TABLE NineProngs
(
source int
, cost int
, partnumber int
)
GRANT SELECT ON SCHEMA::Sprockets TO julius
DENY SELECT ON SCHEMA::Sprockets TO danny;
GO
Within SQL Server PDW, the CREATE TABLE, GRANT and DENY statements are issued separately from the CREATE SCHEMA statement:
USE AdventureWorks2012;
CREATE SCHEMA Sprockets AUTHORIZATION mwinter
GO
CREATE TABLE Sprockets.NineProngs
(
source int
, cost int
, partnumber int
)
WITH (DISTRIBUTION = HASH(source))
GRANT SELECT ON SCHEMA::Sprockets TO julius
DENY SELECT ON SCHEMA::Sprockets TO danny
GO
Views
Views operate in a similar fashion in PDW as they do in SQL Server. In
PDW the view definitions are stored in the control node and expanded
during query compilation. As such schema binding and materialized
views are not supported.
Within SQL Server, using SELECT ... INTO:
SELECT
c.ID
, c.FirstName
, c.LastName
, e.JobTitle
, a.AddressLine1
, a.City
, a.PostalCode
INTO dbo.AddressesList
FROM Person.Person AS c
INNER JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = c.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS bea
ON e.BusinessEntityID = bea.BusinessEntityID
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
INNER JOIN Person.StateProvince as sp
ON sp.StateProvinceID = a.StateProvinceID
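Within SQL Server PDW the equivalent is expressed as a CREATE TABLE AS SELECT (CTAS) statement; the distribution column chosen below is an assumption:

CREATE TABLE dbo.AddressesList
WITH
(
    DISTRIBUTION = HASH(ID)    -- assumed distribution column
)
AS
SELECT
    c.ID
  , c.FirstName
  , c.LastName
  , e.JobTitle
  , a.AddressLine1
  , a.City
  , a.PostalCode
FROM Person.Person AS c
INNER JOIN HumanResources.Employee AS e
    ON e.BusinessEntityID = c.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS bea
    ON e.BusinessEntityID = bea.BusinessEntityID
INNER JOIN Person.Address AS a
    ON bea.AddressID = a.AddressID
INNER JOIN Person.StateProvince AS sp
    ON sp.StateProvinceID = a.StateProvinceID;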
The body of the select statement remains the same within the PDW
syntax translation. All that has changed is the addition of the table
creation portion of the syntax. These relatively minor differences in
syntax can provide orders of magnitude better performance between
SMP and MPP platforms.
UPDATE r
SET update_ts = CURRENT_TIMESTAMP
FROM region AS r
WHERE EXISTS (SELECT 1
FROM nation AS n
WHERE n.region_key = r.region_key
AND n.nation_key = 0)
UPDATE region
SET update_ts = CURRENT_TIMESTAMP
FROM nation AS n
WHERE n.region_key = region.region_key
AND n.nation_key = 0
UPDATE c
SET population_ct = population_ct + 1
FROM census AS c
INNER JOIN location AS l
ON c.location_key = l.location_key
WHERE l.country_nm = 'Australia'
AND l.state_nm = 'Victoria'
UPDATE census
SET population_ct = population_ct + 1
FROM location AS l
WHERE census.location_key = l.location_key
AND l.country_nm = 'Australia'
AND l.state_nm = 'Victoria'
Update statements cannot contain more than one join in the FROM
clause. For example, the following statement is not allowed:
UPDATE order_line
SET dispatch_ts = CURRENT_TIMESTAMP
FROM customer AS c
, order AS o
WHERE order_line.order_key = o.order_key
AND order_line.customer_key = c.customer_key
AND o.order_no = 'MS12345678-90'
AND c.customer_nm = 'Microsoft'
UPDATE order_line
SET dispatch_ts = CURRENT_TIMESTAMP
FROM order_product AS op
WHERE order_line.order_key = op.order_key
AND order_line.product_key = op.product_key
AND op.order_no = 'MS12345678-90'
AND op.supplier_nm = 'Microsoft'
Delete statements are executed on a row-by-row basis and are logged
operations; depending on indexing complexity, a lot of database
pages need to be updated and freed. This can cause fragmentation, as
well as slow performance when deleting large amounts of data, due to the
large number of updates needed.
In the example below, the pre-1998 data would be only 20% of, say, a 10
billion row table, or approximately 2 billion rows. In order to delete this
amount of data there would be four possible options.
IF @count > 0
BEGIN
DELETE FROM dbo.BigFactTable
WHERE ID IN (SELECT ID FROM #DeleteInc)
END
END
@@ROWCOUNT Workaround
PDW currently does not support @@ROWCOUNT or ROWCOUNT_BIG
functions. If you need to obtain the number of rows affected by the last
INSERT, UPDATE or DELETE statement, you can use the following SQL:
-- @@ROWCOUNT
SELECT CASE WHEN MAX(distribution_id) = -1
THEN SUM(DISTINCT row_count)
ELSE SUM(row_count)
END AS row_count
FROM sys.dm_pdw_sql_requests
WHERE row_count <> -1
AND request_id IN (SELECT TOP 1
request_id
FROM sys.dm_pdw_exec_requests
WHERE session_id = SESSION_ID()
ORDER BY end_time DESC)
The following stored procedure features are not supported in SQL Server
PDW:
Default parameters
Optimizer Hints
Optimizer hints should not be migrated from SQL Server to PDW.
Syntax Comparisons
CREATE LOGIN
The CREATE LOGIN command is simplified substantially in PDW. Login
creation with a certificate or an asymmetric key is not supported in the
current release.
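For example, a minimal sketch of the simplified form (the login name and password are placeholders):

-- SQL Server PDW: basic login creation
CREATE LOGIN etl_loader WITH PASSWORD = 'StrongP@ssw0rd!';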
The following T-SQL functions can be used to query the SQL Server
security metadata, allowing you to perform command level validation on
individual users and related object permissions.
fn_my_permissions()
This lists out the permissions that a given user context has been granted
on a securable object.
For example:
SELECT DISTINCT
class_desc
FROM fn_builtin_permissions(default)
;
The example below lists the permissions that user "danny" has on the
object "table_1". The object could be a table, view, procedure etc. Since
the "fn_my_permissions()" function is executed in the current user
context the user context must be set and reset as needed. The user must
exist in the database.
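A sketch of that pattern, assuming EXECUTE AS USER and REVERT are used to set and reset the user context as the text describes:

EXECUTE AS USER = 'danny';
SELECT entity_name, permission_name
FROM   fn_my_permissions('table_1', 'OBJECT');
REVERT;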
This example lists the permissions that user "danny" has on the database
called "testdb".
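Similarly, a sketch for the database-level permissions:

EXECUTE AS USER = 'danny';
SELECT permission_name
FROM   fn_my_permissions('testdb', 'DATABASE');
REVERT;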
Feedback
Did this paper help you? Please give us your feedback. Tell us on a scale
of 1 (poor) to 5 (excellent), how would you rate this paper and why have
you given it this rating? For example:
This feedback will help us improve the quality of the white papers we
release.
Send feedback.
Category | SQL Server Syntax | SQL Server PDW Guidance
ALTER | ALTER DATABASE Compatibility Level | Not Applicable in PDW. Compatibility level in PDW has been set to emulate SQL Server so that third party applications continue to work.
ALTER | ALTER DATABASE File and Filegroup Options | Not Applicable in PDW. Files and Filegroups are automatically defined by PDW and no user intervention is required.
BACKUP / RESTORE | BACKUP SERVICE MASTER KEY | Use BACKUP CERTIFICATE instead.
Collation Functions / Operators | SQL Server Collation Name | Not Applicable in PDW.
Operators (Compound) | &= (Bitwise AND EQUALS) (T-SQL) | Not Applicable in PDW. Use expanded syntax instead.