This post was authored by Casey Karst, Program Manager II, Data Systems.
We are pleased to announce that PolyBase in SQL Server 2016 and later can connect to Hadoop clusters with the hadoop.rpc.protection configuration set to integrity, privacy, or authentication. By default, PolyBase uses the authentication setting, but connecting to more secure Hadoop clusters requires integrity or privacy.
Supporting this configuration allows PolyBase to connect to and query Hadoop clusters that have wire encryption turned on. This enables a secure connection between Hadoop and SQL Server, as well as among the Hadoop Data Nodes.
To connect to a Hadoop cluster with hadoop.rpc.protection set to privacy or integrity, you will need to alter the core-site.xml file that is installed with PolyBase. This file is generally found at C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf.
To use the new configuration, add a new property with the name hadoop.rpc.protection and a value of either privacy or integrity. The value must match the hadoop.rpc.protection configuration on your Hadoop cluster.
<!-- RPC Encryption information, PLEASE FILL VALUE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
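If you would rather script the change than edit the file by hand, the following Python sketch shows one possible way to add or update the property. It is only an illustration: the helper name set_rpc_protection is hypothetical, it assumes the file has a <configuration> root element (as Hadoop configuration files normally do), and it works on the file's text so you can inspect the result before writing anything back. Back up core-site.xml first.

```python
import xml.etree.ElementTree as ET

# Lowercase values as used in the example above (spelling is an assumption of this sketch).
ALLOWED = {"authentication", "integrity", "privacy"}


def set_rpc_protection(xml_text: str, value: str) -> str:
    """Return core-site.xml text with hadoop.rpc.protection set to `value`.

    Hypothetical helper, not part of PolyBase or Hadoop.
    """
    if value not in ALLOWED:
        raise ValueError(f"unsupported value: {value!r}")

    # Keep XML comments inside the root element when parsing (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")

    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "hadoop.rpc.protection":
            # Property already present: update its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property missing: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "hadoop.rpc.protection"
        ET.SubElement(prop, "value").text = value

    # Note: the serialized text omits any XML declaration from the input.
    return ET.tostring(root, encoding="unicode")
```

Calling set_rpc_protection(text, "privacy") on the contents of core-site.xml returns the updated text. Because the function updates an existing entry instead of appending a second one, running it twice leaves a single hadoop.rpc.protection property.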
Note: When changing XML files, please ensure that you input the correct values and maintain the validity of the XML file format. If the changes are invalid, PolyBase will not run.
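One way to catch a malformed file before restarting PolyBase is a quick well-formedness check. The Python sketch below is a minimal illustration, not a PolyBase tool: the helper name check_core_site is hypothetical, it assumes a <configuration> root element, and it compares the configured value to the one you expect using an exact string match.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_core_site(xml_text: str, expected: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper, not part of PolyBase or Hadoop.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The file is not well-formed XML (for example, an unclosed tag).
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "configuration":
        problems.append(f"unexpected root element: <{root.tag}>")

    # Collect every value configured for hadoop.rpc.protection.
    values = [
        prop.findtext("value", "").strip()
        for prop in root.findall("property")
        if prop.findtext("name", "").strip() == "hadoop.rpc.protection"
    ]
    if not values:
        problems.append("hadoop.rpc.protection is not set")
    elif len(values) > 1:
        problems.append("hadoop.rpc.protection is defined more than once")
    elif values[0] != expected:
        problems.append(
            f"hadoop.rpc.protection is {values[0]!r}, expected {expected!r}"
        )
    return problems
```

Pass the file's text and the value configured on your Hadoop cluster, for example check_core_site(text, "privacy"). An empty list means the file parsed and the property matches. Any returned message is worth resolving before you restart the PolyBase services.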
Note: PolyBase supports Hadoop encryption zones starting with SQL Server 2016 SP1 CU7 and SQL Server 2017 CU3. This functionality is not available in Azure SQL Data Warehouse, Azure SQL Database, or Analytics Platform System.
For more information, please see the documentation on PolyBase Configuration.