Data Virtualization with PolyBase for SQL Server 2022

Part of the SQL Server 2022 blog series.

Microsoft SQL Server 2022 introduces the newest version of PolyBase, and with it the capability to query data where it lives, virtualize data, and use REST APIs. REST APIs enable SQL Server to be both more flexible and lightweight while expanding its range of supported connectors and file formats.

SQL Server 2022 now supports CSV, Parquet, and Deltafiles stored on Azure Storage Account v2, Azure Data Lake Storage Gen2, or any simple storage service (S3)–compliant object storage—the last as an on-premises offering or in the cloud. Finally, SQL Server 2022 can now use Create External Table as Select (CETAS), together with commands like OPENROWSET, Create External Table (CET), and all the new T-SQL enhancements. SQL Server 2022 is a powerful data hub.

How does it work?

Data virtualization for SQL Server 2022 is both more flexible and easier to use now that its capabilities are inside the engine itself. The figure below gives a better understanding of the architecture:

Examples of S3-compliant object storage providers compatible with SQL Server 2022.
This figure provides a non-exhaustive list of S3-compliant object storage providers. SQL Server 2022 is compatible with any object storage provider compatible with S3-REST APIs.

OPENROWSET: Lightweight command that allows SQL engine to access data outside SQL Server, either a file or another database. Recommended for loading data or data exploration.

CREATE EXTERNAL TABLE (CET): Creates a table where the data stays in its original location outside of SQL Server, and when selected, the SQL engine will provide the requested data to the user. External table benefits from reusability and can leverage the use of statistics for better performance.

CREATE EXTERNAL TABLE as SELECT (CETAS): It performs a combination of operations in a single command. First, it allows SQL Server to transform and convert a given data stored inside or outside the database. Second, it then exports the data to a different location, either a network location or Azure. Finally, it creates an external table targeting the newly exported data.

These operations are secured by a combination of database master key and external credentials for simplified management.

If the data is stored on Azure Storage Account v2 (abs), Azure Data Lake Gen2 (ADLs), or an S3-compliant Object Storage, SQL Server 2022 will use the REST API implementation. If not, SQL Server 2022 will use PolyBase services—PolyBase services installation is required for both cases.

For a complete list of data sources, please refer to our CREATE EXTERNAL DATA SOURCE documentation.

External file support has also increased, and SQL Server 2022 now supports CSV, Parquet, and Delta type. Please find a complete list of supported external file formats in our CREATE EXTERNAL FILE FORMAT (TRANSACT-SQL) documentation.

Benefits

Major benefits of Data Virtualization with PolyBase on SQL Server 2022:

  • No data movement: Access the data where it is.
  • T-SQL language: Ability to leverage all the benefits of the T-SQL language, its commands, enhancements, and familiarity.
  • One source for all your data: Users and applications can use SQL Server 2022 as its single data source for all of the required data, while database administrators and data engineers have a single environment to maintain.
  • Security: Leverage SQL Server security features for granular permissions, credential management, and control.
  • Cost: PolyBase is available in all SQL Server 2022 editions.

Next steps

Download the latest release of SQL Server 2022 if you haven’t already done so and check out the SQL Server 2022 Overview and What’s New references.

Learn more

See the following SQL Server resources on Microsoft Learn: