Introducing UTF-8 support in SQL Server 2019 preview

December 18, 2018

With the first public preview of SQL Server 2019, we announced support for the widely used UTF-8 character encoding as an import or export encoding, and as a database-level or column-level collation for string data. This helps companies that are extending their businesses to a global scale, where providing multilingual database applications and services is critical to meeting customer demands and specific market regulations. The benefits of UTF-8 support also extend to scenarios where legacy applications require internationalization and use inline queries: converting an application and its underlying database to UTF-16 can be costly, both because of the number of changes and the testing involved, and because it can require complex string processing logic that affects application performance.

To limit the number of changes required for the above scenarios, UTF-8 is enabled in the existing data types CHAR and VARCHAR. String data is automatically encoded to UTF-8 when creating an object with, or changing an object’s collation to, a collation with the “UTF8” suffix, for example from LATIN1_GENERAL_100_CI_AS_SC to LATIN1_GENERAL_100_CI_AS_SC_UTF8. Refer to Set or Change the Database Collation and Set or Change the Column Collation for more details on how to perform those changes. NCHAR and NVARCHAR remain unchanged and only allow UTF-16 encoding.
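As a minimal sketch of the column-level change described above, the following T-SQL creates a table with a non-UTF-8 collation and then switches the column to the matching “UTF8” collation. The table and column names (dbo.ProductNames_Demo, ProductName) are hypothetical and only serve as an illustration. It also assumes that no index, constraint, or computed column depends on the column being altered, since such dependencies typically have to be dropped before a collation change.

```sql
-- Hypothetical demo table; the names are illustrative only.
CREATE TABLE dbo.ProductNames_Demo (
    ProductId   INT NOT NULL,
    ProductName VARCHAR(50) COLLATE Latin1_General_100_CI_AS_SC NOT NULL
);

-- Switch the column to the UTF-8 enabled version of the same collation.
-- Existing and new string data in this column is then stored as UTF-8.
ALTER TABLE dbo.ProductNames_Demo
    ALTER COLUMN ProductName VARCHAR(50)
    COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL;

-- Check which collation the column now uses.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.ProductNames_Demo');
```

Because the data type stays VARCHAR, queries and application code that already use CHAR or VARCHAR parameters should need fewer changes than a move to NVARCHAR would require.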

UTF-8 is only available to Windows collations that support supplementary characters, as introduced in SQL Server 2012. You can see all available UTF-8 collations by executing the command below in your SQL Server 2019 CTP:

SELECT Name, Description FROM fn_helpcollations()
WHERE Name LIKE '%UTF8';

Additionally, if your dataset uses primarily Latin characters, significant storage savings may also be achieved compared to UTF-16 data types. For example, changing an existing column data type from NCHAR(10) to CHAR(10) with a UTF-8 enabled collation translates into a nearly 50 percent reduction in storage requirements. This is because NCHAR(10) requires 22 bytes for storage, whereas CHAR(10) requires 12 bytes for the same Unicode string.
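One way to observe the difference on your own data is to compare DATALENGTH, which returns the number of bytes used by a value, for the same string stored as UTF-16 and as UTF-8. The sketch below uses a hypothetical temporary table (#EncodingDemo) and sample strings chosen for illustration. DATALENGTH reports only the bytes of the string itself, so the numbers are not expected to match the per-column storage figures quoted above exactly.

```sql
-- Hypothetical temp table: the same text stored as UTF-16 and as UTF-8.
CREATE TABLE #EncodingDemo (
    Utf16Value NVARCHAR(10),
    Utf8Value  VARCHAR(10) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

-- N'...' literals keep the Unicode characters intact on insert.
INSERT INTO #EncodingDemo (Utf16Value, Utf8Value)
VALUES (N'SQL Server', N'SQL Server'),  -- Latin (ASCII) characters only
       (N'日本語', N'日本語');            -- three Japanese characters

-- Compare the byte counts per encoding.
-- 'SQL Server': 20 bytes as UTF-16, 10 bytes as UTF-8.
-- '日本語':       6 bytes as UTF-16,  9 bytes as UTF-8.
SELECT Utf16Value,
       DATALENGTH(Utf16Value) AS Utf16Bytes,
       DATALENGTH(Utf8Value)  AS Utf8Bytes
FROM #EncodingDemo;

DROP TABLE #EncodingDemo;
```

The second row shows why the savings apply to primarily Latin data: characters outside the ASCII range take two or more bytes in UTF-8, and many East Asian characters take three bytes compared to two in UTF-16.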

Getting started

To get started with SQL Server 2019 preview, find download instructions on the SQL Server 2019 web page.
You can read more about Unicode support in SQL Server, including details on UTF-8 support.

Pedro Lopes

Senior Program Manager, SQL Server
