Over the past decade, data science has become integral to many enterprise applications. Languages such as R and Python have moved beyond the realm of data scientists and are increasingly used by the data engineers who support them. R and Python are currently the most popular data science languages for creating, training, and scoring models. Modernization has also accelerated the use of these languages, as organizations leverage the benefits of the cloud to enable in-database processing of machine learning algorithms and models.
In SQL Server 2016 through 2019, we added R and Python language support, which enables secure execution of R and Python programs in the context of a SQL Server query. This supports a wide range of scenarios, such as performing advanced text and data preparation tasks, calling external APIs to get data, training machine learning models, and scoring models.
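As a rough illustration of what running a Python program "in the context of a SQL Server query" looks like, the sketch below composes a call to the `sp_execute_external_script` stored procedure, which takes the script text and an input query as parameters. Inside the script, the query result is available as `InputDataSet` and the result returned to SQL Server is `OutputDataSet` (the default names). The table `dbo.Sales`, the column `amount`, and the two helper functions are hypothetical and exist only for this example.

```python
def quote_tsql_literal(text):
    """Return text as a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(script, input_query):
    """Compose an EXEC statement that runs a Python script over a query result.

    Hypothetical helper for illustration only; it just builds the statement text.
    """
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {quote_tsql_literal(script)},\n"
        f"    @input_data_1 = {quote_tsql_literal(input_query)};"
    )


# The script body runs inside SQL Server: it keeps only rows with a positive amount.
# dbo.Sales and its amount column are assumed names, not part of any real schema.
python_script = "OutputDataSet = InputDataSet[InputDataSet['amount'] > 0]"
statement = build_external_script_call(python_script, "SELECT amount FROM dbo.Sales")
print(statement)
```

The printed statement can be pasted into a query window on an instance where external scripts are enabled. Building SQL text by string concatenation is done here only to show the shape of the call; application code should pass values as parameters instead.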
Previously, we announced a Java extension. Today, we are sharing that we are open sourcing the R and Python language extensions for SQL Server for both Windows and Linux on GitHub.
These extensions are the latest examples of an evolved programming language extensibility architecture, which allows integration with a new type of language extension. This new architecture gives customers the freedom to bring their own runtime and execute programs using that runtime in SQL Server, while leveraging the existing security and governance that the SQL Server programming language extensibility architecture provides.
Being able to choose the runtime gives you the flexibility to use different distributions of R and Python, and as newer versions of the R and Python runtimes are released, this architecture makes it easier to upgrade. Enterprises need to have a support contract in place for their R and Python runtime; because they can bring their own runtime, they can use one that their support contract covers.
Now that support is not an issue, let’s look at what use cases R and Python can enable inside SQL Server. Bringing R and Python workloads closer to the data opens a variety of possibilities:
- Run R and Python scripts to do data preparation and general purpose data processing.
- Train machine learning models in database.
- Deploy your models and scripts into production in stored procedures.
- Avoid unnecessary data movement and latency, since data no longer has to be retrieved from SQL Server and moved into the app tier for business logic processing.
- The data security model of database logins and roles extends to external scripts.
- Avoid impersonation attempts.
- Prevent the installation of malware.
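The training and deployment items in the list above often follow a simple pattern: a script trains a model, serializes it to bytes so it can be stored in a table, and a later script loads those bytes to score new rows. The sketch below shows that pattern in plain Python under stated assumptions. Inside SQL Server the input would normally arrive as a pandas DataFrame named `InputDataSet`; plain lists are used here so the example runs on its own. The function names, the `distance` and `fare` data, and the dictionary-based model are all hypothetical.

```python
import pickle


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


def serialize_model(model):
    """Turn the model into bytes, suitable for a varbinary(max) column."""
    return pickle.dumps(model)


def score(model_bytes, xs):
    """Load a stored model and predict y for each x."""
    model = pickle.loads(model_bytes)
    return [model["slope"] * x + model["intercept"] for x in xs]


# Made-up training data for illustration: fare = 2 * distance + 1.
distance = [1.0, 2.0, 3.0, 4.0]
fare = [3.0, 5.0, 7.0, 9.0]

model_bytes = serialize_model(train_linear_model(distance, fare))
predictions = score(model_bytes, [5.0, 6.0])
print(predictions)
```

In a deployment, the training step and the scoring step would typically live in separate stored procedures, with the serialized model stored in a table between them. Only unpickle bytes that come from a trusted source, such as a table your own training procedure wrote.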
Why Open Source?
The R and Python language extensions leverage the Extensibility Framework API for SQL Server to communicate and exchange data with SQL Server. This API has been publicly documented. Together, the API and the open source code of the R and Python language extensions provide an end-to-end example of how a programming language extension can be built. This makes it easier for the community to build additional programming language extensions for SQL Server. What language extensions would you like to see?
Whether you are interested in creating your own language extension or just using the R and Python language extensions for SQL Server, here is some more information to get you started.