Data science, machine learning, and analytics have redefined how we look at the world. The R community plays a vital role in that transformation, and the R language continues to be the de facto choice for statistical computing, data analysis, and many machine learning scenarios.
The importance of R was first recognized by the SQL Server team back in 2016 with the launch of SQL ML Services and R Server. Since then, we added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services in Azure SQL Managed Instance. SparkR, sparklyr, and PySpark are also available as part of SQL Server Big Data Clusters. We remain committed to R.
With that said, much has changed in the world of data science and analytics since 2016. Microsoft’s approach to open-source software has undergone a similar transformation in the same period. It is therefore time for us to share how we, in Azure SQL and SQL Server, are changing to meet the needs of our users and the R community moving forward.
Today we are making the following announcements to clearly state our direction and intent for R within Azure SQL and SQL Server.
Microsoft R-Open (MRO) will be phased out in favor of the official CRAN distribution
Microsoft R-Open (MRO) is Microsoft’s distribution of R. Azure SQL and SQL Server products and services will begin to phase out the use of MRO in favor of the CRAN distribution of R. The next release of SQL Server will use the CRAN distribution of R for SQL ML Services. Similarly, Azure SQL Managed Instance, SparkR, and sparklyr in SQL Server Big Data Clusters will also be upgraded to the CRAN distribution of R in a future update.
Version 4.0.2 will be the last release of the MRO runtime. Customers using MRO as their R runtime should transition to the CRAN distribution of R. It is important to note that the CRAN Time Machine will be unaffected by this change and will continue to be supported.
Microsoft R and Python packages will be released as open-source and supported on the CRAN distribution of R
As part of the acquisition of Revolution Analytics, Microsoft acquired proprietary technology for running models with unprecedented performance and scalability. We will open source the RevoScaleR and revoscalepy packages, making them freely available under the MIT license. The Python packages will be made available on PyPI, while the R packages will be ported from MRO to the CRAN distribution of R. Microsoft will maintain the packages as new versions of Python and the CRAN R runtime are released. To simplify the installation and creation of development environments, these R packages will be built and published into the Azure R universe.
ML Server retired, moving forward with SQL Server ML Services and SparkR/sparklyr in SQL Server Big Data Clusters
Version 9.4.7 will be the last release of the Microsoft Machine Learning Server and the product will be retired, effective July 1, 2022. The operationalization of machine learning models using MRO in the app pool of SQL Server Big Data Clusters will also be retired. The next version of SQL Server will not include the Machine Learning Server (Standalone) role as part of the setup experience. However, SQL Machine Learning Services will not be impacted by this change.
Existing customers of Machine Learning Server will be able to access the software for the next 12 months via the volume licensing download site. Support for Machine Learning Server has been extended by 12 months from the date of this announcement, giving existing customers the opportunity to migrate while remaining on a platform supported by Microsoft. Customers using Machine Learning Server as a development environment for R programming will be able to create their own environments with the Revo packages once they have been open sourced.
Moving forward, Azure SQL and SQL Server machine learning product investments will focus on SQL Machine Learning Services and the T-SQL language surface area. We encourage you to explore programming in R using Azure SQL Managed Instance or SQL Server 2019 with SQL Machine Learning Services. Enterprise R scoring scenarios such as linear regression, logistic regression, or boosted decision trees that use the T-SQL PREDICT function deliver market-leading performance and scalability. For native scoring, this is the recommended path forward.
Customers using R for machine learning training scenarios should also explore sp_execute_external_script as it provides greater flexibility, letting you run any valid R code without having to take the data out of your database.
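As a rough illustration of the idea above, the sketch below composes a T-SQL call to sp_execute_external_script that passes an R script and an input query to the database engine. It is a minimal sketch, not an official client: the helper names (`tsql_literal`, `build_external_script_call`) and the `dbo.sales` table with its `amount` and `region` columns are hypothetical. It assumes the procedure's default variable names, `InputDataSet` and `OutputDataSet`, and only builds the statement text; it does not connect to a server.

```python
def tsql_literal(text: str) -> str:
    """Render text as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(r_script: str, input_query: str) -> str:
    """Compose an EXEC sp_execute_external_script statement (hypothetical helper)."""
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'R',\n"
        f"    @script = {tsql_literal(r_script)},\n"
        f"    @input_data_1 = {tsql_literal(input_query)};"
    )


# Hypothetical table and columns, used only for illustration.
# The R code reads the query result as InputDataSet and returns OutputDataSet.
r_script = "OutputDataSet <- data.frame(mean_amount = mean(InputDataSet$amount))"
input_query = "SELECT amount FROM dbo.sales WHERE region = 'EMEA'"

statement = build_external_script_call(r_script, input_query)
print(statement)
```

The printed statement could then be run from any SQL client against an instance where external scripts are enabled. Because the query result is handed to R inside the database engine, the data does not have to be exported first.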
“I am delighted that next version of SQL Server and Azure SQL will introduce new machine learning and advanced analytics capabilities based on the CRAN distribution of R, its original home. My customers rely on the amazing performance and the ease of use of the T-SQL PREDICT statement. Being able to do nanosecond-scale predictions, using billions of rows per second, is a superb feature of SQL Server! However, we also like the performance and efficiency of running more traditional R and Python code within the database thanks to the rest of RevoScale technology, like the very convenient RxSqlServerData. We have always wanted to have that functionality available in the open-source R, without having to rely on some infrequent Microsoft distribution of it. It is great that this will be possible in the upcoming version. I think it will make the combination of R and both Azure SQL and SQL Server even more powerful, easier to use, and more popular.”—Rafal Lukawiecki, Data Scientist at Project Botticelli
However, Azure SQL and SQL Server are by no means your only options. SQL Server Big Data Clusters includes SparkR and sparklyr, providing a gateway for R users into the world of Apache Spark. With the Microsoft Spark SQL connector, customers also have a high-performance bulk loading interface to read and write data to Azure SQL and SQL Server databases from Apache Spark. For distributed processing of R, we therefore recommend using SparkR and sparklyr in SQL Server Big Data Clusters.
Finally, Microsoft Azure Machine Learning, Microsoft’s flagship machine learning PaaS platform, is now also available on Azure Arc to power your hybrid and multi-cloud machine learning topologies. You can learn about the public preview of Azure Arc-enabled machine learning from the recent product announcement. Plus, check out the product documentation for Azure Arc-enabled machine learning.
Looking to the future with the R community
Microsoft provides several great machine learning platforms. We also understand that the future and strength of R lie in its open-source community. The Azure SQL and SQL Server teams are embracing that spirit of openness by sharing our direction and intent with you. We are at the beginning of this journey, and we look forward to serving the needs of the entire R community.