This post was co-authored by Jeff Shepherd, Deepak Mukunthu, and Vijay Aski.
Recently, we blogged about performing automated machine learning on SQL Server 2019 big data clusters. In today’s post, we will present a complementary automated machine learning approach leveraging Azure Machine Learning service (Azure ML) invoked from SQL Server. While the previous post dealt with a Spark-based implementation tuned for big data, this post presents an approach that runs directly in SQL Server running on a single server. This is well suited for use with data residing in SQL Server tables and provides an ideal solution for any version of SQL Server that supports SQL Server Machine Learning Services.
Azure Machine Learning service
Azure Machine Learning service is a cloud service. We call the service from SQL Server to manage and direct the automated training of machine learning models in SQL Server. Automated machine learning tries a variety of machine learning pipelines. It chooses the pipelines using its own machine learning model based on the scores from previous pipelines. Automated machine learning can be used from SQL Server Machine Learning Services, python environments such as Jupyter notebooks and Azure notebooks, Azure Databricks, and Power BI.
Starting in SQL Server 2017, SQL Server includes the ability to run Python code using the sp_execute_external_script stored procedure. This allows SQL Server to call Azure ML automated machine learning. Instructions and code for running the following example are available on GitHub.
Example: Energy demand forecasting
Automated machine learning can be used for regression (the prediction of continuous values), classification, or forecasting. This example focuses on energy demand forecasting, where the goal of a power grid operator is to predict future energy demand given forecasted weather data. Although the context of our example is energy demand forecasting, the methods used can be applied to many other contexts and use cases.
This example uses an energy demand data set consisting of a table with four columns: timeStamp, demand, precip, and temp. We use three Transact-SQL stored procedures including AutoMLTrain, AutoMLGetMetrics, and AutoMLPredict. The AutoMLTrain procedure returns a model that predicts the label column – in this case, demand – based on the remaining columns that include timeStamp, precip and temp. Here, we run the AutoMLTrain procedure using data in rows with a time stamp before February 1, 2017:
The resulting model is stored in a SQL Server table so that it can be used later for prediction. The training can be viewed in the Azure Portal under Machine Learning service workspaces:
Optional parameters on the AutoMLTrain allow for explicit train/validate split, cross validation, sample weights, number of iterations, exit score, blacklist and whitelist models, forecasting, and time limits.
The AutoMLGetMetrics stored procedure returns several metrics for each pipeline. These can be stored and queried in SQL Server.
Finally, the AutoMLPredict stored procedure can predict new values based on the model returned by AutoMLTrain. Having trained our model using data before February 1, 2017, we will forecast demand for February 1, 2017 and later:
In this blog post, we have seen how Azure ML automated machine learning can be used from within SQL Server to train models and then predict new values. SQL Server developers can now train and use machine learning models without needing to learn Python and without needing detailed knowledge of machine learning.