SQL Server as a Machine Learning Model Management System

October 17, 2016

Product
SQL Server 2016

This post was authored by Rimma Nehme, Technical Assistant, Data Group

Machine Learning Model Management

If you are a data scientist, business analyst or machine learning engineer, you need model management – a system that manages and orchestrates the entire lifecycle of your machine learning models. Analytical models must be trained, compared and monitored before they are deployed into production, so operationalizing a model’s lifecycle requires many steps. There isn’t a better tool for that than SQL Server!

SQL Server as an ML Model Management System

In this post, I will describe how SQL Server can enable you to automate, simplify and accelerate machine learning model management at scale – from build, train, test and deploy all the way to monitor, retrain and redeploy or retire. SQL Server treats models just like data, storing them as serialized varbinary objects. As a result, it is largely agnostic to the analytics engine used to build a model. That makes it a good model management tool not only for R models (R is now built into SQL Server 2016) but also for other runtimes.

SELECT * FROM [dbo].[models]

Figure 1: Machine Learning model is just like data inside SQL Server.
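To make “models are just like data” concrete, here is a minimal sketch. It assumes R Services is installed and external scripts are enabled (sp_configure 'external scripts enabled', 1). The column names and the model name are hypothetical and chosen for illustration. The script trains a toy linear model on R's built-in mtcars dataset and serializes it in R. It then stores the bytes in a varbinary(max) column through an output parameter of sp_execute_external_script.

```sql
-- Hypothetical model store: one row per serialized model.
CREATE TABLE dbo.models (
    model_id   INT IDENTITY(1,1) PRIMARY KEY,
    model_name NVARCHAR(100)  NOT NULL,
    model_type NVARCHAR(50)   NOT NULL,           -- e.g. N'R:lm'
    model      VARBINARY(MAX) NOT NULL,           -- serialized model bytes
    created_at DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Train a toy linear model in R and hand it back to T-SQL as raw bytes.
DECLARE @model VARBINARY(MAX);

EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'
        fit   <- lm(mpg ~ wt, data = mtcars)          # built-in R sample dataset
        model <- as.raw(serialize(fit, connection = NULL))
    ',
    @params   = N'@model VARBINARY(MAX) OUTPUT',
    @model    = @model OUTPUT;

-- Store the model exactly like any other row of data.
INSERT INTO dbo.models (model_name, model_type, model)
VALUES (N'mpg_by_weight', N'R:lm', @model);

SELECT model_id, model_name, model_type, DATALENGTH(model) AS size_bytes, created_at
FROM dbo.models;
```

Because the model is only bytes to the engine, the same table could hold models serialized by other runtimes. The model_type column records which runtime can deserialize each row.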

SQL Server’s approach to machine learning model management is an elegant solution. Existing tools provide some capabilities for managing and deploying models. SQL Server, by contrast, keeps the models “close” to the data, so nearly all the capabilities of a data management system carry over almost seamlessly to machine learning models (see Figure 2). This can greatly simplify model management, resulting in faster delivery and more accurate business insights.

Figure 2: By pushing machine learning models inside SQL Server 2016 (on the right), you get throughput, parallelism, security, reliability, compliance certifications and manageability, all in one. It’s a big win for data scientists and developers: you don’t have to build the management layer separately. Furthermore, just as data in databases can be shared across multiple applications, you can now share predictive models. Models and intelligence become “yet another type of data”, managed by SQL Server 2016.

Why Machine Learning Model Management?

Today there is no easy way to monitor, retrain and redeploy machine learning models systematically. In general, data scientists collect the data they are interested in, prepare and stage the data, and apply different machine learning techniques to find a best-in-class model. They then continually tweak the algorithm’s parameters to refine the outcomes. Automating and operationalizing this process is difficult. For example, a data scientist must code the model, select parameters and a runtime environment, train the model on batch data, and monitor the process to troubleshoot any errors that occur. This process is repeated iteratively with different parameters and machine learning algorithms. Only after the models are compared on accuracy and performance can the best one be deployed.

Currently, there is no standard method for comparing, sharing or viewing models created by other data scientists, which results in siloed analytics work. Without a way to view models created by others, data scientists rely on their own private libraries of machine learning algorithms and datasets for their use cases. As many data scientists build and train models, the same algorithms may be used to build similar models, particularly if a certain set of algorithms is common for a business’s use cases. Over time, models sprawl and duplicate unnecessarily, making it more difficult to establish a centralized library.

Figure 3: Why SQL Server 2016 for machine learning model management.

In light of these challenges, there is an opportunity to improve model management.

Why SQL Server 2016 for ML Model Management?

There are many benefits to using SQL Server for model management. Specifically, you can use SQL Server 2016 for the following:

Model Store and Trained Model Store: SQL Server can efficiently store a table of “pre-baked” models of commonly used machine learning algorithms that can be trained on various datasets already present in the database. It can also store trained models for deployment against a live stream of real-time data.
Monitoring Service and Model Metadata Store: SQL Server can provide a service that monitors the status of a machine learning model while it executes in the runtime environment, and it can store metadata about each execution for the user.
Templated Model Interfaces: SQL Server can store interfaces that abstract the complexity of machine learning algorithms, allowing users to specify the inputs and outputs for the model.
Runtime Verification (for External Runtimes): SQL Server can provide a runtime verification mechanism using a stored procedure to determine which runtime environments can support a model prior to execution, helping to enable faster iterations for model training.
Deployment and Scheduler: Using SQL Server’s trigger mechanism, automatic scheduling and an extended stored procedure, you can automatically train, deploy and schedule models on runtime environments. This removes the need to operate the runtime environments manually during the modeling process.
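The Templated Model Interfaces idea can be illustrated with a hypothetical stored procedure. The caller supplies only a model name and a query that returns the input columns. The procedure loads the most recent serialized R model with that name and scores the input through sp_execute_external_script. The sketch makes two assumptions that are not part of SQL Server itself. First, a table dbo.models(model_name, model, created_at) holds R models serialized with serialize(). Second, each model supports predict() on a data frame whose columns match the input query. The procedure name and the model name in the usage example are also hypothetical.

```sql
-- Hypothetical templated interface: callers pass a model name and an input query;
-- deserialization and scoring details stay hidden inside the procedure.
CREATE PROCEDURE dbo.score_with_model
    @model_name  NVARCHAR(100),
    @input_query NVARCHAR(MAX)   -- must return the columns the model expects
AS
BEGIN
    SET NOCOUNT ON;

    -- Latest stored version of the named model.
    DECLARE @model VARBINARY(MAX) =
        (SELECT TOP (1) model
         FROM dbo.models
         WHERE model_name = @model_name
         ORDER BY created_at DESC);

    IF @model IS NULL
        THROW 50001, N'Model not found.', 1;

    EXEC sp_execute_external_script
        @language     = N'R',
        @script       = N'
            fit <- unserialize(model)
            OutputDataSet <- data.frame(prediction = predict(fit, InputDataSet))
        ',
        @input_data_1 = @input_query,
        @params       = N'@model VARBINARY(MAX)',
        @model        = @model
    WITH RESULT SETS ((prediction FLOAT));
END;
GO

-- Usage: score two input rows against a (hypothetical) stored model.
EXEC dbo.score_with_model
    @model_name  = N'mpg_by_weight',
    @input_query = N'SELECT CAST(2.5 AS FLOAT) AS wt UNION ALL SELECT 3.5';
```

Keeping the model lookup inside the procedure means applications never handle model bytes directly. Retraining and inserting a newer row changes what gets scored without changing any caller.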

Here are the specific capabilities that make the above possible:

ML Model Performance:

Fast training and scoring of models using operational analytics (in-memory OLTP and in-memory columnstore).
Monitor and optimize model performance via Query Store and dynamic management views (DMVs). Query Store is like a “black box” recorder on an airplane. It records how queries have executed and simplifies performance troubleshooting by enabling you to quickly find performance differences caused by changes in query plans. The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review. It separates data by time windows, allowing you to see database usage patterns and understand when query plan changes happened on the server.
Hierarchical, easily updatable model metadata using native JSON support: Expanded support for semi-structured JSON data inside SQL Server enables you to store properties of your models in JSON format and then process that JSON just like any other data inside SQL Server. You can organize collections of model properties and establish relationships between them. You can also combine strongly typed scalar columns stored in tables with flexible key/value pairs stored in JSON columns, and query both scalar and JSON values in one or multiple tables using full Transact-SQL. You can store JSON in memory-optimized or temporal tables, apply Row-Level Security predicates on JSON text, and so on.
Temporal support for models: SQL Server 2016’s temporal tables can be used for keeping track of the state of models at any specific point in time. Using temporal tables in SQL Server you can: (a) understand model usage trends over time, (b) track model changes over time, (c) audit all changes to models, (d) recover from accidental model changes and application errors.
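The sketch below combines three of these capabilities in one hypothetical model registry. Query Store is switched on for the current database. Model properties live in a JSON column validated with ISJSON. The table is system-versioned, so every change to a model row is preserved in a history table. Table, column and JSON property names are assumptions for illustration, and the 0x00 model bytes are only a placeholder.

```sql
-- Record query plans and runtime statistics for scoring queries in this database.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
GO

-- Hypothetical model registry: JSON metadata plus system versioning (temporal table).
CREATE TABLE dbo.model_registry (
    model_id   INT NOT NULL PRIMARY KEY,
    model_name NVARCHAR(100) NOT NULL,
    model      VARBINARY(MAX) NOT NULL,
    properties NVARCHAR(MAX) NULL
        CONSTRAINT ck_model_registry_json CHECK (ISJSON(properties) = 1),
    valid_from DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    valid_to   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.model_registry_history));
GO

-- Placeholder row: 0x00 stands in for real serialized model bytes.
INSERT INTO dbo.model_registry (model_id, model_name, model, properties)
VALUES (1, N'mpg_by_weight', 0x00,
        N'{"algorithm": "lm", "owner": "analytics-team", "status": "active"}');

-- Query JSON properties alongside regular columns.
SELECT model_name,
       JSON_VALUE(properties, '$.algorithm') AS algorithm,
       JSON_VALUE(properties, '$.status')    AS status
FROM dbo.model_registry
WHERE JSON_VALUE(properties, '$.owner') = N'analytics-team';

-- Change one property in place; the previous row version moves to the history table.
UPDATE dbo.model_registry
SET properties = JSON_MODIFY(properties, '$.status', 'retired')
WHERE model_id = 1;

-- Every version of model 1, oldest first (audit trail and recovery source).
SELECT model_id, JSON_VALUE(properties, '$.status') AS status, valid_from, valid_to
FROM dbo.model_registry
FOR SYSTEM_TIME ALL
WHERE model_id = 1
ORDER BY valid_from;
```

FOR SYSTEM_TIME AS OF with a UTC timestamp returns the registry as it was at that moment. That covers both tracking model changes over time and recovering from an accidental update.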

ML Model Security and Compliance:

Sensitive model encryption via Always Encrypted: Always Encrypted can protect models at rest and in motion. Client applications must use an Always Encrypted-enabled driver to communicate with the database, and the data is transferred in an encrypted state.
Transparent Data Encryption (TDE) for models: TDE is the primary SQL Server encryption option. It enables you to encrypt an entire database that may store machine learning models, and backups of databases that use TDE are also encrypted. TDE protects data at rest, is completely transparent to the application, and requires no code changes to implement.
Row-Level Security enables you to protect models in a table row by row, so a particular user can only see the models (rows) to which they are granted access.
Dynamic model (data) masking obfuscates a portion of the model data for anyone not authorized to view it, returning masked data (e.g. credit card numbers) to non-privileged users.
Change data capture (CDC) for models can be used to capture insert, update, and delete activity applied to models stored in SQL Server tables. It makes the details of the changes available in an easily consumed relational format. The change tables used by change data capture contain columns that mirror the column structure of a tracked source table, along with the metadata needed to understand the changes that have occurred.
Enhanced model auditing: Auditing is an important checks-and-balances mechanism for many organizations. SQL Server 2016 supports model auditing with features such as user-defined audit, audit filtering and audit resilience.
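Here is a hypothetical sketch of two of these features on a model table. A Row-Level Security filter predicate lets each database user see only the models they own, while members of a reviewer role see all models. Dynamic data masking hides the owner's e-mail address from anyone without UNMASK permission. The object names, the model_reviewers role and the ownership rule are assumptions; a real policy would follow your organization's access model.

```sql
-- Hypothetical model table with an owner column used for row-level filtering.
CREATE TABLE dbo.secure_models (
    model_id    INT IDENTITY(1,1) PRIMARY KEY,
    model_name  NVARCHAR(100) NOT NULL,
    owner_name  SYSNAME NOT NULL,                     -- database user who owns the model
    owner_email NVARCHAR(256)
        MASKED WITH (FUNCTION = 'email()') NULL,      -- only the first letter is exposed without UNMASK
    model       VARBINARY(MAX) NOT NULL
);
GO

-- Hypothetical role whose members may review every model.
CREATE ROLE model_reviewers;
GO

CREATE SCHEMA rls;
GO

-- Inline predicate function: a row is visible to its owner and to reviewers.
CREATE FUNCTION rls.fn_model_access (@owner_name SYSNAME)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS allowed
    WHERE @owner_name = USER_NAME()
       OR IS_MEMBER(N'model_reviewers') = 1;
GO

-- Bind the predicate to the table; every SELECT, UPDATE and DELETE is filtered.
CREATE SECURITY POLICY rls.model_access_policy
    ADD FILTER PREDICATE rls.fn_model_access(owner_name) ON dbo.secure_models
    WITH (STATE = ON);
GO
```

Both features are enforced inside the engine. Applications that read the models table inherit the filtering and masking without any code changes.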

ML Model Availability:

AlwaysOn for model availability and champion-challenger: An availability group in SQL Server supports a failover environment for a set of primary databases and one to eight sets of corresponding secondary databases. Secondary databases are not backups. In addition, you can have automatic failover based on database health. One interesting aspect of availability groups with readable secondaries is that they enable a “champion-challenger” model setup. The champion model runs on the primary. Challenger models score and are monitored for accuracy on the secondaries, without affecting the performance of the transactional database. Whenever a new champion model emerges, it’s easy to enable it on the primary.
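A hypothetical sketch of the champion-challenger pattern follows. Each stored model is tagged champion or challenger. A helper procedure returns the champion when it runs on the primary replica and the challengers when it runs on a readable secondary. Read-only work can be directed to a secondary with, for example, ApplicationIntent=ReadOnly in the connection string. The sketch uses sys.fn_hadr_is_primary_replica and assumes the database belongs to an availability group. The table, procedure and model_id values are our own illustration.

```sql
-- Hypothetical table of competing models; created on the primary, replicated to secondaries.
CREATE TABLE dbo.ab_models (
    model_id   INT IDENTITY(1,1) PRIMARY KEY,
    model_name NVARCHAR(100) NOT NULL,
    model_role NVARCHAR(20)  NOT NULL
        CONSTRAINT ck_ab_models_role CHECK (model_role IN (N'champion', N'challenger')),
    model      VARBINARY(MAX) NOT NULL
);
GO

-- Returns the model(s) this replica should score with.
CREATE PROCEDURE dbo.get_active_models
AS
BEGIN
    SET NOCOUNT ON;

    -- 1 when this replica is primary for the database; anything else is treated as a secondary.
    DECLARE @role NVARCHAR(20) =
        CASE WHEN sys.fn_hadr_is_primary_replica(DB_NAME()) = 1
             THEN N'champion'
             ELSE N'challenger'
        END;

    SELECT model_id, model_name, model
    FROM dbo.ab_models
    WHERE model_role = @role;
END;
GO

-- Promotion (run on the primary): swap roles in one transaction.
-- @new_champion is a hypothetical model_id of the winning challenger.
DECLARE @new_champion INT = 2;

BEGIN TRANSACTION;
    UPDATE dbo.ab_models SET model_role = N'challenger' WHERE model_role = N'champion';
    UPDATE dbo.ab_models SET model_role = N'champion'   WHERE model_id = @new_champion;
COMMIT TRANSACTION;
```

The promotion is an ordinary data change on the primary, so it reaches the secondaries through the availability group's normal replication.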

ML Model Scalability:

Enhanced model caching can facilitate model scalability and high performance. SQL Server 2016 supports caching by automatically configuring multiple TempDB files per instance in multi-core environments.

In summary, SQL Server delivers top-notch data management with performance, security, availability, and scalability built into the solution. Because SQL Server is designed to meet security standards, it has a minimal total surface area and is inherently more secure database software. Enhanced security, combined with built-in, easy-to-use tools and controlled model access, can help organizations meet strict compliance policies. Integrated high availability solutions enable faster failover and more reliable backups. They are also easier to configure, maintain, and monitor, which helps organizations reduce the total cost of model management (TCMM). In addition, SQL Server supports complex data types and non-traditional data sources and handles them with the same attention. Data scientists can therefore focus on improving model quality and outsource the model management to SQL Server.

Conclusion

Using SQL Server 2016, you can do model management with ease. SQL Server differs from other machine learning model management tools because it is a database engine optimized for data management. The key insight here is that “models are just like data” to an engine like SQL Server. We can therefore leverage most of SQL Server’s built-in, mission-critical data management features for machine learning models. Using SQL Server for ML model management, an organization can create an ecosystem for harvesting analytical models, enabling data scientists and business analysts to discover the best models and promote them for use. As companies rely more heavily on data analytics and machine learning, the ability to manage, train, deploy and share models that turn analytics into action-oriented outcomes is essential.

@rimmanehme