Speeding up Transactions with In-Memory OLTP in SQL Server 2016 and Azure SQL Database

In-Memory OLTP, a performance feature in SQL Server since 2014, allows you to speed up transactional workloads by up to 30X. It is now in public preview in SQL Server 2016 and Azure SQL Database, with many improvements that make it drastically easier to adopt in an existing application and allow many more applications to benefit from the performance gains.

Besides speeding up transactional workloads, In-Memory OLTP can also be beneficial in a number of other ways, such as ETL and transient data (think temp table) scenarios. For analytics query workloads, we recommend using In-Memory Analytics (Columnstore). For mixed workloads, we recommend combining the technologies to gain real-time insight based on operational data.

In-Memory OLTP can also potentially simplify your application architecture. For scaled-out implementations, you can typically reduce the number of shards, or potentially even consolidate all shards onto one machine with SQL Server 2016. With Azure SQL DB, you can improve performance without moving to a higher database service tier. For example, one customer was able to consolidate a deployment of 18 servers into a single server using In-Memory OLTP, resulting in cost savings of up to $100K per year.

If you are currently using a caching layer in your architecture because the database could not keep up with the user requests – either there was not enough CPU, or there were locking/blocking issues – you can potentially eliminate this caching layer by using In-Memory OLTP, which uses less CPU to do the same work, and removes locking/blocking issues through its lock-free architecture.

Performance benefits

The main performance benefits of In-Memory OLTP derive from more efficient CPU utilization and the removal of locks and latches. Memory-optimized data access and native compilation reduce the cost of data and transaction processing, allowing higher throughput from the same set of resources: a given CPU can process more transactions per second. The lock-free transaction processing engine scales linearly with the number of cores in the system, because concurrently executing transactions performing data access do not get in each other's way, so you can get the most out of the cores you have. Lock-free here means doing away with locks of any type, including latches and spinlocks. See the recent post, Concurrency Control in the In-Memory OLTP Engine, for more details.
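To make this concrete, here is a minimal sketch of the two core building blocks — a memory-optimized table and a natively compiled stored procedure. The database, filegroup, file path, table, and procedure names below are all illustrative:

```sql
-- A database needs a MEMORY_OPTIMIZED_DATA filegroup before it can hold
-- memory-optimized tables (names and path are illustrative):
ALTER DATABASE SalesDB ADD FILEGROUP SalesDB_mod CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE SalesDB ADD FILE (NAME = 'SalesDB_mod', FILENAME = 'C:\Data\SalesDB_mod')
    TO FILEGROUP SalesDB_mod;
GO

-- A durable memory-optimized table: rows live in memory; changes are logged.
CREATE TABLE dbo.SalesOrders
(
    OrderID     INT IDENTITY NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustomerID  INT NOT NULL,
    OrderDate   DATETIME2 NOT NULL,
    Amount      MONEY NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- A natively compiled stored procedure: the T-SQL is compiled to machine code,
-- and the body runs as one atomic block under snapshot isolation.
CREATE PROCEDURE dbo.InsertSalesOrder
    @CustomerID INT, @OrderDate DATETIME2, @Amount MONEY
WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT dbo.SalesOrders (CustomerID, OrderDate, Amount)
    VALUES (@CustomerID, @OrderDate, @Amount);
END;
GO
```

Note the hash index declared with a BUCKET_COUNT: for memory-optimized tables you size the hash index up front, ideally at one to two times the expected number of distinct key values.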

Now, what does all that mean in terms of actual performance? The following numbers are based on using pre-release bits of SQL Server 2016.

In the lab, using a modern four-socket server in the ASP.NET session state scenario, we have seen a sustained throughput of 1.2 million batch requests per second, with four kilobytes per request. That is 4.8 gigabytes of data processed per second, on a single server! This workload runs purely in memory using SCHEMA_ONLY tables, which are not logged and not persisted to disk. For durable tables, which are fully logged, using an order processing workload similar to some of the classical database benchmarks, we are seeing a sustained throughput of close to one gigabyte per second in log generation, again on a four-socket server.

Even for lower-end machines you can expect solid performance improvement from In-Memory OLTP. On my modest consumer-grade laptop that has two physical cores, I can achieve 17,000 transactions per second with the sample workload for In-Memory OLTP using AdventureWorks2016CTP3 — a 10X performance improvement compared with traditional tables. To run this sample workload on your own hardware, download the AdventureWorks2016CTP3 database and sample scripts and follow the instructions in the zip file.

Does this mean you can expect performance benefits for every workload? Well, no. For analytics queries, which scan and/or aggregate large data sets, you'll want to use Columnstore indexes. You can actually add a Columnstore index to your memory-optimized table! For transactional workloads, how much gain you will see from In-Memory OLTP largely depends on where the performance bottleneck is: In-Memory OLTP optimizes the time spent on data processing within the database engine. If, for a given user request, most of the time is spent in the mid-tier app, or if most of the time of a given transaction is spent waiting on the log IO device, you may not see as much gain. Good indicators that In-Memory OLTP will benefit a transactional workload are high CPU utilization and/or a lot of lock and latch waits.
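Combining the two technologies is a one-statement change. Assuming an existing memory-optimized table such as a hypothetical dbo.SalesOrders, a clustered columnstore index can be added to it for real-time analytics over the same operational data:

```sql
-- Adding a clustered columnstore index to an existing memory-optimized table
-- (table and index names are illustrative). OLTP statements keep using the
-- row-based in-memory indexes; analytics queries scan the columnstore.
ALTER TABLE dbo.SalesOrders
    ADD INDEX SalesOrders_cci CLUSTERED COLUMNSTORE;
```

After this, an aggregation such as `SELECT CustomerID, SUM(Amount) FROM dbo.SalesOrders GROUP BY CustomerID` can be served from the columnstore while inserts continue at In-Memory OLTP speed.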

Relieving tempdb contention and improving scalar UDF performance

The performance numbers quoted above are for workloads using In-Memory OLTP’s memory-optimized tables and natively compiled stored procedures at their core. But you don’t have to use In-Memory OLTP for all of your workloads; that is the beauty of the deep integration of In-Memory OLTP in the SQL Engine. Here are a couple of thoughts on how you could use In-Memory OLTP to relieve tempdb contention and improve scalar UDF performance in your existing app with minimal changes:

  • Use SCHEMA_ONLY memory-optimized tables to replace temp tables in your app. The advantages are that tempdb is never touched, and the schema of the table is persisted, meaning you can always rely on this table being available. But do keep in mind that you’ll want to create the table before running the workload; In-Memory OLTP is optimized for runtime, not create time.
  • Use memory-optimized table types to replace traditional table types in your app. Memory-optimized table types can be used anywhere you use traditional table types, and table variables of these types never touch tempdb, since memory-optimized table variables live purely in the memory space of the user database. Plus, you benefit from the more efficient memory-optimized data access.
  • Use natively compiled scalar User-Defined Functions to replace traditional scalar UDFs that do not perform data access. Native compilation significantly reduces the runtime of the UDF. Native UDFs cannot access traditional disk-based tables, which is not an issue if the UDF performs no data access; if it does, consider migrating those tables to memory-optimized tables.
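The three options above can be sketched as follows; all object names here are illustrative:

```sql
-- 1. A SCHEMA_ONLY table as a temp-table replacement: the schema is durable
--    (the table is always there), but the data is not logged or persisted.
CREATE TABLE dbo.SessionScratch
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    Payload   VARBINARY(4000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- 2. A memory-optimized table type: variables of this type live in the user
--    database's memory space and never touch tempdb.
CREATE TYPE dbo.OrderIDList AS TABLE
(
    OrderID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000)
)
WITH (MEMORY_OPTIMIZED = ON);
GO

-- 3. A natively compiled scalar UDF that performs no data access.
CREATE FUNCTION dbo.OrderTotal (@Qty INT, @UnitPrice MONEY, @Discount MONEY)
RETURNS MONEY
WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    RETURN @Qty * (@UnitPrice - @Discount);
END;
GO
```

A table variable of the new type is declared and used exactly like a traditional one, e.g. `DECLARE @ids dbo.OrderIDList; INSERT @ids VALUES (1), (2);` — only the storage location changes.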

Getting started

Here are a few ways you can get started with In-Memory OLTP:

  • See the other posts in the SQL Server 2016 blogging series.
  • Try SQL Server 2016 RC.
  • Learn more.