This post was authored by Liang Yang, Principal Performance Engineer on the Hyper-V team and Jos de Bruijn, Senior Program Manager on the SQL Server team.
With Windows Server 2016, Microsoft has significantly raised the Hyper-V Virtual Machine (VM) scale limits to embrace new scenarios, such as running large in-memory e-commerce databases for Online Transaction Processing (OLTP) and Data Warehousing (DW). In this post, we highlight the performance of in-memory transaction processing at scale using SQL Server 2016 running in a Windows Server 2016 Hyper-V VM.
The result: 343,000 transactions per second using an order processing workload with a 4 terabyte in-memory database, running in a VM with 5.5 terabytes of memory and 128 virtual processors. The performance in the VM was >95% of the performance on a physical server.
Below, we first present our findings, comparing the performance of a physical host versus a Hyper-V large-scale VM. We then recap the increased scalability for VMs with Hyper-V in Windows Server 2016, in-memory transaction processing support in SQL Server 2016, and the benchmark workload we used to obtain these performance numbers. Finally, we detail the hardware and software setup used in the experiment.
SQL Server In-Memory OLTP Performance Comparison
We used BenchCraft, an internal Microsoft tool, to run an OLTP (Online Transaction Processing) workload against the in-memory database. The in-memory database size is approximately 4TB with durable, fully logged tables – some of those tables have over 5 billion rows. We achieved over 340K OLTP Transactions per Second (TPS) when running the workload in a single large-scale Hyper-V VM. Compared with the 358K TPS achieved on the physical host, the Hyper-V VM delivers ~95% of what the physical host can do.
| |Physical Server|Hyper-V Large-Scale VM|
|TPS (Transactions per Second)|358K|343K|
|RT (Average Response Time)|–|–|
|CPU Utilization|80% over 144 LPs|95% over 128 VPs|
Table 1. In-Memory Processing Performance Numbers Comparison
Hyper-V Large-Scale VM Support in Windows Server 2016
Windows Server 2016 increases the scalability for Hyper-V virtual machines (VMs), with support for up to 240 virtual processors and 12 terabytes of memory. Below is the comparison between Windows Server 2012/2012 R2 and Windows Server 2016 for Hyper-V scalability on a per VM basis:
| |Windows Server 2012/2012 R2 Standard & Datacenter|Windows Server 2016 Standard & Datacenter|
|Physical (Host) Memory Support|Up to 4 TB per physical server|Up to 24 TB per physical server (6x)|
|Physical (Host) Logical Processor Support|Up to 320 LPs|Up to 512 LPs (1.6x)|
|Virtual Machine Memory Support|Up to 1 TB per VM|Up to 12 TB per VM (12x)|
|Virtual Machine Virtual Processor Support|Up to 64 VPs per VM|Up to 240 VPs per VM (3.75x)|
Table 2. Hyper-V Scalability Limit Comparison
Note: to use the new Hyper-V large-scale VMs, the following conditions must be met:
- Both host and VM guest OS must be Windows Server 2016 or newer.
- VM type must be generation 2 and VM version must be 8.0 or newer.
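As a sketch, a VM meeting the requirements above could be created with the Hyper-V PowerShell cmdlets on a Windows Server 2016 host. The VM name and VHD path here are hypothetical placeholders:

```powershell
# Create a generation 2 VM (required for large-scale VM support).
New-VM -Name "SQL2016-LargeVM" -Generation 2 -MemoryStartupBytes 5.5TB `
    -NewVHDPath "C:\VMs\SQL2016-LargeVM\os.vhdx" -NewVHDSizeBytes 200GB

# Assign 128 virtual processors, leaving headroom for the host.
Set-VMProcessor -VMName "SQL2016-LargeVM" -Count 128

# Confirm the VM configuration version is 8.0 or newer.
Get-VM -Name "SQL2016-LargeVM" | Select-Object Name, Generation, Version
```

On a host upgraded from an earlier Windows Server release, existing VMs keep their old configuration version until `Update-VMVersion` is run, so checking the version explicitly is worthwhile.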
SQL Server 2016 In-Memory OLTP
In-Memory OLTP is the premier performance technology of SQL Server, significantly increasing the performance of transactional workloads compared with traditional (disk-based) database technology. It was first released as part of SQL Server 2014, with further usability and performance improvements in SQL Server 2016 and in Azure SQL Database.
With the memory-optimized tables and natively compiled stored procedures introduced by In-Memory OLTP, customers achieve up to 30x (30-fold) performance improvement compared with traditional tables and stored procedures. Data storage and data access in memory-optimized tables are optimized around the data being memory-resident, and the Transact-SQL logic in natively compiled stored procedures is compiled to machine code. These efficiencies result in very low latency for user transactions, which translates into short response times and allows a single CPU core to execute many more transactions per second. In addition, transaction processing for memory-optimized tables is completely latch- and lock-free – even spinlocks are avoided. This allows all concurrent transactions in the system to execute without interference, resulting in linear scaling of transaction throughput with the number of CPU cores in the server.
For more details about In-Memory OLTP see In-Memory OLTP (In-Memory Optimization).
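To make the two features concrete, here is a minimal sketch of a durable memory-optimized table and a natively compiled stored procedure, submitted via `Invoke-Sqlcmd` from the SqlServer PowerShell module. The server, database, table, and procedure names are hypothetical, and the target database is assumed to already have a memory-optimized filegroup:

```powershell
$ddl = @"
-- Durable memory-optimized table: data is fully logged, as in the benchmark.
CREATE TABLE dbo.OrderLine (
    OrderId  BIGINT NOT NULL,
    LineId   INT    NOT NULL,
    Quantity INT    NOT NULL,
    CONSTRAINT PK_OrderLine PRIMARY KEY NONCLUSTERED HASH (OrderId, LineId)
        WITH (BUCKET_COUNT = 16777216)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Natively compiled stored procedure: the T-SQL logic is compiled to machine code.
CREATE PROCEDURE dbo.InsertOrderLine @OrderId BIGINT, @LineId INT, @Quantity INT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.OrderLine (OrderId, LineId, Quantity)
    VALUES (@OrderId, @LineId, @Quantity);
END;
"@
Invoke-Sqlcmd -ServerInstance "localhost" -Database "OrderDB" -Query $ddl
```

The `BUCKET_COUNT` shown is an illustrative value; it would normally be sized to roughly match the expected number of distinct key values.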
Order Processing Workload
The benchmark for the performance experiments represents an example order processing and fulfillment workload. It is described on the SQL Server Samples GitHub repository: Order Processing Benchmark using In-Memory OLTP. Both experiments use a 4 terabyte version of the database as the starting point for the run. The transaction breakdown for the workload is shown below.
|Transaction|
|Select customer info, orders, and order lines for a given customer.|
|Select top 10 products of a given type, ordered by price.|
|Select all products in a given ID range, ordered by price.|
|Select top 20 products in a given ID range with the highest computed “closeness” factor against the PurchaseCriteria.|
|Insert a new order for a given customer with up to five order lines.|
|Fulfill the 10 oldest outstanding orders.|
Table 3. OLTP In-Memory Workload Characteristics
We used the same client setup for both experiments: 26 workload drivers, with 800 threads per driver, for a total of 20,800 client connections. SQL Server and the clients run on two separate but identical machines, connected by a dedicated network (4 x 100GbE).
To compare the performance of the benchmark between physical and virtual machines, we used the same physical hardware. For the first experiment, we ran the workload with SQL Server 2016 running on the physical machine. For the second experiment we created a Hyper-V VM on the same physical machine, and ran the workload with SQL Server running in the VM.
Find below the configuration for the physical machines used in the experiments, as well as the Hyper-V virtual machine (VM). One physical machine was used for all the workload drivers, while the other machine was used to run SQL Server in the first experiment, and was used as Hyper-V host in the second experiment. Four dedicated 100GbE network links are used to connect the two physical machines, for both experiments.
Physical Machine Configuration:
- Operating System: Windows Server 2016 Datacenter
- Physical Machine Settings: We used two HPE ProLiant DL580 Gen9 servers, which are four-socket enterprise x86 units offering compelling performance and virtualization efficiencies. Each system was configured with 6TB RAM (64GB LRDIMM x 96), 144 logical processors (4 Intel Xeon E7-8890 v3 2.50GHz 18-core CPUs with Hyper-Threading enabled), and 9 PCIe Gen3 cards (5 for storage and 4 for network).
- Notes: 1. The Maximum (High) performance power profile is used in both the machine BIOS and the OS settings; 2. SR-IOV support is enabled in the machine BIOS as well.
- Physical Machine Storage Settings: Each HPE ProLiant DL580 Gen9 server is configured with five PCIe Gen3 x8 6.4TB HHHL NVMe cards. Each of these NVMe cards can deliver up to 6000MB/s for sequential reads and up to 2000MB/s for sequential writes.
- Note: the OS system drive sits on a RAID volume managed by a separate 12G SAS RAID controller.
- Physical Machine Network Settings: Each HPE ProLiant Gen9 server is configured with four InfiniBand EDR/Ethernet 100Gb 1-port network adapters which are based on Mellanox ConnectX-4 technology for scalability and fabric flexibility.
- Notes: 1. SR-IOV support is enabled; 2. Jumbo packet size is increased from the default 1514 to 9014 to better support the workloads driven by BenchCraft.
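The host-side network settings in the notes above can be applied with the standard NetAdapter cmdlets. This is a sketch only; the adapter name is a hypothetical placeholder, and SR-IOV additionally requires BIOS/firmware support:

```powershell
# Enable SR-IOV on the physical network adapter.
Enable-NetAdapterSriov -Name "Ethernet 100Gb"

# Raise the jumbo packet size from the 1514 default to 9014.
Set-NetAdapterAdvancedProperty -Name "Ethernet 100Gb" `
    -RegistryKeyword "*JumboPacket" -RegistryValue 9014
```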
Hyper-V VM Configuration:
- Operating System: Windows Server 2016 Datacenter.
- Virtual Machine Settings:
- Type and version: generation 2 and version 8.0
- Virtual Machine Memory: 5.5TB RAM
- Number of virtual processors (VPs): 128. Note: we assigned 128 of the 144 CPUs to the VM in order to reserve some CPU cycles for the host, avoiding possible contention between the host root VPs and the VM VPs under CPU-intensive workloads. To help the VM scale, vNUMA (4 nodes) is also enabled to align with the underlying physical NUMA topology.
- Number of virtual SCSI controllers: 4
- Note: for best performance, we used separate virtual SCSI controllers for the VM OS drive (x1), the SQL Server In-Memory OLTP checkpoint files (x2), and the transaction log files (x1).
- Virtual Machine Storage Settings: For performance reasons, we attached the five NVMe SSDs directly to the VM as pass-through disks, which minimizes the overhead of the file-based virtualized storage path as well as the file system overhead incurred on the host.
- Notes: 1. To maximize parallelism, four pass-through disks within the VM are dedicated to hosting SQL Server In-Memory OLTP checkpoint files (16 in total, 4 per SSD); 2. The SQL Server transaction log files sit on the fifth pass-through disk to avoid resource contention with checkpoint file I/O.
- Virtual Machine Network Settings: For performance reasons, the four 100GbE network adapters were attached to the VM directly via SR-IOV to further reduce network latency.
- Notes: 1. RSS (Receive Side Scaling) is enabled and the number of receive queues is set to 16; 2. The network adapter profile is set to NUMA Static.
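As an illustration of the pass-through disk and SR-IOV settings above, the following sketch uses the Hyper-V and NetAdapter cmdlets. The VM name, disk number, and guest adapter name are hypothetical placeholders:

```powershell
# Take a physical NVMe disk offline on the host, then attach it
# to the VM directly as a pass-through disk on a dedicated SCSI controller.
Set-Disk -Number 2 -IsOffline $true
Add-VMHardDiskDrive -VMName "SQL2016-LargeVM" -ControllerType SCSI `
    -ControllerNumber 1 -DiskNumber 2

# Give the VM network adapter an SR-IOV virtual function.
Set-VMNetworkAdapter -VMName "SQL2016-LargeVM" -IovWeight 100

# Inside the guest: enable RSS with 16 receive queues.
Enable-NetAdapterRss -Name "Ethernet"
Set-NetAdapterRss -Name "Ethernet" -NumberOfReceiveQueues 16
```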
The new scale limits in Windows Server 2016 Hyper-V allow large-scale VMs to host in-memory database management systems of virtually any size in the cloud, and deliver great performance for in-memory transaction processing with SQL Server 2016.
To get more information on the benefits of running SQL Server 2016 on Windows Server 2016: