Whenever I work with Windows Server “8” storage, I get a huge smile as I think about what customers are going to be able to do and the excellent engineering that went into the features. Whether you’re using a block-based storage area network (SAN) or a file-based solution, we are heavily invested in both types of storage based on your input. In this blog, we are going to focus on our investments in file-based storage. Our teams worked together to innovate up and down the entire storage stack. From the way we flush data, to Storage Spaces, to Resilient File System (ReFS), to improvements in the Server Message Block (SMB) protocol and Cluster Shared Volume (CSV), to the support for SMI-S enabled storage devices, Windows Server “8” fundamentally changes how we think about storage architectures and solutions. It boils down to this: Windows Server “8” minimizes the Capex/Opex of storage, in some cases dramatically. If you are involved with storage, you need to stop what you are doing and take the time to understand Windows Server “8” storage and rethink how you do things going forward.
In this post, we explore a new scenario enabled by Windows Server “8”: server application storage on file shares. It all started with a simple question, “Why can’t server applications take advantage of our file servers?” That’s when the program managers, developers and testers rolled up their sleeves, broke the problem down into its parts, and designed a comprehensive set of features to make it happen. I encourage you to download the beta and see for yourself.
Claus Joergensen and Jose Barreto, Principal Program Managers in our File Server team, authored this post.
Since the beginning, the Windows file server has primarily been used for storing end user data. Typical business scenarios are the user home share for non-shared data and team shares for collaboration. The Windows Server “8” file server introduces support for server applications, such as Hyper-V™ and Microsoft SQL Server, that can store live data on Windows file shares. For example, a user can configure a Hyper-V virtual machine with its configuration file, VHD files, and snapshot files stored on a Windows file share. The following screenshot shows a virtual machine configured with its VHD file stored on a Windows file share, with the Universal Naming Convention (UNC) path:
Enabling new scenarios
During planning for Windows Server “8”, our customers showed particular interest in the ability to store server application data on Windows file shares. For many small and medium business customers, this is an affordable alternative to SAN infrastructures because they can leverage Ethernet infrastructures and use industry-standard servers. Also, when file shares are created, customers can more easily manage the file shares because they do not need to provision LUNs or configure zones. In addition to being a more affordable alternative and simpler to manage, the ability to store server application data allows larger customers and hosters to gain more flexibility when moving workloads around in the datacenter without having to reconfigure storage. If data is stored on a UNC path, with the right administrative credentials, it can be accessed from anywhere in the datacenter.
With support for this new scenario, all customers have an additional storage option and can choose between Fibre Channel, iSCSI SANs, shared SAS storage arrays, or file shares, depending on their preferences, budgets, and required features.
Enabling server application storage on file shares
To enable server applications to store their live data on file shares, there are two requirements. First, the server role or application needs to support it. This includes updating the application to support UNC paths (\\server\share\file.vhd) in its setup and management tools, as well as fully testing the application in the use cases in this scenario. Microsoft SQL Server 2008 R2 added support for storing SQL user databases on SMB file shares. Microsoft SQL Server 2012 adds support for the SQL system databases, as well as for configuring SQL Server as a cluster. As demonstrated at the //BUILD conference, Windows Server “8” has also added support for storing virtual machine files on SMB file shares.
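As a rough illustration of what "supporting UNC paths" means for a management tool, the sketch below splits a UNC path into its server, share, and file components using Python's standard `pathlib` module. The function name `split_unc` is hypothetical and chosen only for this example; it is not part of any Windows or SQL Server tooling.

```python
from pathlib import PureWindowsPath


def split_unc(path: str) -> dict:
    """Split a UNC path into server, share, and the path below the share."""
    p = PureWindowsPath(path)
    # For a UNC path, pathlib reports "\\server\share" as the drive.
    if not p.drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    server, share = p.drive.lstrip("\\").split("\\", 1)
    # parts[0] is the "\\server\share\" anchor; the rest is relative to the share.
    relative = "\\".join(p.parts[1:])
    return {"server": server, "share": share, "relative": relative}


info = split_unc(r"\\server\share\file.vhd")
print(info["server"], info["share"], info["relative"])
```

A tool that accepts storage locations could run a check like this before handing the path to the application, rejecting local paths such as `C:\vms\file.vhd` when a file share is expected.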
Second, the file server itself needs to support allowing server applications to store their data on file shares. During Windows Server “8” customer planning engagements, we identified the following top-level requirements for the file server to support storing server applications:
– Continuous availability. Server applications expect storage to always be available and in general, do not handle input/output (I/O) errors or unexpected closures of file handles well. These types of events may cause virtual machines to crash because the virtual machine can no longer write to its disk or cause databases to go offline. Customers commonly deploy hardware redundancy, such as multiple network adapters, network switches, and Windows cluster configurations to mitigate hardware outages. While such configurations allow the file server to quickly recover from a failure, the recovery is not transparent to the application and virtual machines must be restarted and databases brought online. The Windows Server “8” file server solution must be able to quickly and transparently recover from network or node failures, with no downtime or administrator intervention required.
– Performance. Some server roles, such as Hyper-V and SQL Server, are very sensitive to storage performance, including bandwidth, latency and I/O per second (IOPS). It is also important to ensure that CPU consumption when accessing storage is kept to a minimum to provide as much CPU time to the application as possible. Finally, server applications tend to have an access pattern that is very different from that of user applications. Where user applications mostly read or write a file in full, server applications tend to append or update existing data. The Windows Server “8” file server solution must be able to deliver storage bandwidth to server applications nearly equivalent to that of multiple 10Gbps Ethernet or InfiniBand network adapters, with latency, IOPS, and CPU consumption rivaling that of Fibre Channel.
– Scalability. The configurations for a Windows file server cluster are often deployed in active-passive configurations, which leaves at least one node unused. A workaround is to configure multiple file server instances in a cluster. This allows you to use all of the hardware in the cluster. However, this requires additional administration and the bandwidth available for a share is still limited to the bandwidth available on the node where it is currently online. The Windows Server “8” file server must be able to support active-active configurations where a share can be accessed through any node, increasing the maximum bandwidth to the aggregate of the cluster nodes and simplifying administration.
– Data protection. Another key ability is the creation of application-consistent shadow copies of the data for backup purposes. In Windows, this is usually accomplished using the Volume Shadow Copy Service (VSS) infrastructure. VSS, in its current form, only supports local storage. The Windows Server “8” file server solution must be able to support application-consistent shadow copies through full integration with VSS and with minimal impact on existing VSS requestors, writers, and providers.
As you can see, this is quite a demanding list of requirements. However, we agreed that we needed to address all of them to provide a reliable, available, and serviceable file server with great performance for server application storage.
Supporting server application storage on file shares in Windows Server “8” was a major decision for the product team. Several features were introduced specifically to make sure file storage could meet or exceed the requirements commonly applied to block storage, without losing file storage’s inherent benefits in ease of management and cost effectiveness. This also required the introduction of a new version of SMB, which is the main remote file protocol in Windows. These new capabilities include:
SMB Transparent Failover: Enables administrators to perform hardware or software maintenance of nodes in a clustered file server without interrupting server applications storing data on these file shares. Also, if a hardware or software failure occurs on a cluster node, this feature enables SMB clients to transparently reconnect to another cluster node without interrupting server applications that are storing data on these file shares. This is achieved regardless of the type of operation that is under way when the failure occurs. For block-based storage, this is the equivalent of having a multi-controller storage array.
SMB Multichannel: Enables you to simultaneously use multiple connections and network interfaces with two main benefits: increased throughput and fault tolerance. For instance, if you have four 10GbE interfaces on both the SMB client and server, you can simultaneously use them to effectively achieve 40Gbps throughput from the four 10Gbps network adapters. In the event that one of the network adapters or cables fails, your SMB client will continue to use the network uninterrupted, at a lower throughput. Best of all, this is achieved without additional configuration steps. You only need to configure the multiple network interfaces as you normally would.
SMB Direct: One of the main advantages of Fibre Channel block storage is the ability to have low latency and fast, offloaded data transfers. To match that in the file server world, SMB introduces support for network adapters that have RDMA capability and can function at full speed with very low latency, while using very little CPU. When using one of three RDMA technologies (InfiniBand, iWARP or RoCE), the SMB client has a low CPU overhead, which is comparable to Fibre Channel, and saves CPU cycles for the main workload on the box, such as Hyper-V or SQL Server. Best of all, these network interfaces are detected and function without requiring additional SMB configuration steps. If RDMA interfaces are available, they will be automatically used.
SMB Scale-Out: Taking advantage of Cluster Shared Volume (CSV) version 2, administrators can create file shares that provide simultaneous access to data files, with direct I/O, through all nodes in a file server cluster. This means that the maximum file serving capacity for a given share is no longer limited by the capacity of a single cluster node, but rather the aggregate bandwidth across the cluster. Also, this active-active configuration lets you balance the load across cluster nodes by moving file server clients without any service interruption. Finally, SMB Scale-Out simplifies the management of clustered file servers and file shares.
VSS for SMB File Shares: The ability to create application-consistent snapshots of the server application data is critical to backing up the data. In Windows, this is accomplished using the Volume Shadow Copy Service (VSS) infrastructure. VSS for SMB file shares extends the VSS infrastructure to perform application-consistent shadow copies of data stored on remote SMB file shares for backup and restore purposes. In addition, VSS for SMB file shares enables backup applications to read the backup data directly from a shadow copy file share rather than involving the server application computer during the data transfer. Because this feature leverages the existing VSS infrastructure, it is easy to integrate with existing VSS-aware backup software and VSS-aware applications, such as Hyper-V.
SMB-specific Windows PowerShell cmdlets: Managing file shares is now accomplished using either the new Windows Server Manager GUI supporting file server clusters, which includes several profiles for creating SMB shares, or using the all-new SMB Windows PowerShell cmdlets, which use the familiar Windows PowerShell infrastructure for command-line and scripting. This completely new set of Windows PowerShell version 3 cmdlets was created to manage file shares, file share permissions, client mappings, server configuration, and client configuration. There is also an extensive set of cmdlets to monitor sessions, open files, connections, network interfaces, and multichannel connections. These cmdlets are built upon a standards-based management protocol using WMIv2 classes that allow developers, on Windows and Linux, to create automated solutions for file server configuration and monitoring.
SMB Performance Counters: In the application server world, storage performance is paramount, as is the ability to measure it. With that in mind, Windows Server “8” includes server and client performance counters that allow administrators to easily look into the key metrics for file storage, including IOPS, latency, queue depth, and throughput. These counters match the familiar block storage performance counters, making it simple to leverage your existing skills and guidance for storage performance for Windows Server.
Performance: Performance was also a key area of focus in SMB. In addition to making the large maximum transmission unit (large MTU) enabled by default, there was a significant amount of work to optimize performance for different kinds of workloads, covering both small and large I/O, and both sequential and random access. These optimizations were developed while investigating typical end-to-end workloads, such as online transaction processing, data warehousing, virtual web servers in a private cloud, virtual desktop infrastructure, and consolidated home folders. These investigations led to specific improvements in many areas of the operating system.
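The bandwidth claims made above for SMB Multichannel and SMB Scale-Out come down to simple arithmetic, which the following sketch works through. The function names are hypothetical, the figures are nominal link speeds that ignore protocol overhead, and the 40 Gbps per-node value in the scale-out example is an assumption made only for illustration.

```python
def aggregate_gbps(links_gbps, failed=()):
    """Sum the nominal speed of every link whose index is not in `failed`."""
    return sum(speed for i, speed in enumerate(links_gbps) if i not in failed)


def share_bandwidth_gbps(node_gbps, owner=0, scale_out=False):
    """Nominal bandwidth available to a single file share.

    Without scale-out, the share is online on one node (`owner`) only.
    With scale-out, the share can be reached through every node.
    """
    if scale_out:
        return sum(node_gbps)
    return node_gbps[owner]


# SMB Multichannel: four 10GbE interfaces, then one adapter or cable fails.
client_links = [10, 10, 10, 10]
print(aggregate_gbps(client_links))              # 40
print(aggregate_gbps(client_links, failed={2}))  # 30

# SMB Scale-Out: four nodes, each assumed to offer 40 Gbps of network bandwidth.
nodes = [40, 40, 40, 40]
print(share_bandwidth_gbps(nodes))                  # 40
print(share_bandwidth_gbps(nodes, scale_out=True))  # 160
```

Real throughput also depends on disks, CPU, and workload, so these numbers are upper bounds rather than predictions.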
Let us take a closer look at SMB Transparent Failover. SMB Transparent Failover requires:
- A failover cluster running Windows Server “8” with at least two cluster nodes and configured with the file server role. The cluster must pass the cluster validation tests in “Validate a Configuration Wizard”.
- File shares created with the continuous availability property, which is the default setting for clustered file shares.
- Computers accessing the clustered file shares must be running Windows “8” Consumer Preview or Windows Server “8”.
When the SMB client initially connects to the file share, the client determines whether the file share has the continuous availability property set. If it does, this means the file share is a clustered file share and supports SMB transparent failover. When the SMB client subsequently opens a file on the file share on behalf of the application, it requests a persistent file handle. When the SMB server receives a request to open a file with a persistent handle, the SMB server interacts with the Resume Key filter to persist sufficient information about the file handle, along with a unique key (resume key) supplied by the SMB client, to stable storage. The SMB client uses the resume key to reference the handle during a resume operation after a failover. To protect against data loss from writing data into an unstable cache, persistent file handles are always opened with write through.
If a failure occurs on the file server cluster node to which the SMB client is connected, the SMB client attempts to reconnect to another file server cluster node. Once the SMB client successfully reconnects to another node in the cluster, the SMB client starts the resume operation using the resume key. When the SMB server receives the resume key, it interacts with the Resume Key filter to recover the handle to the state it was in prior to the failure. This is backed by end-to-end support (SMB client, SMB server and Resume Key filter) for operations that can be replayed, as well as operations that cannot be replayed. The application running on the SMB client computer does not experience any failures or errors during this operation. From an application perspective, it appears the I/O operations are stalled for a small amount of time.
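The persistent handle and resume flow described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the classes `ToyCluster` and `ToyClient` are hypothetical, a Python dictionary stands in for stable storage, and nothing here reflects the actual SMB wire protocol or the real Resume Key filter.

```python
import uuid


class ToyCluster:
    """Toy file server cluster whose nodes share one 'stable storage' dict."""

    def __init__(self, nodes):
        self.alive = set(nodes)
        self.stable_storage = {}  # resume key -> persisted handle state

    def _check(self, node):
        if node not in self.alive:
            raise ConnectionError(f"{node} is down")

    def open_persistent(self, node, resume_key, path):
        self._check(node)
        # Persist enough state to rebuild the handle on another node.
        self.stable_storage[resume_key] = {"path": path, "offset": 0}

    def write(self, node, resume_key, data):
        self._check(node)
        state = self.stable_storage[resume_key]
        state["offset"] += len(data)  # models write-through: durable before returning
        return state["offset"]

    def resume(self, node, resume_key):
        self._check(node)
        return self.stable_storage[resume_key]

    def fail(self, node):
        self.alive.discard(node)


class ToyClient:
    """Toy SMB client that resumes its handle on another node after a failure."""

    def __init__(self, cluster, nodes):
        self.cluster = cluster
        self.nodes = list(nodes)
        self.node = self.nodes[0]
        self.resume_key = None

    def open(self, path):
        self.resume_key = str(uuid.uuid4())  # unique key supplied by the client
        self.cluster.open_persistent(self.node, self.resume_key, path)

    def write(self, data):
        try:
            return self.cluster.write(self.node, self.resume_key, data)
        except ConnectionError:
            self._failover()
            # Retry the stalled operation; the caller never sees the error.
            return self.cluster.write(self.node, self.resume_key, data)

    def _failover(self):
        for candidate in self.nodes:
            if candidate == self.node:
                continue
            try:
                self.cluster.resume(candidate, self.resume_key)
            except ConnectionError:
                continue
            self.node = candidate
            return
        raise ConnectionError("no cluster node reachable")


cluster = ToyCluster(["node1", "node2"])
client = ToyClient(cluster, ["node1", "node2"])
client.open(r"\\fs\vms\disk.vhd")
client.write(b"abcd")
cluster.fail("node1")
offset = client.write(b"efgh")  # served by node2 after the resume
print(client.node, offset)      # node2 8
```

The point of the model is that the application-facing call (`write`) returns normally after the failover; only the time taken grows, which matches the "stalled for a small amount of time" behavior described above.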
It is very important to keep I/O stalls during a failover to a minimum. Since SMB is sitting on top of TCP/IP, the SMB client would normally rely on a TCP timeout to determine if the file server cluster node has failed. However, relying on TCP timeouts can lead to fairly long I/O stalls, since each timeout is typically ~20 seconds. SMB Witness was created to enable faster recovery from unplanned failures, so that the SMB client does not have to wait for a TCP timeout. SMB Witness is a new service that is installed automatically with the failover clustering feature. When the SMB client initially connects to a file server cluster node, the SMB client notifies the SMB Witness client, which is running on the same computer. The SMB Witness client obtains a list of cluster members from the SMB Witness service running on the file server cluster node. The SMB Witness client picks a different cluster member and issues a registration request to the SMB Witness service on that cluster member.
If an unplanned failure occurs on the file server cluster node, the SMB Witness service on the other cluster member receives a notification from the cluster service. The SMB Witness service then notifies the SMB Witness client, which in turn notifies the SMB client that the file server cluster node has failed. Upon receiving the SMB Witness notification, the SMB client immediately starts reconnecting to a different file server cluster node, which significantly speeds up recovery from unplanned failures.
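The registration and notification steps above can be sketched as a toy model. The class and method names (`ToyWitnessService`, `ToyWitnessClient`, `watch`, and so on) are hypothetical and exist only for this illustration; the real SMB Witness service is a separate protocol that this sketch does not attempt to reproduce.

```python
class ToyWitnessService:
    """Toy witness service running on one cluster node."""

    def __init__(self, node_name, cluster_members):
        self.node_name = node_name
        self.cluster_members = list(cluster_members)
        self.registrations = {}  # watched node -> list of client callbacks

    def list_members(self):
        return list(self.cluster_members)

    def register(self, watched_node, callback):
        self.registrations.setdefault(watched_node, []).append(callback)

    def on_cluster_notification(self, failed_node):
        # Called by the (simulated) cluster service when a node fails.
        for callback in self.registrations.get(failed_node, []):
            callback(failed_node)


class ToyWitnessClient:
    """Toy witness client running on the same computer as the SMB client."""

    def __init__(self, services):
        self.services = services  # node name -> ToyWitnessService
        self.events = []          # failure notifications passed to the SMB client

    def watch(self, connected_node):
        # Ask the node we are connected to for the cluster membership...
        members = self.services[connected_node].list_members()
        # ...then register with a *different* member, which acts as the witness.
        witness = next(m for m in members if m != connected_node)
        self.services[witness].register(connected_node, self.events.append)
        return witness


members = ["node1", "node2"]
services = {name: ToyWitnessService(name, members) for name in members}
witness_client = ToyWitnessClient(services)

witness = witness_client.watch("node1")
services[witness].on_cluster_notification("node1")  # node1 fails unexpectedly
print(witness, witness_client.events)               # node2 ['node1']
```

Because the notification arrives from the surviving node as soon as the cluster detects the failure, the client in this model can begin reconnecting immediately instead of waiting out a TCP timeout.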
When planning Windows Server “8”, from an end-to-end perspective, the two main areas of focus for file storage for server applications are Hyper-V over SMB and SQL Server over SMB.
For example, when using Hyper-V, SMB file storage is now fully supported for both standalone and clustered configurations of Hyper-V. In fact, it is now possible to configure a Hyper-V cluster using only file shares as the shared storage.
For the file server configuration, there are three main modes for deployment:
Even though the single-node or standalone file server is not highly available, it provides the most inexpensive file server solution. Hyper-V includes added flexibility of shared storage and allows you to cluster the Hyper-V nodes using this solution (although the overall solution should not be considered highly available). When compared to block storage, this is the equivalent of a single-controller block storage array.
The dual-node file server is expected to be the most common file server configuration, providing continuous availability (via SMB transparent failover) at a low cost. Using shared Serial Attached SCSI (SAS) storage (just-a-bunch-of-disks [JBOD] enclosures or a SAS-based storage array), this solution can scale to a few hundred disks. Paired with a few Hyper-V servers and network switches, this solution could be used to create a rack-sized building block for a private cloud solution. This would be the equivalent of a dual-controller block storage array.
The third option, with a larger number of file servers using Fibre Channel as shared storage, allows for larger configurations. This file server cluster can leverage features like SMB Scale-Out and SMB Direct to create a shared storage infrastructure to serve dozens, if not hundreds, of Hyper-V nodes. There are significant savings in limiting the Fibre Channel connectivity to the file server nodes when using 10GbE or InfiniBand to connect the Hyper-V compute nodes to the file servers.
We are very excited by the new opportunities for file server storage with server applications. In fact, we have received very positive feedback from early adopters, both internally and externally, who have agreed to deploy these scenarios and features. They share our enthusiasm for the ease of use, manageability, scalability, and performance of the solution.
If you would like to learn more, we have a series of presentations that were delivered during the //Build/ conference in September 2011. These presentations are available here, and they provide hours of additional details recorded in video. We encourage you to review them. Here is a list of the main sessions on Windows Server “8” related to this blog post:
- Keynote #2 – Building Continuous Services
- 973 – Windows Server 8
- 443 – Business and partnering opportunities: Windows Server 8 continuous availability
- 474 – Platform storage evolved
- 446 – Designing systems for continuous availability and scalability
- 450 – Designing systems for continuous availability – multi-node with block storage
- 444 – Designing systems for continuous availability – multi-node with remote file storage
- 451 – Building continuously available systems with Hyper-V
- 449 – Building continuously available file server NAS appliances
I hope you enjoy watching them as much as we enjoyed putting them together…
Claus Joergensen and Jose Barreto, Principal Program Managers in the Windows File Server team.