Servers can be expensive, so we spend a lot of time and effort figuring out how to get the best performance out of them. Windows Server 2012 had quite a few features to increase the performance and scale of servers. There are things that we can do and then there are things you can do. In today’s post, Ahmed Talat, a Senior PM lead in the Windows Performance team, talks about a great tool that you can use to understand what is going on with your servers and how to tune them to get maximal performance out of them.
Starting with Windows Server 2008, we have published a server tuning guide each release designed to help system administrators and IT professionals get the best performance out of their server deployments. For Windows Server 2012 we published the Windows Server 2012 Tuning Guide, but this time there is a twist. This time we harnessed our performance knowledge from the tuning guide and embodied some of it as part of a newly redesigned Server Performance Advisor (SPA) tool.
SPA 3.0 helps IT administrators collect metrics to diagnose performance issues on Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008. It supports up to 100 servers and works unobtrusively, without adding software agents or reconfiguring production servers. It generates comprehensive performance reports, as shown in Figure 1 below, and historical charts with recommendations.
In this post we discuss how SPA works and some of the unique features available at your fingertips once you download the tool.
Figure 1: A snapshot of a SPA performance report with two warnings
At a high level, SPA is composed of two parts. The first is a management console or dashboard for the user to pick which servers they plan to collect data on, the corresponding role of the server, how long they need to collect data for, and how often collections happen. The console has a set of requirements listed on the download page.
The second part of SPA is the Advisor Packs or “APs”. APs contain a set of performance rules. An AP serves two purposes. First, it defines what data gets collected from the server when that AP is instantiated. Second, the AP rules are used for assessing the server’s behavior. For example, if data from a server shows a packet retransmit rate of more than 10% for any network adapter, and the system counters show that the adapter has a lot of send activity, then a warning is logged in a report.
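To make the retransmit example concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is only an illustration of the logic and not how SPA implements its rules. The 10% limit comes from the example above, while the `AdapterSample` type, the `retransmit_warnings` function, and the send-activity cutoff are hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass

# The 10% limit is taken from the example in the text.
RETRANSMIT_RATE_LIMIT = 0.10
# Hypothetical cutoff standing in for "a lot of send activity".
MIN_SEGMENTS_SENT_PER_SEC = 1000.0


@dataclass
class AdapterSample:
    """Hypothetical per-adapter counters collected from one server."""
    name: str
    segments_sent_per_sec: float
    segments_retransmitted_per_sec: float


def retransmit_warnings(samples):
    """Return one warning string per adapter that trips the rule."""
    warnings = []
    for s in samples:
        if s.segments_sent_per_sec <= 0:
            continue  # idle adapter, no rate to compute
        rate = s.segments_retransmitted_per_sec / s.segments_sent_per_sec
        busy = s.segments_sent_per_sec >= MIN_SEGMENTS_SENT_PER_SEC
        # Both conditions must hold: a high rate AND real send traffic.
        if busy and rate > RETRANSMIT_RATE_LIMIT:
            warnings.append(f"{s.name}: retransmit rate {rate:.0%} is above the limit")
    return warnings


# Made-up sample data for illustration only.
samples = [
    AdapterSample("NIC1", 5000.0, 750.0),  # busy, 15% retransmits
    AdapterSample("NIC2", 4000.0, 80.0),   # busy, 2% retransmits
    AdapterSample("NIC3", 50.0, 20.0),     # 40% retransmits, but nearly idle
]
for warning in retransmit_warnings(samples):
    print(warning)
```

Only NIC1 produces a warning here: NIC3 has a higher retransmit rate, but it fails the send-activity condition, which is the point of combining the two checks.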
How it Works
Now let’s step back and give you a visual representation of the end-to-end process so you have a better understanding of the role each SPA component plays and how they interact with each other. The different steps in the flow are numbered 1 through 6, with each step described in greater detail below.
Figure 2: SPA workflow collecting data from remote servers
1. Setting up the data collection sessions
The user installs SPA and chooses which APs they want depending on the target server role. SPA ships with some built-in APs: the Core OS AP, the IIS AP, and the Hyper-V host AP. The Core OS AP is a generic AP covering fundamentals like I/O and resource utilization, while the other two are role specific. The APs are imported into SPA to define what data is collected; the same APs are later used to assess the servers’ performance and generate the report. Users can choose and run multiple APs in a single data collection session.
2. Starting data collection on the target servers
The user defines how long to collect data and chooses between a one-time collection and collection at regular intervals. Keep in mind that the longer the collection, the more data there will be to process and send back to the console machine. The console then sends a data collection request to the target servers over the Performance Logs and Alerts (PLA) service, which is also used by tools like Perfmon, and SPA starts collecting the data.
3. Data collection on servers
Each target server receives a request from the console machine to start collecting a predefined set of data. SPA collects data from Event Tracing for Windows (ETW) events, Windows Management Instrumentation (WMI), performance counters, configuration files, and registry keys.
4. Saving the data for post processing
Each server writes the performance data to a predefined file share. We expect administrators to specify a file share on the console machine to avoid disk impact on the target server, but they can also choose where to create the file share from within SPA’s menu options.
5. Storing the data for generating a report
When the console pulls the data from the file share, it stores it in a SQL database. Because the data is stored in a database, SPA provides historical charts that help with trending performance behaviors over a period of days or hours. Users can also delete older reports from the database.
6. Generating the Performance report
SPA analyzes results based on the AP rules. It summarizes the findings in a report, identifies issues along with some possible mitigations, and lets the administrator decide if they want to make changes. NOTE: There is an expectation that this tool is used by experienced administrators who understand the intent of an implementation and are knowledgeable enough to determine whether a suggested mitigation is appropriate for the target system.
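As a rough illustration of why step 5 matters, the Python sketch below stores collected samples in a database and then queries them for a trend. It uses an in-memory SQLite database purely as a stand-in for the SQL database SPA uses. The `samples` table, its columns, the `min_write_trend` function, and the values are all hypothetical and do not reflect SPA’s actual schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (server TEXT, day TEXT, write_mbps REAL)")

# Made-up file write throughput samples from several collections.
rows = [
    ("web01", "day-1", 118.0),
    ("web01", "day-1", 96.5),
    ("web01", "day-2", 121.0),
    ("web01", "day-2", 42.0),
    ("web01", "day-3", 115.5),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)


def min_write_trend(connection, server):
    """Return (day, minimum write throughput) pairs for one server, oldest first."""
    cursor = connection.execute(
        "SELECT day, MIN(write_mbps) FROM samples "
        "WHERE server = ? GROUP BY day ORDER BY day",
        (server,),
    )
    return cursor.fetchall()


trend = min_write_trend(conn, "web01")
for day, lowest in trend:
    print(day, lowest)
```

Because every collection lands in the same store, a dip such as the one on the second day stands out as soon as the per-day minimums are lined up, which is the kind of view a historical chart gives you.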
Now that you have an idea of how the process works end-to-end, let’s shift our focus to what users should come to expect from using SPA 3.0 in terms of installation, data collection, and report viewing.
- Zero agent installation on the server: To use SPA, you just need to install it on a console machine meeting the requirements. SPA can collect performance data locally or from remote servers. Because SPA uses PLA for remote data collection, a user can point SPA at a remote server and start collecting performance data immediately (while the workload is running). A user can also have data collections on multiple servers going in parallel. Of course, the console machine needs to have the right authentication and ports open to ensure success. NOTE: The collection overhead is typically minimal with no impact to most workloads. The exceptions are extreme low latency workloads where the collection can alter a workload’s behavior.
- Extensible scriptable APs: The APs are where all the performance knowledge is embodied and captured. The APs define what data is captured and what gets used to assess the server’s performance. Because APs are written in T-SQL, composing one is simple. We encourage you to check out the AP development guide, which shows how to write a custom AP to expedite diagnosing performance issues.
- Multiple data sources for a cohesive view of system performance: Windows has a very rich set of instrumentation points and APs can take advantage of this. The built-in APs certainly do! SPA collects data from a number of different sources mentioned previously, which makes it a very powerful tool. It correlates all of this data together, draws causality between the data points, and presents the user with a rolled-up view of how their system is behaving, with actionable recommendations addressing any reported issues. The user can then act on the recommendations, which provide good insight into some key performance metrics like latency, scalability, and throughput.
- Side by side comparison of performance and server configuration data: This new feature in SPA 3.0 allows users to compare data collected at two different points in time for the same server, collected at two different points in time for different servers, or collected for two different servers at the same time. The ability to compare how the server was behaving before a certain date or before applying a certain patch is very useful in narrowing down the point in time at which a change happened and correlating that with a drop in performance. The report uses triangles with a yellow exclamation mark to represent warnings and uses check marks inside of green circles to indicate no action is necessary.
Figure 3: Side by side comparison report for the same machine at 2 different points in time
- Charting and historical trending: Charting the performance characteristics and metrics of a server over a specified period of time helps users recognize patterns and anomalies associated with specific days of the week where a certain activity takes place. For example, an administrator may have a scheduled indexing task kicking in and impacting system performance. In this case it can increase disk latency because of the disk I/O writes incurred when the indexing service kicks in and spins the media. The following figures show some of the cool capabilities built into SPA 3.0 for charting and trending data.
Figure 4: Historic trending chart showing performance metrics over time
Figure 5: Trend for the minimum file write throughput achieved over a period of 7 days
- Built-in APs for key server roles like Hyper-V and IIS: To help users get started with using SPA 3.0, we provide three built-in APs as part of the download package. The first is the Core OS AP, which focuses on basic resource characteristics like CPU utilization, network traffic, memory consumption, and storage related events. The second is the IIS AP, which has Web server specific rules covering things like the top 10 URLs accessed and some of the common configuration parameters in IIS. The third is the Hyper-V host AP, which focuses on a server hosting multiple virtual machines and provides virtualization specific diagnostics for performance issues.
- Configurable sampling intervals and durations for collecting data: Some users want the ability to collect data on demand, while others would like to have a predefined collection interval. The latter can be helpful if you are trying to isolate a performance behavior that doesn’t happen on a regular basis and are trying to catch it by setting up frequent data collection intervals for a specified period of time. There are also users who want to manually kick off a data collection, especially if they just downloaded an update or a patch, or just did a hardware or firmware upgrade in their environment, and want to quantify the performance impact before and after using the side by side comparison view.
- PowerShell scripts support: System administrators can write scripts to invoke SPA cmdlets and schedule periodic remote data collections on target servers at set time intervals. They can also query the database for information about which APs were run and which target servers have associated SPA reports. The SPA manual has more details and syntax for the supported PowerShell cmdlets.
- Configurable thresholds inside of APs: The built-in APs ship with a predefined set of thresholds for all the rules. Different workloads and different customers will have different Service Level Agreements (SLAs) and different alarm points, so users have the flexibility to set their own thresholds in the APs to better suit their environment and workload.
Figure 6: Users can get more information about a rule in the Details view
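The idea behind the side by side comparison feature described above can be sketched in a few lines of Python. This is only an illustration of the concept, not SPA’s implementation: the `compare_snapshots` function, the metric names, the sample values, and the 10% tolerance are all hypothetical.

```python
def compare_snapshots(before, after, tolerance=0.10):
    """Return {metric: (before, after, relative_change)} for metrics
    present in both snapshots whose relative change exceeds tolerance."""
    changed = {}
    for metric in sorted(before.keys() & after.keys()):
        old, new = before[metric], after[metric]
        if old == 0:
            continue  # relative change is undefined for a zero baseline
        delta = (new - old) / old
        if abs(delta) > tolerance:
            changed[metric] = (old, new, delta)
    return changed


# Made-up snapshots of one server before and after applying a patch.
before_patch = {"cpu_percent": 35.0, "disk_latency_ms": 8.0, "net_mbps": 400.0}
after_patch = {"cpu_percent": 36.0, "disk_latency_ms": 14.0, "net_mbps": 390.0}

changed = compare_snapshots(before_patch, after_patch)
for metric, (old, new, delta) in changed.items():
    print(f"{metric}: {old} -> {new} ({delta:+.0%})")
```

In this made-up data only the disk latency moves by more than the tolerance, so it is the one metric flagged for attention. The tolerance plays the same role as a configurable threshold: tightening or loosening it changes which differences are worth a warning in a given environment.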
In this blog we introduced the redesigned Server Performance Advisor 3.0. We walked you through a high level overview of how the different SPA components interact and what role they each play. We also shared some of the exciting new features and capabilities available with SPA. We hope you enjoyed this post and we invite you to download and try out the latest bits from the SPA MSDN download page. Please share your experiences and send us feedback at firstname.lastname@example.org.