I remember being a kid and having a big secret that I couldn’t tell anyone. Stuff like Bobby has a crush on Susie or Jeffrey has a tail. Okay, I didn’t know anyone who had a tail, but you can imagine how hard it would be for a 10-year-old to keep a secret like that.
A few months ago we completed our runs for the Top500 list. For those of you not familiar with this twice-yearly benchmark, the Top500 list ranks the 500 most powerful computers in the world. It is the supercomputing supergeek superlist. We completed runs with the National Center for Supercomputing Applications (NCSA) and with Umea University. The problem is that even though we did the runs months ago, we weren’t allowed to discuss the results until this week, the week of the International Supercomputing Conference in Dresden, Germany. We had to keep it a secret. Ugh.
The NCSA cluster is amazing: 1,200 nodes, each with 8 cores, for a 9,600-core cluster. NCSA installed Beta 1 of Windows HPC Server 2008 and ran the benchmark. The results were outstanding: 68.5 teraflops and 77.7% efficiency. Using our beta software, NCSA beat their November score by over 10%. This is the fastest Windows cluster to date. Check out the customer video and case study.
The Umea University cluster, “Akka”, is located in northern Sweden. This system was also running Beta 1 and hit 46 teraflops on 5,376 cores with a VERY impressive 85.5% efficiency score. This is the BEST efficiency score for an x86 architecture cluster on the Top500 list. Umea University will run the new supercomputer at its facility known as “HPC2N”. The university’s cluster employs 672 IBM blade servers, and it also marks the first time that Windows HPC Server 2008 has been run publicly on IBM hardware.
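For anyone who wants to check the arithmetic: Top500 efficiency is the measured LINPACK score (Rmax) divided by the theoretical peak (Rpeak). The short Python sketch below works backward from the scores and efficiencies quoted above to the peak each cluster implies. The function names are purely illustrative, and the results come from the rounded figures in this post, so treat them as approximate rather than as official list values.

```python
def implied_peak_tflops(rmax_tflops: float, efficiency: float) -> float:
    """Theoretical peak (Rpeak) implied by a measured score and an efficiency.

    efficiency is Rmax / Rpeak expressed as a fraction, e.g. 0.855 for 85.5%.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return rmax_tflops / efficiency


def per_core_gflops(tflops: float, cores: int) -> float:
    """Spread a cluster-wide teraflops figure evenly across its cores."""
    return tflops * 1000 / cores


# (measured teraflops, efficiency, core count) as quoted in this post
clusters = {
    "NCSA": (68.5, 0.777, 9600),
    "Akka": (46.0, 0.855, 5376),
}

for name, (rmax, eff, cores) in clusters.items():
    peak = implied_peak_tflops(rmax, eff)
    print(
        f"{name}: implied peak {peak:.1f} TF, "
        f"sustained {per_core_gflops(rmax, cores):.2f} GF per core, "
        f"peak {per_core_gflops(peak, cores):.2f} GF per core"
    )
```

The per-core view is what makes the efficiency number meaningful: it shows how much of each core's theoretical throughput survives once the cores have to talk to each other.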
So, the benchmarking numbers are looking pretty good, and those benchmarks were with our first beta. We shipped our second beta last month and we’re shipping our first release candidate at the end of this month.
How did we do so well on the benchmarks? We’ve made big improvements in the Microsoft MPI stack. MPI (Message Passing Interface) is used for tightly coupled communications between servers running in parallel. The biggest improvements were in what are called shared memory interfaces, that is, the interfaces used for communication between processor cores on the same system. Our MPI stack is based on Argonne National Lab’s MPI stack called MPICH2. We will contribute our changes back to Argonne for inclusion in the open source version of MPICH2. These are some of the largest contributions to the open source community by Microsoft. Yep, open source and Microsoft.
Network Direct, our new RDMA (Remote Direct Memory Access) networking stack, was another area of improvement. We collaborated with partners like Mellanox, NetEffect, and Myricom to build a very efficient RDMA stack. Improvements in MPI and Network Direct contributed hugely to our great score.
Those are very impressive benchmark results for a product that hasn’t even been released to manufacturing yet, and the scores were a very hard secret to keep. The release candidate of Windows HPC Server 2008 will be available for customers to download in the last week of June.
Ryan Waite,
Group Program Manager on the HPC Dev Team