Whew! Friday at 2:18 PM we signed off on Beta 2 of Windows HPC Server 2008. It’s a good thing too, since the Redmond team is looking at the first sunny and hot Northwest weekend this year. Mother Nature usually gives us these days on weekdays. It’s been a hard push since November when we shipped our last beta. Since then we’ve done test runs on a cluster with over 1000 nodes, fixed over 1000 bugs, coded a bunch of new features, and made a bunch of design changes based on customer feedback. For example, one beta customer was using our new WCF Broker for financial risk modeling but wanted a totally reliable messaging solution. We built a solution leveraging MSMQ that still provides high throughput while allowing for reliable messaging.
Now that Beta 2 is finished, our Technology Adoption Partners (TAP) will put this beta into production environments. We’ll carry pagers to help them out if they run into a crit-sit after hours. Actually, we have cell phones. Pagers have gone the way of punch cards, teletypes, and sock garters. I suspect there are teenagers wandering around who don’t know what a pager is.
Anyway, there’s a bunch of new stuff in Beta 2.
We checked in high availability for the head node and a new set of diagnostic tests to help people identify and troubleshoot problems with their clusters. The new UI model is really coming together, but for users more comfortable with command-line interfaces, we provide scripting support through COM and PowerShell. Finally, administrators can run administrative scripts in parallel across the cluster using our improved Clusrun feature.
A bunch of humbling (heh) usability testing pushed us to redesign the To Do List. It should be much easier for people to get through setting up a cluster, adding drivers to images, and configuring patching for the cluster (new feature!). The heat map is working so well we’ve thrown out the internal monitoring tools we used on Top500 runs.
After lots of, um, passionate debate we’ve finalized the APIs for job submission. It will continue to be easy for ISVs to integrate directly with our job scheduler while at the same time working with a cluster that may have thousands of jobs in the queue, each job with thousands of tasks.
A lot of people don’t know that we co-chair the HPC Basic Profile working group at the Open Grid Forum. With Beta 2, we ship our support for “HPC Basic Profile,” allowing us to interop with the LSF and PBSPro job schedulers.
We completed a few great Top500 runs in the last few weeks. We can’t talk about the numbers until the International Supercomputing Conference in June but it looks like Beta 2’s new MPI stack and new Network Direct RDMA interface are starting to hum.
Finally, our new programming model based on SOA is getting some nice usage from beta customers. Most of the feedback has come from folks in computational finance, but there are also a couple of folks in the life sciences industry who are kicking the tires. For example, what if you came up with a new theory about cancer and wanted to search through thousands of medical scans to see if your theory was correct? For Beta 2 we improved scalability, reduced latency, and improved session initialization time. Beta 2 supports multiple WCF Brokers, allowing HPC Server 2008 to run really big SOA workloads.
So, we’re done with Beta 2. Lots of new features (whew) and lots of scalability improvements. We’ve posted build 1345, Beta 2, up at http://connect.microsoft.com
Ryan Waite
Group Program Manager – HPC