High-performance computing (HPC) is a set of computing, networking, and storage resources integrated and orchestrated to optimize workloads or applications in cloud-enabled environments. HPC enables optimization and orchestration of on-demand central processing unit (CPU), graphics processing unit (GPU), and other computing resources needed to run intensive jobs or workloads.
An organization has several options for managing its computing requirements: on-premises, cloud, and hybrid cloud are the primary scenarios, along with variations of each. Using a cloud computing environment to develop, deploy, manage, and run applications that support the business is the scenario most of us know best.
On-premises high-performance computing
But what about computing operations that require dedicated computing power to run complex calculations, such as machine learning, AI, or simulation work? A traditional on-premises environment is built around a complex network of servers and computing resources, including storage (hot, cold, and archive), power, and networking, all requiring complex management by dedicated technology staff. Not only is this extremely capital-expenditure (capex) intensive, it also tends to be inefficient: computing resources are often designated by department or workload and spend much of their time idle or underutilized, resulting in higher energy consumption and cost. From a utilization standpoint, many traditional licensing agreements for on-premises applications and resources are built around capacity limitations, which can hamper the ability to run larger, more complex jobs without adjustments that add cost and complexity.
Many HPC workloads, such as AI, simulation, and testing, are intensive but only required in "bursts" of activity, typically in support of specific projects like engineering or product development within a particular organization or business unit. In the automotive, mobility, and transportation space, for example, this pattern shows up in autonomous driving simulation, testing, verification, and validation, and in analyzing complex routing or global logistics simulation scenarios for transportation and logistics operations. With an on-premises environment in these situations, you may be constrained in how long scenarios take to run or how many permutations you can manage at the same time. Over the long term, this makes inefficient use of capital investments and technical resources that could be applied to other projects.
Cloud-based high-performance computing
Continuing with the automotive industry example, advances in mobility are driving intense investment around the core industry initiatives supporting C.A.S.E: Connected, Autonomous, Shared Mobility, and Electrification. Modern vehicle engineering can be viewed as a large control loop involving the physical world, sensors, controllers, and actuators. An excellent example of leveraging a cloud-based HPC environment is the engineering behind Autonomous Driving (AD), Advanced Driver Assistance Systems (ADAS), and Autonomous Vehicles (AVs). These advanced systems require a plethora of sensors and onboard computer systems that must be aware of, sense, and respond to stimuli both within the vehicle and in its immediate surroundings. Everything from weather, road conditions and markings, traffic and traffic control devices (TCDs), pedestrians, street signs, and other anomalies needs to be recognized and processed to enable an appropriate vehicular response that ensures, above all else, safety.
For these systems to be put into use, they must be simulated and tested against a defined verification and validation environment for a specific global region and Operational Design Domain (ODD). You’ve probably seen road-going autonomous vehicles plying local streets with their arrays of spinning sensors and protruding appendages. Even before this critical on-road data collection phase, these sensors must be rigorously tested in a simulated environment built on HPC, requiring intense computing resources to recreate real-world conditions. For many original equipment manufacturers (OEMs), this capability has typically been supported by a traditional, mostly on-premises computing environment. But autonomous development workloads that support multi-physics, perception training, and simulation involve enormous amounts of data and computing power. To put the scale of this problem into perspective, sensor simulation work can generate over four gigabytes of data per second, per vehicle. Managing this amount of data and running the required scenarios can strain even the best on-premises environments.
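To put that data rate into perspective, a quick back-of-the-envelope calculation shows how fast sensor data accumulates at four gigabytes per second. The fleet sizes and drive times below are illustrative assumptions, not figures from this article:

```python
# Back-of-the-envelope sensor data volume at 4 GB/s per vehicle.
# Fleet size and hours driven per day are illustrative assumptions.
GB_PER_SECOND = 4

def daily_data_tb(vehicles: int, hours_driven: float) -> float:
    """Terabytes (decimal) of raw sensor data generated per day."""
    seconds = hours_driven * 3600
    gigabytes = vehicles * seconds * GB_PER_SECOND
    return gigabytes / 1000

# One test vehicle driving 8 hours: ~115 TB per day.
print(f"1 vehicle, 8 h:   {daily_data_tb(1, 8):.1f} TB")
# A hypothetical 50-vehicle fleet: ~5,760 TB (about 5.8 PB) per day.
print(f"50 vehicles, 8 h: {daily_data_tb(50, 8):,.0f} TB")
```

At these volumes, even a small test fleet outgrows typical on-premises storage in days, which is why data ingestion and storage tiering figure so heavily in cloud HPC designs.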
Hybrid cloud high-performance computing
In support of autonomous driving, multi-physics, and perception training, vehicle sensors such as cameras, Light Detection and Ranging (LiDAR), and radar must be on par with, and in many cases better than, human perception to support anticipatory and near-immediate responses. Think of the scenario where a driver encounters a hidden object in the roadway at the last moment but cannot take a certain evasive action because of an oncoming vehicle. These split-second human decisions must instead be made by the autonomous vehicle’s planning system based on the sensor or perception environment, and these precise scenarios must be simulated with extreme granularity. Even physical and environmental elements, such as the location of sensors, the impact of weather, road conditions, and vehicle dynamics, must be simulated to understand their effect on the autonomous system. These software-in-the-loop (SIL) and hardware-in-the-loop (HIL) simulation regimes must be run across exhaustive permutations to yield the best results, which can become a critical constraint in a non-HPC environment. Other challenges include inflexible software contracts that restrict software utilization for a given job, fixed resources that limit available computing power, storage management and orchestration obstacles, and geographic disparity, all adding up to longer development cycles and lost time to market.
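The permutation explosion behind those SIL and HIL runs is easy to see with a sketch. The parameter values below are hypothetical; a real verification suite would also sweep continuous variables such as vehicle speed, sensor placement, and weather intensity, multiplying the count much further:

```python
from itertools import product

# Illustrative scenario parameters (hypothetical values, not a real test matrix).
weather = ["clear", "rain", "fog", "snow"]
time_of_day = ["day", "dusk", "night"]
road_surface = ["dry", "wet", "icy"]
obstacle = ["pedestrian", "cyclist", "stalled vehicle", "debris"]
oncoming_traffic = [True, False]

# Every combination of the parameters above is one scenario to simulate.
scenarios = list(product(weather, time_of_day, road_surface, obstacle, oncoming_traffic))
print(len(scenarios))  # 4 * 3 * 3 * 4 * 2 = 288 permutations from just five parameters
```

Each added parameter multiplies the total, so exhaustive coverage quickly demands the burst capacity that an HPC environment provides.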
To combat these challenges, companies in automotive and other industries with complex computing requirements are finding relief in hybrid cloud extension models. In this model, complex jobs can be assigned to cloud-based HPC environments on demand with efficient orchestration. The advantages include the ability to scale up rapidly for optimal resource availability and then scale down rapidly to avoid costly underutilization of HPC resources when they are no longer needed, minimizing the impact on on-premises resources and reducing the costly overhead of upscaling or maintaining a long-term on-premises environment. Critically, this model lets a business keep select, mission-critical applications or processes, such as HIL, in its own on-premises or other environments, while tapping practically unlimited computing resources for other complex workloads in the cloud, creating enhanced agility. In the automotive simulation example above, an HPC environment can simulate billions of miles over millions of scenarios, enabling the shortest, most cost-effective path to production.
While leveraging a hybrid cloud approach for these complex scenarios can yield significant benefits, getting there can require a significant shift in strategy and approach. Critical to enabling this transformation is the orchestration of on-premises and cloud workloads. For complex training of machine learning algorithms for AI, and for simulation like we see in automotive, an HPC environment works very well: it enables a focus on continual optimization cycles through iterative training to improve AI results. The value, in this case, is a “mature” and accurate solution, achieved at the cost of increased computing time and resources.
For other workloads, a company may want to create rules in the orchestration process that require specific thresholds and requirements to be met to determine whether a job runs in the cloud or on-premises, and then work to integrate the two environments. The orchestration process can also take data sources and security requirements into account.
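As a sketch of what such threshold-based placement rules might look like (the job attributes, limits, and policy below are illustrative assumptions, not any specific orchestrator's API):

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpu_hours: float               # estimated compute demand
    data_gb: float                 # input data size
    contains_regulated_data: bool  # e.g., data that must stay on-premises

# Hypothetical thresholds; a real orchestrator would read these from policy config.
ON_PREM_CPU_HOUR_LIMIT = 5_000
CLOUD_TRANSFER_LIMIT_GB = 50_000

def place_job(job: Job) -> str:
    """Decide where a job runs based on simple threshold rules."""
    if job.contains_regulated_data:
        return "on-premises"   # security requirement overrides cost and scale
    if job.cpu_hours > ON_PREM_CPU_HOUR_LIMIT:
        return "cloud"         # burst large jobs to cloud HPC capacity
    if job.data_gb > CLOUD_TRANSFER_LIMIT_GB:
        return "on-premises"   # data gravity: too costly to move the inputs
    return "on-premises"       # default: keep small jobs local

print(place_job(Job("perception-training", cpu_hours=80_000,
                    data_gb=12_000, contains_regulated_data=False)))  # cloud
```

The ordering of the rules encodes policy: security constraints are checked first, then compute demand, then data gravity, so a job is only burst to the cloud when it is both permitted and worthwhile.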
From a business opportunity perspective, a hybrid HPC environment can provide the best of both worlds: on-premises access alongside cloud extensibility, agility, and access to vast computing resources to tackle complex modeling and simulation jobs cost-effectively.
Benefits of hybrid or cloud high-performance computing
One other very important benefit of a hybrid or cloud HPC environment is the environment itself. In company boardrooms of all sizes, sustainability is no longer discussed as an option; it’s imperative that all facets of business operation can be viewed through a sustainability lens. Computing is no exception: a Microsoft study has shown the cloud to be as much as 93 percent more energy-efficient, and as much as 98 percent more carbon-efficient, than on-premises solutions. A win-win for your bottom line and the environment.
For more information, visit the Azure high-performance computing for automotive webpage. Learn how Audi AG leveraged HPC and cloud to meet their complex storage and computing challenges.