ONNX Runtime 0.5, the latest update to the open-source, high-performance inference engine for ONNX models, is now available. This release improves the customer experience and supports inferencing optimizations across hardware platforms.

Since the last release in May, Microsoft teams have deployed an additional 45+ models that leverage ONNX Runtime for inferencing. These models are used in key products and services that reach millions of customers.

ONNX Runtime 0.5 Release Summary

Building on the momentum of our last release, new features in ONNX Runtime 0.5 are targeted towards improving ease of use for experimentation and deployment. This release includes:

  • A convenient C++ inferencing API (in addition to the existing C, C#, and Python APIs); a short Python sketch follows this list
  • A custom operator that supports running Python code even when official operators are missing (in preview)
  • ONNX Runtime Server, a hosted application for serving ONNX models over HTTP and gRPC endpoints (in preview)
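
For reference, inferencing through the existing Python API takes just a few lines, and the new C++ API follows the same create-a-session-then-run pattern. The model path and input shape below are placeholders for illustration.

    import numpy as np
    import onnxruntime as ort

    # Load an ONNX model; "model.onnx" is a placeholder path.
    session = ort.InferenceSession("model.onnx")

    # Discover the model's input name and build a dummy tensor (shape is a placeholder).
    input_name = session.get_inputs()[0].name
    dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)

    # Run inference; passing None for the output list returns all model outputs.
    outputs = session.run(None, {input_name: dummy})
    print(outputs[0].shape)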

With these additions, we advance the journey of making ONNX Runtime the preferred solution for operationalizing ML inferencing workflows.

We are also excited by the community's continued collaboration and enthusiasm: contributors have added multiple Execution Providers (EPs) to ONNX Runtime that build on the baseline CPU and NVIDIA CUDA GPU EPs. This furthers our mission to support choice and versatility in hardware compute targets.

Whether your target is a PC, Mac, or Linux machine in the cloud or on premises, or a lightweight or heavyweight IoT device, ONNX Runtime strives to take advantage of available hardware capabilities to deliver the best possible performance. Our continued collaboration allows ONNX Runtime to fully utilize available hardware acceleration on specialized devices and processors.

ONNX Runtime 0.5 introduces new support for the Intel® Distribution of OpenVINO™ Toolkit, along with updates for MKL-DNN. ONNX Runtime also continues to be optimized and accelerated by NVIDIA CUDA and NVIDIA TensorRT on GPU platforms, from the cloud to the edge. Additional information can be found below.

ONNX Runtime Execution Providers

Hardware platforms use custom libraries to execute and accelerate the computations used in neural network models. These libraries and interfaces differ for every platform. To accelerate a machine learning model, developers need to be aware of these differences and write code specific to each target.

ONNX Runtime Execution Providers (EPs) allow you to run any ONNX model using a single set of inference APIs that provide access to the best hardware acceleration available. In simple terms, developers no longer need to worry about the nuances of hardware-specific custom libraries to accelerate their machine learning models.
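
As a minimal sketch of what this means in practice: the Python snippet below stays the same whether the installed onnxruntime package was built for the CPU EP, the CUDA EP, or another accelerator; only the installed package changes. The model path is a placeholder.

    import onnxruntime as ort

    # Reports which device the installed ONNX Runtime build targets, e.g. "CPU" or "GPU".
    print("Installed build targets:", ort.get_device())

    # Session creation and inference are the same calls regardless of the active EP.
    session = ort.InferenceSession("model.onnx")  # placeholder model path
    print("Model inputs:", [i.name for i in session.get_inputs()])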

Intel® Distribution of the OpenVINO™ Toolkit

ONNX Runtime 0.5 includes the OpenVINO toolkit as an Execution Provider (EP), now in public preview. OpenVINO empowers developers to create applications and solutions that emulate human vision. Built to optimize the execution of convolutional neural networks (CNNs), the toolkit lets developers run ONNX models across Intel® hardware, including accelerators such as VPUs, Vision Accelerator Design cards, and the Neural Compute Stick 2 (NCS2). With the integration of the Intel OpenVINO EP into ONNX Runtime, developers can take advantage of these neural network execution capabilities across the breadth of Intel platforms using their ONNX models.
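
As a rough illustration from Python, assuming an onnxruntime package built with the OpenVINO EP: recent Python packages let preferred EPs be listed at session creation, with ONNX Runtime falling back to the CPU EP for anything the accelerator does not support. The providers argument, provider names, and model path below are illustrative assumptions rather than release-specific details.

    import onnxruntime as ort

    # Assumes a build of onnxruntime that includes the OpenVINO execution provider.
    # Operators the accelerator cannot handle fall back to the CPU EP.
    session = ort.InferenceSession(
        "model.onnx",  # placeholder model path
        providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
    )
    print("Active providers:", session.get_providers())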

This is a significant milestone for ONNX Runtime, as it is the first public preview EP integration for inferencing on IoT edge devices such as the Intel-powered UP2 AI Vision Kit or IEI TANK AIoT platforms. Data scientists can use the Azure Machine Learning service to train models using their framework of choice, export to ONNX, deploy via Azure IoT Edge, and run hardware-accelerated inferencing with ONNX Runtime. With the OpenVINO toolkit integration, we have now enabled inferencing across a variety of Intel edge devices.

This Notebook provides a sample tutorial covering the end-to-end scenario for deploying models with ONNX Runtime and the OpenVINO EP, demonstrating how to train models in Azure Machine Learning, export to ONNX, and then deploy with Azure IoT Edge. The tutorial has been validated on the Intel UP2 and IEI TANK reference platforms containing Intel's neural network accelerator chips.

NVIDIA Jetson Nano

As part of our continued partnership with NVIDIA, we worked with NVIDIA’s Jetson team to build a reference solution for deploying trained models from Azure Machine Learning to the NVIDIA Jetson Nano with Azure IoT Edge.

Today, we are releasing a new tutorial for developers to deploy ONNX models on the NVIDIA Jetson Nano. The Jetson Nano is the latest platform in the Jetson family of AI-at-the-edge products, offering low power consumption and high compute performance for IoT edge devices.

This tutorial is a reference implementation for IoT solution developers looking to deploy AI workloads to the edge using the Azure cloud and NVIDIA's GPU acceleration capabilities. In the tutorial, the captured data is processed on-device and only the inference results are sent to Azure Blob Storage for visualization in Power BI. This approach expedites business outcomes by taking advantage of on-device compute capabilities to execute AI models.

The tutorial splits the device's processing logic into three separate containers to enable modular customization for user-specific scenarios. To customize, users can train models in the Azure Machine Learning service, export them to ONNX, and update the reference code in the tutorial to include the trained model. ONNX Runtime executes the model in the inference container, taking advantage of the TensorRT libraries to provide hardware-accelerated inferencing at the edge.
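
As a rough sketch of the pattern the inference container follows: run the model against a preprocessed frame and keep only a compact result record for upload. The provider names, input shape, and output layout below are illustrative assumptions, and the actual upload to Azure Blob Storage via the Azure SDK is omitted.

    import json
    import numpy as np
    import onnxruntime as ort

    # Prefer TensorRT, falling back to CUDA and then CPU for unsupported operators.
    # Provider names follow current onnxruntime packages and are assumptions here.
    session = ort.InferenceSession(
        "model.onnx",  # placeholder model path
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_name = session.get_inputs()[0].name

    def score_frame(frame):
        # Run the model on one preprocessed camera frame (hypothetical 1x3x224x224 float32 input).
        scores = session.run(None, {input_name: frame})[0][0]
        top = int(np.argmax(scores))
        # Only this small record, not the frame itself, would be sent to Azure Blob Storage.
        return json.dumps({"class_index": top, "score": float(scores[top])})

    print(score_frame(np.zeros((1, 3, 224, 224), dtype=np.float32)))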

Additional Resources

To learn more about ONNX Runtime Execution Providers, watch this video.

Get started on GitHub.

Have feedback or questions about ONNX Runtime? File an issue on GitHub, and follow us on Twitter.