One year after ONNX Runtime’s initial preview release, we’re excited to announce v1.0 of the high-performance machine learning model inferencing engine. This release marks our commitment to API stability for the cross-platform, multi-language APIs, and introduces a breadth of performance optimizations, broad operator coverage, and pluggable accelerators to take advantage of new and exciting hardware developments.
Year in review
In its first year, ONNX Runtime was shipped to production for more than 60 models at Microsoft, with adoption from a range of consumer and enterprise products, including Office, Bing, Cognitive Services, Windows, Skype, Ads, and others. These models span speech, image, and text (including state-of-the-art models such as BERT), and ONNX Runtime has improved the performance of these models by an average of 2.5x over previous inferencing solutions.
In addition to performance gains, the interoperable ONNX model format has also provided increased infrastructure flexibility, allowing teams to use a common runtime to scalably deploy a breadth of models to a range of hardware. Across Microsoft technologies, ONNX Runtime is serving hundreds of millions of devices and billions of requests daily.
We also collaborated with a host of community partners to take advantage of ONNX Runtime’s extensibility options to provide accelerators for a variety of hardware options. With active contributions from Intel, NVIDIA, JD.com, NXP, and others, today ONNX Runtime can provide acceleration on the Intel® Distribution of the OpenVINO™ Toolkit, Deep Neural Network Library (DNNL) (formerly Intel® MKL-DNN), nGraph, NVIDIA TensorRT, NN API for Android, the ARM Compute Library, and more.
What’s new in 1.0
We’ve made some changes to the C API for clarity of usage and introduced versioning to accommodate future updates.
- C APIs are ABI compatible and follow Semantic Versioning. Programs linked with the current version of the ONNX Runtime library will continue to work with subsequent releases without updating any client code or re-linking.
- We’ve also enabled some new capabilities through the Python and C# APIs for feature parity, such as providing registration of execution providers in Python and setting additional run options in C#.
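To make the Semantic Versioning promise concrete, here is a minimal sketch of the general SemVer compatibility rule: a client built against one release keeps working with any later release that shares the same major version. The names `Version` and `is_abi_compatible` are hypothetical helpers for illustration only; this is not how ONNX Runtime itself negotiates C API versions.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accepts plain "MAJOR.MINOR.PATCH" strings only (no pre-release tags).
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_abi_compatible(built_against: str, runtime: str) -> bool:
    """Hypothetical helper: under Semantic Versioning, a client built against
    `built_against` can run on `runtime` if both share the same major version
    and the runtime is not older than the version the client was built against."""
    built = Version.parse(built_against)
    actual = Version.parse(runtime)
    # NamedTuple comparison is lexicographic: (major, minor, patch).
    return actual.major == built.major and actual >= built
```

For example, a program linked against 1.0.0 is compatible with a 1.2.3 runtime, while a 2.0.0 runtime would signal a breaking change.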
ONNX 1.6 compatibility with opset 11
Keeping up with the evolving ONNX spec remains a key focus for ONNX Runtime and this update provides the most thorough operator coverage to date. ONNX Runtime supports all versions of ONNX since 1.2 with backwards and forward compatibility to run a comprehensive variety of ONNX models.
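As an illustration of what "all versions since 1.2" means in operator-set terms, the sketch below maps each ONNX release to the default-domain opset it introduced (1.2 → opset 7 through 1.6 → opset 11) and checks a model's opset imports against that window. The helper `default_opset_supported` is hypothetical, and treating exactly opsets 7 through 11 as the supported range is an assumption drawn from this post, not ONNX Runtime's internal logic.

```python
from typing import Dict

# ONNX release -> default-domain opset version it introduced.
ONNX_RELEASE_OPSET = {"1.2": 7, "1.3": 8, "1.4": 9, "1.5": 10, "1.6": 11}

# Assumed support window: ONNX 1.2 (opset 7) through ONNX 1.6 (opset 11).
MIN_OPSET = ONNX_RELEASE_OPSET["1.2"]
MAX_OPSET = ONNX_RELEASE_OPSET["1.6"]

# The default ONNX operator domain may be written as "" or "ai.onnx".
DEFAULT_DOMAINS = {"", "ai.onnx"}


def default_opset_supported(opset_imports: Dict[str, int]) -> bool:
    """Hypothetical check: is the model's default-domain opset inside the window?"""
    for domain, version in opset_imports.items():
        if domain in DEFAULT_DOMAINS:
            return MIN_OPSET <= version <= MAX_OPSET
    # Simplifying assumption: no default-domain import is treated as unsupported.
    return False
```

With the `onnx` Python package, the input dictionary can be built from a loaded model as `{imp.domain: imp.version for imp in onnx.load(path).opset_import}`.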
Execution Provider (EP) updates
- General Availability of the OpenVINO™ EP for Intel® CPU, Intel® Integrated Graphics, Intel® Neural Compute Stick 2, and the Intel® Vision Accelerator Design with Intel® Movidius™ Myriad™ VPU powered by OpenVINO™.
- nGraph EP support of new operators.
- TensorRT EP updated to the latest TensorRT 6.0 libraries.
New Execution Providers in preview
- NUPHAR (Neural-network Unified Preprocessing Heterogeneous ARchitecture) is a TVM and LLVM based EP offering model acceleration by compiling nodes in subgraphs into optimized functions via JIT.
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows, providing GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.
- Support for Intel® Vision Accelerator Design with Intel® Arria™ 10 FPGA powered by OpenVINO™.
- ARM Compute Library (ACL) Execution Provider targets ARM CPUs and GPUs for optimized execution of ONNX operators using the low-level libraries.
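With this many Execution Providers, an application typically lists the ones it prefers in priority order and keeps only those actually present in its build. The sketch below shows one way to do that; `select_providers` and the `PREFERRED` list are hypothetical, while the provider name strings are the ones onnxruntime reports. Always appending the CPU provider as a final fallback is a design choice of this sketch.

```python
from typing import List, Sequence

# Example priority order (hypothetical); names match onnxruntime's provider strings.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

CPU = "CPUExecutionProvider"


def select_providers(preferred: Sequence[str], available: Sequence[str]) -> List[str]:
    """Keep the preferred providers that are available, in priority order,
    and always finish with the CPU provider as a fallback."""
    chosen = [name for name in preferred if name in available and name != CPU]
    chosen.append(CPU)
    return chosen
```

In practice, `available` would come from `onnxruntime.get_available_providers()`. Recent onnxruntime releases accept the result as `onnxruntime.InferenceSession(model_path, providers=...)`; the exact registration call for a given version should be checked against that version's Python API reference.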
Beyond adding new Execution Providers for hardware acceleration, we’ve also made a host of updates to minimize default CPU and GPU (CUDA) latency for inference computations.
To facilitate production usage of ONNX Runtime, we’ve released the complementary ONNX Go Live tool, which automates the process of shipping ONNX models by combining model conversion, correctness tests, and performance tuning into a single pipeline as a series of Docker images. We’ve also refreshed the quantization tool to support improved performance and accuracy for inferencing quantized models in ONNX Runtime, with updates for node fusions and bias quantization for convolutions.
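Bias quantization for convolutions follows a simple rule in ONNX's QLinearConv operator: the bias is stored as int32 with scale `x_scale * w_scale` and zero point 0, so it can be added straight into the int32 accumulator. The sketch below illustrates that rule with per-tensor scales; it is not the quantization tool's code, and the helper names are hypothetical.

```python
from typing import List

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_bias(bias: List[float], input_scale: float, weight_scale: float) -> List[int]:
    """Quantize a float conv bias to int32 using scale = input_scale * weight_scale
    and zero point 0, saturating to the int32 range."""
    bias_scale = input_scale * weight_scale
    return [max(INT32_MIN, min(INT32_MAX, round(b / bias_scale))) for b in bias]


def dequantize_bias(q_bias: List[int], input_scale: float, weight_scale: float) -> List[float]:
    """Map int32 bias values back to floats, e.g. to inspect round-trip error."""
    bias_scale = input_scale * weight_scale
    return [q * bias_scale for q in q_bias]
```

Because the bias scale is tied to the product of the input and weight scales, no rescaling step is needed before adding the bias, and the round-trip error per element stays within half of one bias quantization step.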
This release contains many bug fixes identified during the past few months. As an active, growing project, we do expect bugs to be uncovered as the breadth of models expands. We continue striving towards quality and are committed to actively resolving issues as they are uncovered. You can always report bugs on GitHub.
For full release notes, please see https://aka.ms/onnxruntime-release.
ONNX Runtime 1.0 is a notable milestone, but this is just the beginning of our journey. We support the mission of open and interoperable AI and will continue working towards improving ONNX Runtime by making it even more performant, extensible, and easily deployable across a variety of architectures and devices between cloud and edge. You can find our detailed roadmap here.
We thank our community of contributors and look forward to even greater impact as we further the innovation and operationalization of ML in the field.