The V1.8 release of ONNX Runtime includes many exciting new features. It brings ONNX Runtime machine learning inferencing acceleration to the Android and iOS mobile ecosystems (previously in preview) and introduces ONNX Runtime Web. The release also debuts official packages for accelerating model training workloads in PyTorch.
ONNX Runtime is a cross-platform accelerator for machine learning models. It takes advantage of hardware accelerators to run model inferencing and training efficiently on a wide array of devices.
ONNX Runtime for Web
Running AI capabilities in the browser has been a hot topic given the benefits web-side inferencing can provide. These include reduced server-client communication, better protection of user privacy, and an install-free, cross-platform in-browser machine learning experience. ONNX Runtime Web is a new feature of ONNX Runtime that enables AI developers to build machine learning-powered web experiences on both the central processing unit (CPU) and the graphics processing unit (GPU).
For CPU workloads, WebAssembly is used to execute models at near-native speed. ONNX Runtime Web compiles the native ONNX Runtime CPU engine into a WebAssembly backend using Emscripten. This allows it to run any ONNX model and to support most of the functionality that native ONNX Runtime offers, including full ONNX operator coverage, multi-threading, quantization, and ONNX Runtime on Mobile.
For accelerated performance on GPUs, ONNX Runtime Web leverages WebGL, a popular standard for accessing GPU capabilities. Operator coverage and performance optimizations with the WebGL backend are continuously improving, and we are also exploring new techniques such as WebGPU to further speed up ONNX Runtime Web inference on GPUs.
To try it out, follow the ONNX Runtime Web examples.
ONNX Runtime on Mobile
Supporting AI execution on mobile devices is also a popular use case, particularly for offline processing, privacy, or latency-sensitive scenarios. We previewed the ONNX Runtime Mobile feature in V1.6 and have since added many new features, including support for the NNAPI and CoreML execution providers to accelerate model execution on mobile phones. We are also announcing a preview of the CocoaPods pod for the C/C++ library, which integrates ONNX Runtime Mobile into iOS applications. Pre-built packages for Android and iOS (in preview) are available for installation from Maven Central and CocoaPods, respectively. The Android package uses the native NNAPI accelerator on Android devices when available. The iOS package uses CoreML to accelerate model execution, in addition to supporting direct execution on the ARM CPU.
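The "use the accelerator when available, otherwise run on the CPU" behavior described above can be illustrated with a small sketch. It is written in Python purely for illustration, since mobile apps would use the platform's own language bindings. The provider name strings follow ONNX Runtime's naming, but the `choose_providers` helper and the preference order are assumptions made for this example, not part of the mobile packages.

```python
from typing import Iterable, List

# Execution provider names as used by ONNX Runtime.
# The preference order below is an assumption for this sketch.
MOBILE_PREFERENCE = [
    "NnapiExecutionProvider",   # hardware acceleration on Android
    "CoreMLExecutionProvider",  # hardware acceleration on iOS
]
CPU_FALLBACK = "CPUExecutionProvider"


def choose_providers(available: Iterable[str],
                     preferred: Iterable[str] = MOBILE_PREFERENCE) -> List[str]:
    """Keep the preferred providers that are available, then append the CPU fallback.

    Hypothetical helper: it only models the priority-with-fallback idea.
    """
    available_set = set(available)
    chosen = [
        name for name in preferred
        if name in available_set and name != CPU_FALLBACK
    ]
    chosen.append(CPU_FALLBACK)  # direct CPU execution is always the last resort
    return chosen


if __name__ == "__main__":
    # A device that exposes NNAPI: the accelerator is tried first, CPU second.
    print(choose_providers(["NnapiExecutionProvider", "CPUExecutionProvider"]))
    # A device with no accelerator: only the CPU provider remains.
    print(choose_providers(["CPUExecutionProvider"]))
```

In the ONNX Runtime Python API, a list ordered like this could be passed as the `providers` argument of `onnxruntime.InferenceSession`, where earlier entries take priority.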
ONNX Runtime Training for PyTorch
Available through the torch-ort package as part of the PyTorch ecosystem, ONNX Runtime Training is a backend for PyTorch that accelerates distributed training of large transformer models. A one-line change to the original PyTorch training script adds ONNX Runtime to the training loop, reducing training time and infrastructure cost. ONNX Runtime Training includes optimized kernels for GPU execution and efficient GPU memory management. Together these accelerate training throughput by up to 1.4X and enable large models to fit onto smaller GPUs, improving GPU utilization. Because the PyTorch training loop is unmodified, ONNX Runtime for PyTorch can be combined with other acceleration libraries such as DeepSpeed, FairScale, and Megatron for even faster and more efficient training. This release supports ONNX Runtime Training on both NVIDIA and AMD GPUs.
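As a rough sketch of what the one-line change looks like, the model is wrapped with `ORTModule` from the torch-ort package and the rest of the training loop stays plain PyTorch. The toy model, the `wrap_for_ort` and `train_one_step` helpers, and the fallback path are assumptions added so the sketch can run on machines without torch-ort. A real script would simply write `model = ORTModule(model)` and would need a configured torch-ort installation with a supported GPU.

```python
import importlib.util


def ort_training_available() -> bool:
    """True when both PyTorch and the torch-ort package can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "torch_ort")
    )


def wrap_for_ort(model):
    """Wrap a torch.nn.Module with ORTModule, or return it unchanged.

    Hypothetical helper: the fallback only keeps this sketch runnable
    when torch-ort is not installed.
    """
    if not ort_training_available():
        return model
    from torch_ort import ORTModule
    return ORTModule(model)  # the one-line change: model = ORTModule(model)


def train_one_step() -> float:
    """Run one optimizer step on a toy model; the loop itself is ordinary PyTorch."""
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = wrap_for_ort(torch.nn.Linear(4, 2).to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    inputs = torch.randn(8, 4, device=device)
    loss = model(inputs).pow(2).mean()  # toy objective, for illustration only

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the model object is wrapped, the optimizer, data loading, and any other acceleration library used in the script are left as they were.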
For the full list of new features in this release, please see the release notes.