ONNX Runtime is an open source project designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce the ONNX Runtime v1.5 release as part of our AI at Scale initiative. This release includes ONNX Runtime mobile, a new feature targeting smartphones and other storage-constrained devices. This capability extends ONNX Runtime to support optimized execution of Machine Learning (ML) models in edge scenarios. Edge includes any compute-enabled device, such as PCs, smartphones, special-purpose embedded devices, and IoT devices.

[Figure: ONNX Runtime mobile diagram]

ONNX Runtime is the inference engine used to execute ONNX models. ONNX Runtime is supported on different Operating System (OS) and hardware (HW) platforms. The Execution Provider (EP) interface in ONNX Runtime enables easy integration with different HW accelerators. Published packages are available for x86_64/amd64 and aarch64, or developers can build from source for any custom configuration. Today's announcement enables developers to create an optimized ONNX Runtime package for use across a diverse set of edge devices. This package supports the same APIs for the application code to manage and execute inference sessions.
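For illustration, here is a minimal sketch of that shared inference API in Python; the model path, input shape, and layout are placeholders for your own model:

```python
import numpy as np
import onnxruntime as ort

# Create an inference session; ONNX Runtime uses the execution
# providers available in this particular build (CPU by default).
session = ort.InferenceSession("model.onnx")  # placeholder model path

# Build a dummy input for the model's first declared input.
input_meta = session.get_inputs()[0]
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed NCHW image input

# Run inference; passing None as the output list returns all model outputs.
outputs = session.run(None, {input_meta.name: dummy})
print(outputs[0].shape)
```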

AI is infused into applications to address many scenarios, delivering rich user experiences and efficient workflows. These applications either use runtime engines provided by the host platform (OS) or package their own library to execute ML models. In addition, the disk footprint and in-memory size of these applications must stay small to conserve resources on iOS and Android smartphones, Windows 10 PCs, and other edge devices.

These requirements motivated us to develop the ONNX Runtime mobile feature. Developers can create a smaller runtime package to use on client devices for executing their ONNX models. The size reduction is achieved by building the runtime package as a custom binary for a user-defined set of ONNX models and by using a new optimized file format for the model file. Details of the ONNX Runtime mobile packaging steps are available on GitHub.
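As a rough sketch of that flow (the conversion script, build flags, and file names follow the packaging instructions on GitHub and should be treated as assumptions rather than an exact recipe):

```python
# 1. Convert the ONNX model(s) to the size-optimized ORT format, e.g.:
#      python tools/python/convert_onnx_models_to_ort.py ./my_models
#    This also emits an operator config listing the kernels the models need.
# 2. Build a custom, minimal runtime containing only those kernels, e.g.:
#      ./build.sh --config MinSizeRel --minimal_build \
#                 --include_ops_by_config ./my_models/required_operators.config
#
# The resulting size-optimized model loads through the same InferenceSession API:
import onnxruntime as ort

session = ort.InferenceSession("my_models/model.ort")  # placeholder path
```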

ONNX Runtime mobile can execute all standard ONNX models. The size of the runtime package varies depending on the models you wish to support. As shown in the chart below, the ONNX Runtime mobile package for Mobilenet is comparable in size (within ~1%) to TensorFlow Lite's reduced build package.

[Chart: Package size for Mobilenet, comparing ONNX Runtime mobile and the TensorFlow Lite reduced build]

*TfLite package size from: Reduce TensorFlow Lite binary size
†ONNX Runtime full build is 7,546,880 bytes

ONNX Runtime v1.5 Updates

The ONNX Runtime v1.5 update is released today. Here are some highlights of this release across the Inference and Training feature areas. Full details can be found in the release notes.

Inference features

  • Reduced Operator Kernel build
  • Transformer model performance optimizations
  • Improved quantization support for CNN models (performance, per-channel quantization)
  • Execution Providers (see the provider-selection sketch after this list)
    • CUDA: updated to CUDA 10.2 / cuDNN 8.0
    • TensorRT 7.1 support
    • OpenVINO 2020.4 support
    • DirectML updates for more complete operator support
    • NNAPI major updates for better usability and operator support
    • MIGraphX updates for more complete operator support and improved performance
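To illustrate the Execution Provider interface, here is a minimal Python sketch of selecting among the providers compiled into a build; which providers appear depends on how the package was built:

```python
import onnxruntime as ort

# List the execution providers this onnxruntime build was compiled with,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a CUDA build.
print(ort.get_available_providers())

session = ort.InferenceSession("model.onnx")  # placeholder path
# Prefer the CUDA EP, with the CPU EP as a fallback for unsupported operators.
session.set_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```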

Training features

Today we introduce updates that improve the developer experience for using ONNX Runtime training in your distributed training experiments:

  • New and improved API to simplify integration with PyTorch trainer code
  • Updated CUDA 11 / cuDNN 8.0 support to accelerate training on NVIDIA A100 GPUs (see the sketch after this list)
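Here is a rough sketch of the PyTorch integration; the training API is experimental, and the exact names used below (ORTTrainer, optim.SGDConfig, the model_desc layout) are assumptions based on the onnxruntime.training module rather than a definitive recipe:

```python
# Sketch of wrapping a PyTorch model with ONNX Runtime training.
# API names here follow the experimental onnxruntime.training module
# and should be treated as assumptions.
import torch
from onnxruntime.training import ORTTrainer, optim

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(784, 10)

    def forward(self, x, target):
        logits = self.fc(x)
        loss = torch.nn.functional.cross_entropy(logits, target)
        return loss, logits

# Describe the training graph's inputs and outputs for ONNX Runtime;
# the True flag marks which output is the loss.
model_desc = {
    "inputs": [("x", ["batch", 784]), ("target", ["batch"])],
    "outputs": [("loss", [], True), ("logits", ["batch", 10])],
}

trainer = ORTTrainer(Net(), model_desc, optim.SGDConfig(lr=1e-3))

# One training step: forward, backward, and optimizer update in ONNX Runtime.
x = torch.rand(64, 784)
target = torch.randint(0, 10, (64,))
loss, logits = trainer.train_step(x, target)
```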

We announced the preview of the training feature in ONNX Runtime during //build 2020 as part of the AI at Scale initiative.

Get Started

To get started, visit the ONNX Runtime project on GitHub at github.com/microsoft/onnxruntime.

Questions or feedback? Please let us know in the comments below.