PyTorch

Introducing ONNX Script: Authoring ONNX with the ease of Python

August 1, 2023 6 min read

By Aaron Bockover, Principal Engineer, Microsoft
Maanav Dalal, Program Manager, Microsoft
Ganesan Ramalingam, Principal Architect, Microsoft
Justin Chu, Software Engineer, Microsoft

ONNX Script is a new open-source library for directly authoring ONNX models in Python with a focus on clean, idiomatic Python syntax and composability through ONNX-native functions.

Olive: A user-friendly toolchain for hardware-aware model optimization

June 26, 2023 4 min read

By Emma Ning, Principal Program Manager, AI Frameworks
Devang Patel, Principal Architect, AI Frameworks
Guoliang Hua, Principal Software Engineer Manager, Microsoft

Introducing Olive, an easy-to-use toolchain for optimizing models with hardware awareness. With Olive, you don’t need to be an expert to explore diverse hardware optimization toolchains.

Optimizing and deploying transformer INT8 inference with ONNX Runtime-TensorRT on NVIDIA GPUs

May 2, 2022 5 min read

By Emma Ning, Principal Program Manager, AI Frameworks
Mohit Ayani, Solutions Architect, NVIDIA
Shang Zhang, Senior AI Developer Technology Engineer, NVIDIA
Jay Rodge, Product Marketing Manager-AI, NVIDIA

Transformer-based models have revolutionized the natural language processing (NLP) domain. Ever since its inception, transformer architecture has been integrated into models like Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) for performing tasks...

Scaling-up PyTorch inference: Serving billions of daily NLP inferences with ONNX Runtime

April 19, 2022 8 min read

By Faith Xu, Principal Program Manager, Machine Learning Platform

Scale, performance, and efficient deployment of state-of-the-art Deep Learning models are ubiquitous challenges as applied machine learning grows across the industry. We’re happy to see that the ONNX Runtime Machine Learning model inferencing solution we’ve built and use in high-volume Microsoft products and services also resonates with our open source community, enabling new capabilities that...

Supporting efficient large model training on AMD Instinct™ GPUs with DeepSpeed

March 21, 2022 6 min read

By Olatunji Ruwase, Principal RSDE
Jeff Rasley, Senior Research SDE

This post was co-authored by Jithun Nair and Aswin Mathews, members of technical staff at AMD. In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains. Companies now use these models to power novel AI-driven user experiences across a whole spectrum of...

Introducing Distributed Data Parallel support on PyTorch Windows

August 4, 2021 6 min read

By Chester Liu, Software Engineer II

Model training has been, and for the foreseeable future will remain, one of the most frustrating things machine learning developers face. It takes a long time, and there is little anyone can do about it. If you have the luxury (especially at this moment in time) of having multiple GPUs, you are likely to find...

Accelerate PyTorch training with torch-ort

July 13, 2021 3 min read

By Natalie Kershaw, Senior Program Manager, AI Frameworks, Microsoft

With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice. Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.ORTModule, to accelerate distributed training of PyTorch models, reducing the time and resources needed...

ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm™ Open Software Platform

July 13, 2021 4 min read

By Weixing Zhang, Principal Software Engineer, AI Frameworks at Microsoft
Suffian Khan, Software Engineer, AI Frameworks at Microsoft

This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce a preview version of ONNX Runtime in...

Journey to optimize large scale transformer model inference with ONNX Runtime

June 30, 2021 7 min read

By Xiaoyu Liu, Applied Scientist II, Data&AI, Developer Division (DevDiv)
Eric Lin, Senior Researcher SDE, Turing Team
Emma Ning, Principal Program Manager, AI Frameworks

“With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer generative transformer model for code, a.k.a., GPT-C, to empower IntelliCode with the whole line of code completion suggestions in Visual Studio and Visual Studio Code.” Large-scale transformer models, such as GPT-2 and GPT-3, are among the most...

Delivering reliable production experiences with PyTorch Enterprise on Microsoft Azure

May 25, 2021 3 min read

By Sarah Novotny, Open Source Lead, Azure Office of the CTO

At Microsoft, we use PyTorch to power products such as Bing and Azure Cognitive Services and we actively contribute to several PyTorch open-source projects, including PyTorch Profiler, ONNX Runtime, DeepSpeed, and more. Today, we’re announcing a new initiative in collaboration with Facebook: the PyTorch Enterprise Support Program. This new program enables service providers to develop and...

Introducing ONNX Runtime mobile – a reduced size, high performance package for edge devices

October 12, 2020 2 min read

By Scott McKay, Principal Software Engineer
Manash Goswami, Principal Program Manager, Machine Learning Platform

ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce ONNX Runtime release v1.5 as part of our AI at Scale initiative. This release includes ONNX Runtime mobile, a new feature targeting smartphones and other...

GPT-2 fine-tuning with ONNX Runtime – a 34% speedup in training time

August 24, 2020 4 min read

By Aishwarya Bhandare, Software Engineer
Tianju Xu, Senior Software Engineer
Kshama Pawar, Senior Program Manager

Model training is an important step when developing and deploying large scale Artificial Intelligence (AI) models. Training typically utilizes a large amount of compute resources to tune the model based on the input dataset. Transformer models, with millions and billions of parameters, are especially compute-intensive and training costs increase with model size and fine-tuning steps...
