Automate optimization techniques for transformer models 

3 min read

Intel has collaborated with Microsoft to integrate Intel® Neural Compressor into Olive, enabling developers to easily take advantage of model compression techniques in their deployment platform, including Intel processors and accelerators. Read more

Scaling-up PyTorch inference: Serving billions of daily NLP inferences with ONNX Runtime 

8 min read

Scale, performance, and efficient deployment of state-of-the-art Deep Learning models are ubiquitous challenges as applied machine learning grows across the industry. We’re happy to see that the ONNX Runtime Machine Learning model inferencing solution we’ve built and use in high-volume Microsoft products and services also resonates with our open source community, enabling new capabilities that Read more

Accelerate and simplify Scikit-learn model inference with ONNX Runtime 

5 min read

Scikit-learn is one of the most useful libraries for general machine learning in Python. To minimize the cost of deployment and avoid discrepancies, deploying scikit-learn models to production usually leverages Docker containers and pickle, the object serialization module of the Python standard library. Docker is a good way to create consistent environments and pickle saves Read more

ONNX Runtime scenario highlight: Vespa.ai integration 

1 min read

Since its open source debut two years ago, ONNX Runtime has seen strong growth with performance improvements, expanded platform and device compatibility, hardware accelerator support, an extension to training acceleration, and more. We are excited by its broad usage in production, powering more than a hundred models across Microsoft products and services and bringing concrete Read more

Adding RoBERTa NLP to the ONNX model zoo for natural language predictions 

3 min read

In summer 2019, I worked as a high school intern for the ONNX AI team at Microsoft and loved working on various projects with the team, including the BERT text classification model. However, due to Covid-19, the Microsoft Internship Program for high school students was canceled in the summer of 2020. This led two other Read more

Introducing ONNX Runtime mobile – a reduced size, high performance package for edge devices 

2 min read

ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce ONNX Runtime release v1.5 as part of our AI at Scale initiative. This release includes ONNX Runtime mobile, a new feature targeting smartphones and other Read more

GPT-2 fine-tuning with ONNX Runtime – a 34% speedup in training time 

4 min read

Model training is an important step when developing and deploying large scale Artificial Intelligence (AI) models. Training typically utilizes a large amount of compute resources to tune the model based on the input dataset. Transformer models, with millions and billions of parameters, are especially compute-intensive and training costs increase with model size and fine-tuning steps Read more

Announcing accelerated training with ONNX Runtime—train models up to 45% faster 

3 min read

ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. It is used extensively in Microsoft products, like Office 365 and Bing, delivering over 20 billion inferences every day and up to 17 times faster inferencing. Today we are introducing Read more

1 Comment