Microsoft Open Source Blog

Posts

Journey to optimize large scale transformer model inference with ONNX Runtime 

“With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer generative transformer model for code, a.k.a., GPT-C, to empower IntelliCode with the whole line of code completion suggestions in Visual Studio and Visual Studio Code.”

Large-scale transformer models, such as GPT-2 and GPT-3, are among the most...

Microsoft open sources breakthrough optimizations for transformer inference on GPU and CPU 

This post is co-authored by Emma Ning, Azure Machine Learning; Nathan Yan, Azure Machine Learning; Jeffrey Zhu, Bing; Jason Li, Bing.

One of the most popular deep learning models used for natural language processing is BERT (Bidirectional Encoder Representations from Transformers). Due to the significant computation required, inferencing BERT at high scale can be extremely...