News • JavaScript • February 29, 2024 • 4 min read
ONNX Runtime Web unleashes generative AI in the browser using WebGPU
ONNX Runtime Web featuring WebGPU is now available in the ONNX Runtime 1.17 release, unlocking new possibilities (see the usage sketch after this list).
News • Tools • PyTorch • June 26, 2023 • 4 min read
Olive: A user-friendly toolchain for hardware-aware model optimization
Introducing Olive, an easy-to-use toolchain for optimizing models with hardware awareness. With Olive, you don't need to be…
News • Cloud • June 26, 2023 • 3 min read
Automate optimization techniques for transformer models
Intel has collaborated with Microsoft to integrate Intel® Neural Compressor into Olive, enabling developers to easily take advantage…
Project updates • PyTorch • May 2, 2022 • 5 min read
Optimizing and deploying transformer INT8 inference with ONNX Runtime-TensorRT on NVIDIA GPUs
Mohit Ayani, Solutions Architect, NVIDIA; Shang Zhang, Senior AI Developer Technology Engineer, NVIDIA; Jay Rodge, Product Marketing Manager-AI,…
Project updates • AI + Machine Learning • JavaScript • September 2, 2021 • 5 min read
ONNX Runtime Web—running your machine learning model in browser
We are introducing ONNX Runtime Web (ORT Web), a new feature in ONNX Runtime to enable JavaScript developers…
AI + Machine Learning • PyTorch • June 30, 2021 • 7 min read
Journey to optimize large scale transformer model inference with ONNX Runtime
“With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer…
Project updates • AI + Machine Learning • January 21, 2020 • 4 min read
Microsoft open sources breakthrough optimizations for transformer inference on GPU and CPU
This post is co-authored by Emma Ning, Azure Machine Learning; Nathan Yan, Azure Machine Learning; Jeffrey Zhu, Bing;…
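
Several of the posts above, notably the ONNX Runtime Web entries, center on running ONNX models directly in the browser. For readers who want a sense of what that looks like in code, the sketch below shows one way to create an inference session with the WebGPU execution provider using the onnxruntime-web package; the model path, input name, and tensor shape are illustrative assumptions rather than details taken from any of the posts.

// A minimal sketch of browser inference with ONNX Runtime Web and the WebGPU
// execution provider introduced in ONNX Runtime 1.17. The model path
// "model.onnx", the input name "input", and the tensor shape are placeholder
// assumptions for illustration only.
import * as ort from 'onnxruntime-web/webgpu';

async function runInBrowser(): Promise<void> {
  // Prefer WebGPU; fall back to the WebAssembly backend if WebGPU is unavailable.
  const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: ['webgpu', 'wasm'],
  });

  // Build a dummy float32 input; a real application would fill this with image
  // or token data matching the model's expected shape.
  const data = new Float32Array(1 * 3 * 224 * 224);
  const input = new ort.Tensor('float32', data, [1, 3, 224, 224]);

  // Feed names must match the model's declared input names; "input" is assumed here.
  const results = await session.run({ input });
  console.log(Object.keys(results));
}

runInBrowser();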