Microsoft open sources breakthrough optimizations for transformer inference on GPU and CPU 

4 min read

This post is co-authored by Emma Ning, Azure Machine Learning; Nathan Yan, Azure Machine Learning; Jeffrey Zhu, Bing; and Jason Li, Bing. One of the most popular deep learning models used for natural language processing is BERT (Bidirectional Encoder Representations from Transformers). Due to the significant computation required, inferencing BERT at high scale can be extremely …


Announcing Applied Cloud Stories 

2 min read

We are delighted to announce the Applied Cloud Stories initiative by Microsoft! What is Applied Cloud Stories? Do you work with open source? Are you passionate about machine learning or data science? Do you have stories to share about solving scale or data challenges? Are you investing time and effort so that you and your …

ONNX joins Linux Foundation 

2 min read

Today the Open Neural Network Exchange (ONNX) is joining the LF AI Foundation, an umbrella foundation of the Linux Foundation supporting open source innovation in artificial intelligence, machine learning, and deep learning. ONNX was co-founded by Microsoft in 2017 to make it easier to create and deploy machine learning applications. In the past few years, …

Announcing ONNX Runtime 1.0 

3 min read

One year after ONNX Runtime’s initial preview release, we’re excited to announce v1.0 of the high-performance machine learning model inferencing engine. This release marks our commitment to API stability for the cross-platform, multi-language APIs, and introduces a breadth of performance optimizations, broad operator coverage, and pluggable accelerators to take advantage of new and exciting hardware …

Now available: ONNX Runtime 0.5 with support for edge hardware acceleration 

4 min read

ONNX Runtime 0.5, the latest update to the open source, high-performance inference engine for ONNX models, is now available. This release improves the customer experience and supports inferencing optimizations across hardware platforms. Since the last release in May, Microsoft teams have deployed an additional 45+ models that leverage ONNX Runtime for inferencing. These models …

Microsoft joins partners and the Linux Foundation to create Confidential Computing Consortium 

3 min read

Microsoft has invested in confidential computing for many years, so I’m excited to announce that Microsoft will join industry partners to create the Confidential Computing Consortium, a new organization that will be hosted at The Linux Foundation. The Confidential Computing Consortium will be dedicated to defining and accelerating the adoption of confidential computing. Confidential computing …


Trill 103: Ingress, Egress, and Trill’s notion of time 

8 min read

Congratulations! You’ve made it to the next installment of our overview of Trill, Microsoft’s open source streaming data engine. As noted in our previous posts about basic queries and joins, Trill is a temporal query processor. Trill works with data that has some intrinsic notion of time. However, Trill doesn’t assign any semantics to that …

AzureR now available: Create, manage, and monitor Azure services with R 

4 min read

AzureR, a family of packages that provides tools to manage Azure resources from the open source R language, is now available. If you code in Python, C#, Java, or JavaScript, you already have a rich selection of SDKs to choose from to interact with Azure. AzureR extends SDK support to the R language by providing …


How to use Trill for impression feedback (part 2) 

8 min read

This is part 2 of a two-post series that shows you how to use Trill, an open source .NET library designed to process one trillion events a day, for impression feedback. In part 1, we walked through how to write Trill queries to find out: 1) which impressions successfully joined to the feedback stream and 2) …

How to use Trill for impression feedback (part 1) 

9 min read

On the Microsoft BingAds team, one of my primary responsibilities is the development and maintenance of the FastBI pipeline – the system responsible for all revenue coming from the Bing search engine. We have been working on streaming technologies for the last five years, combining the scale and stability of our internal Cosmos compute platform …

ONNX Runtime: a one-stop shop for machine learning inferencing 

3 min read

Organizations that want to leverage AI at scale must overcome a number of challenges around model training and model inferencing. Today, there is a plethora of tools and frameworks that accelerate model training, but inferencing remains a tough nut to crack due to the variety of environments that models need to run in. For example, the same …


Trill 102: Temporal Joins 

5 min read

This post is the second in a sequence intended to introduce developers to the Trill streaming query engine, its programming model, and its capabilities. In the previous post, we introduced the concept of snapshot semantics for temporal query processing. Here, we go deeper into the mechanics of snapshot semantics by showing its impact on one …