<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Microsoft Open Source Blog</title>
	<atom:link href="https://cloudblogs.microsoft.com/opensource/feed/" rel="self" type="application/rss+xml" />
	<link>https://cloudblogs.microsoft.com/opensource</link>
	<description>Open dialogue about openness at Microsoft – open source, standards, interoperability</description>
	<lastBuildDate>Tue, 07 Sep 2021 22:23:37 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>ONNX Runtime Web—running your machine learning model in browser</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/</link>
		
		<dc:creator><![CDATA[Emma Ning, Yulong Wang and Du Li]]></dc:creator>
		<pubDate>Thu, 02 Sep 2021 16:00:20 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Browser]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[ONNX Runtime]]></category>
		<category><![CDATA[Web inference]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/</guid>

					<description><![CDATA[<p>We are introducing ONNX Runtime Web (ORT Web), a new feature in ONNX Runtime that enables JavaScript developers to run and deploy machine learning models in browsers. It also helps enable new classes of on-device computation. ORT Web replaces the soon-to-be-deprecated ONNX.js, with improvements such as a more consistent developer<span><a class="read-more" aria-label="Read more about ONNX Runtime Webrunning your machine learning model in browser" href="https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/" data-bi-cn="Read more about ONNX Runtime Webrunning your machine learning model in browser">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/">ONNX Runtime Web—running your machine learning model in browser</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>We are introducing <a href="https://github.com/microsoft/onnxruntime/tree/master/js/web#readme" target="_blank" rel="noopener">ONNX Runtime Web</a> (ORT Web), a new feature in ONNX Runtime that enables JavaScript developers to run and deploy machine learning models in browsers. It also helps enable new classes of on-device computation. ORT Web replaces the soon-to-be-deprecated ONNX.js, with improvements such as a more consistent developer experience between server-side and client-side inferencing packages, improved inference performance, and broader model coverage. This blog gives you a quick overview of ORT Web, as well as getting-started resources for trying it out.</p><h2><strong>A glance at ONNX Runtime (ORT)</strong></h2><p><a href="https://onnxruntime.ai/">ONNX Runtime</a> is a high-performance, cross-platform inference engine that runs all kinds of machine learning models. It supports models from the most popular training frameworks, including TensorFlow, PyTorch, scikit-learn, and more. ONNX Runtime aims to provide an easy-to-use experience for AI developers running models on various hardware and software platforms. Beyond accelerating server-side inference, ONNX Runtime for Mobile has been available since ONNX Runtime 1.5. ORT Web is a new offering introduced with the ONNX Runtime 1.8 release, focusing on in-browser inference.</p><h2><strong>In-browser inference with ORT Web</strong></h2><p>Running machine-learning-powered web applications in browsers has drawn a lot of attention from the AI community. It is challenging to make native AI applications portable across platforms, given the variations in programming languages and deployment environments. Web applications can easily achieve cross-platform portability with a single implementation through the browser. 
Additionally, running machine learning models in browsers can improve performance by reducing server-client communication, and it simplifies distribution by removing the need for additional libraries and driver installations.</p><h3>How does it work?</h3><p>ORT Web accelerates model inference in the browser on both CPUs and GPUs, through WebAssembly (WASM) and WebGL backends respectively. For CPU inference, ORT Web compiles the native ONNX Runtime CPU engine into the WASM backend using Emscripten. WebGL is a popular standard for accessing GPU capabilities and is adopted by ORT Web to achieve high performance on GPUs.</p><img loading="lazy" alt="Figure 1: ORT Web Overview" width="721" height="301" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/08/Figure-1.png"><p><em><strong>Figure 1:</strong> ORT Web overview.</em></p><h3>WebAssembly (WASM) backend for CPU</h3><p>WebAssembly lets you run server-side code on the client side, in the browser. Before WebAssembly, JavaScript was the only language available in the browser. WebAssembly has advantages over JavaScript such as faster load times and more efficient execution. Furthermore, WebAssembly supports multi-threading via SharedArrayBuffer and Web Workers, and bulk data processing can be accelerated with <a href="https://github.com/WebAssembly/simd" target="_blank" rel="noopener">SIMD128</a> (128-bit Single Instruction, Multiple Data). This makes WebAssembly an attractive technique for executing models at near-native speed on the web.</p><p>We leverage <a href="https://emscripten.org/" target="_blank" rel="noopener">Emscripten</a>, an open-source compiler toolchain, to compile ONNX Runtime C++ code into WebAssembly so that it can be loaded in browsers. This allows us to reuse the ONNX Runtime core and native CPU engine. 
As a result, the ORT Web WASM backend can run any ONNX model and supports most of the functionality native ONNX Runtime offers, including full ONNX operator coverage, <a href="https://www.onnxruntime.ai/docs/how-to/quantization.html" target="_blank" rel="noopener">quantized ONNX models</a>, and the <a href="https://onnxruntime.ai/docs/how-to/mobile/custom-build.html" target="_blank" rel="noopener">mini runtime</a>. We utilize the multi-threading and SIMD features in WebAssembly to further accelerate model inferencing. Note that SIMD is a new feature and isn&rsquo;t yet available in all browsers with WebAssembly support. The browsers supporting new WebAssembly features can be found on the <a href="https://webassembly.org/roadmap/" target="_blank" rel="noopener">webassembly.org</a> website.</p><p>During initialization, ORT Web checks the capabilities of the runtime environment to detect whether the multi-threading and SIMD features are available. If they are not, ORT Web falls back to a version suited to the environment. Taking MobileNet V2 as an example, CPU inference can be accelerated by 3.4x with two threads and SIMD enabled, compared with plain WebAssembly without these two features.</p><p><strong><img loading="lazy" alt="Figure 2: 3.4x performance acceleration on CPU with multi-threading and SIMD enabled in WebAssembly (Test machine: Processor Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz, 3401 MHz, 4 Core(s), 8 Logical Processor(s))" width="1024" height="112" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/09/Figure-2-1024x112.webp"></strong></p><p><em><strong>Figure 2:</strong> 3.4x performance acceleration on CPU with multi-threading and SIMD enabled in WebAssembly (Test machine: Processor Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz, 3401 MHz, 4 Core(s), 8 Logical Processor(s)).</em></p><h3>WebGL backend for GPU</h3><p>WebGL is a JavaScript API that conforms to the OpenGL ES 2.0 standard and is supported by all major browsers on various platforms 
including Windows, Linux, macOS, Android, and iOS. The GPU backend of ORT Web is built on WebGL and works across these supported environments, enabling users to seamlessly port their deep learning models between platforms.</p><p>In addition to portability, the ORT WebGL backend offers strong inference performance through the following optimizations: pack mode, data cache, code cache, and node fusion. Pack mode reduces the memory footprint by up to 75 percent while improving parallelism. To avoid creating the same GPU data multiple times, ORT Web reuses as much GPU data (textures) as possible. WebGL uses the OpenGL Shading Language (GLSL) to construct shaders that execute GPU programs. However, shaders must be compiled at runtime, introducing unacceptably high overhead; the code cache addresses this by ensuring each shader is compiled only once. The WebGL backend supports quite a few typical node fusions and plans to take advantage of the graph optimization infrastructure to support a larger collection of graph-based optimizations.</p><p>All ONNX operators are supported by the WASM backend, but only a subset by the WebGL backend. You can check the <a href="https://github.com/microsoft/onnxruntime/tree/master/js/web#operators" target="_blank" rel="noopener">operators supported by each backend</a>. Below are the platforms that each ORT Web backend supports.</p><img loading="lazy" alt="Figure 3: Compatible platforms that ORT Web supports." 
width="714" height="240" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/09/Figure-3.jpg"><p><em><strong>Figure 3:</strong> Compatible platforms that ORT Web supports.</em></p><h2><strong>Get started</strong></h2><p>In this section, we'll show you how to incorporate ORT Web to build machine-learning-powered web applications.</p><h3>Get an ONNX model</h3><p>Thanks to the framework interoperability of ONNX, you can convert a model trained in <a href="https://github.com/onnx/tutorials#converting-to-onnx-format" target="_blank" rel="noopener">any framework supporting ONNX</a> to the ONNX format. <a href="https://pytorch.org/docs/stable/onnx.html" target="_blank" rel="noopener">torch.onnx.export</a> is PyTorch's built-in API for exporting models to ONNX, and <a href="https://github.com/onnx/tensorflow-onnx" target="_blank" rel="noopener">TensorFlow-ONNX</a> is a standalone tool for converting TensorFlow and TensorFlow Lite models to ONNX. There are also various pre-trained ONNX models covering common scenarios in the <a href="https://github.com/onnx/models" target="_blank" rel="noopener">ONNX Model Zoo</a> for a quick start.</p><h3>Inference an ONNX model in the browser</h3><p>There are two ways to use ORT Web: through a <a href="https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/quick-start_onnxruntime-web-script-tag" target="_blank" rel="noopener">script tag</a> or <a href="https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/quick-start_onnxruntime-web-bundler" target="_blank" rel="noopener">a bundler</a>. The ORT Web APIs for scoring a model mirror those of native ONNX Runtime: first create an ONNX Runtime inference session with the model, then run the session with input data. 
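One practical deployment note: the multi-threaded WASM backend relies on SharedArrayBuffer, which browsers only expose on cross-origin isolated pages, so your server must send the Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers. As a minimal sketch for local testing (our own Python example, not part of ORT Web), a static file server with those headers could look like this:

```python
# Minimal static file server for locally testing a page that uses the
# multi-threaded WASM backend. SharedArrayBuffer (required for WASM
# threads) is only available on cross-origin isolated pages, which
# requires the two headers added below.
import http.server

class CrossOriginIsolatedHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        # Opt the served page into cross-origin isolation.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

# Serve the current directory (HTML, JS bundle, and .onnx model files).
# Port 0 asks the OS for a free port; use a fixed port such as 8080 in practice.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CrossOriginIsolatedHandler)
print("serving on port", server.server_address[1])
# server.serve_forever()  # uncomment to start serving
```

Without these headers the page still loads, but ORT Web silently falls back to single-threaded execution because SharedArrayBuffer is unavailable.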
By providing a consistent development experience, we aim to save developers time and effort when integrating ML into applications and services across platforms through ONNX Runtime.</p><p>The following code snippet shows how to call the ORT Web API to run inference on a model with different backends.</p><pre>const ort = require('onnxruntime-web');

// create an inference session, using the WebGL backend (default is 'wasm')
const session = await ort.InferenceSession.create('./model.onnx', { executionProviders: ['webgl'] });

// feed inputs and run
const results = await session.run(feeds);</pre><p><em><strong>Figure 4:</strong> Code snippet of ORT Web APIs.</em></p><p>Some advanced features can be configured by setting properties of the <a href="https://github.com/microsoft/onnxruntime/blob/master/js/common/lib/env.ts" target="_blank" rel="noopener">`ort.env`</a> object, such as setting the maximum thread number and enabling or disabling SIMD.</p><pre>// set maximum thread number for the WebAssembly backend; setting it to 1 disables multi-threading
ort.env.wasm.numThreads = 1;

// set flag to enable/disable SIMD (default is true)
ort.env.wasm.simd = false;</pre><p><em><strong>Figure 5:</strong> Code snippet of property settings in ORT Web.</em></p><p>Pre- and post-processing need to be handled in JavaScript before inputs are fed into ORT Web for inference. The <a href="https://microsoft.github.io/onnxruntime-web-demo/#/" target="_blank" rel="noopener">ORT Web Demo</a> shows several interesting in-browser vision scenarios powered by image models with ORT Web. You can find the <a href="https://github.com/microsoft/onnxruntime-web-demo/" target="_blank" rel="noopener">source code</a>, including image input processing and inference through ORT Web. Another <a href="https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/4-Applied" target="_blank" rel="noopener">end-to-end tutorial</a>, created by the Cloud Advocate curriculum team, covers building a cuisine recommender web app with ORT Web. 
It goes through exporting a scikit-learn model to ONNX as well as running that model with ORT Web using a script tag.</p><img loading="lazy" alt="Figure 6: A cuisine recommender web app with ORT Web." width="1020" height="413" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/08/Figure-6.png"><p><em><strong>Figure 6:</strong> A cuisine recommender web app with ORT Web.</em></p><h2><strong>Looking forward</strong></h2><p>We hope this has inspired you to try out ORT Web in your web applications. We would love to hear your suggestions and feedback. You can participate or leave comments in our GitHub repo (<a href="https://github.com/microsoft/onnxruntime" target="_blank" rel="noopener">ONNX Runtime</a>). We are continuing to improve performance and model coverage, as well as adding new features. On-device training is another interesting possibility we want to research for ORT Web. Stay tuned for updates.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/">ONNX Runtime Web&mdash;running your machine learning model in browser</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Join Microsoft and the Spring community at SpringOne 2021</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/08/27/join-microsoft-and-the-spring-community-at-springone-2021/</link>
		
		<dc:creator><![CDATA[Nate Ceres]]></dc:creator>
		<pubDate>Fri, 27 Aug 2021 15:00:26 +0000</pubDate>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Spring]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/08/27/join-microsoft-and-the-spring-community-at-springone-2021/</guid>

					<description><![CDATA[<p>Microsoft loves Spring, and we'd love to see you at SpringOne on September 1-2, 2021. Join us for Spring on Azure announcements and attend keynotes, sessions, and hands-on workshops over a two-day, all-virtual event. Spring is an important part of the Java ecosystem and we've been working together with VMware to make it<span><a class="read-more" aria-label="Read more about Join Microsoft and the Spring community at SpringOne 2021" href="https://cloudblogs.microsoft.com/opensource/2021/08/27/join-microsoft-and-the-spring-community-at-springone-2021/" data-bi-cn="Read more about Join Microsoft and the Spring community at SpringOne 2021">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/27/join-microsoft-and-the-spring-community-at-springone-2021/">Join Microsoft and the Spring community at SpringOne 2021</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Microsoft loves Spring, and we'd love to see you at <a href="https://springone.io/" target="_blank" rel="noopener">SpringOne</a> on September 1-2, 2021. Join us for Spring on Azure announcements and <a href="https://techcommunity.microsoft.com/t5/apps-on-azure/join-microsoft-at-springone-2021/ba-p/2677087" target="_blank" rel="noopener">attend keynotes, sessions, and hands-on workshops</a> over a two-day, all-virtual event.</p><p>Spring is an important part of the Java ecosystem, and we've been working together with VMware to make it easier for developers to run Spring applications. We started back in early 2016 with our Spring on Azure integrations with Pivotal. The developer response from our customers was very positive, and they asked us if we could address other challenges with running Spring apps at scale, such as infrastructure management, dynamic scaling, monitoring and observability, application lifecycle management, and more.</p><p>In 2018 we started working on Azure Spring Cloud, a fully managed service for Spring Boot applications that lets developers focus on their apps. We announced the preview at SpringOne in 2019 and we've continued to improve the service and experience with new capabilities. This year at SpringOne, you'll see even more innovation to help enterprise developers run their Spring applications. 
Microsoft has its biggest presence ever at SpringOne this year, and we're excited to share a range of sessions across topics, all in close partnership with VMware.</p><p>For more information, visit the <a href="https://techcommunity.microsoft.com/t5/apps-on-azure/join-microsoft-at-springone-2021/ba-p/2677087" target="_blank" rel="noopener">Tech Community blog</a>.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/27/join-microsoft-and-the-spring-community-at-springone-2021/">Join Microsoft and the Spring community at SpringOne 2021</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Introducing Distributed Data Parallel support on PyTorch Windows</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/08/04/introducing-distributed-data-parallel-support-on-pytorch-windows/</link>
		
		<dc:creator><![CDATA[Chester Liu]]></dc:creator>
		<pubDate>Wed, 04 Aug 2021 15:00:19 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[PyTorch]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/08/04/introducing-distributed-data-parallel-support-on-pytorch-windows/</guid>

					<description><![CDATA[<p>Model training has been, and for the foreseeable future will remain, one of the most frustrating things machine learning developers face. It takes quite a long time, and there is little one can do about it. If you have the luxury of multiple GPUs (especially at this moment in time), you are likely to find<span><a class="read-more" aria-label="Read more about Introducing Distributed Data Parallel support on PyTorch Windows" href="https://cloudblogs.microsoft.com/opensource/2021/08/04/introducing-distributed-data-parallel-support-on-pytorch-windows/" data-bi-cn="Read more about Introducing Distributed Data Parallel support on PyTorch Windows">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/04/introducing-distributed-data-parallel-support-on-pytorch-windows/">Introducing Distributed Data Parallel support on PyTorch Windows</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Model training has been, and for the foreseeable future will remain, one of the most frustrating things machine learning developers face. It takes quite a long time, and there is little one can do about it. If you have the luxury of multiple GPUs (especially at this moment in time), you are likely to find Distributed Data Parallel (DDP) helpful for model training. DDP performs model training across multiple GPUs in a transparent fashion. The GPUs can be on a single machine or spread across multiple machines. DDP can utilize all the GPUs you have to maximize computing power, significantly shortening the time needed for training.</p><p>For a reasonably long time, DDP was only available on Linux. That changed in PyTorch 1.7, in which Microsoft introduced support for DDP on Windows; it has been continuously improved since. In this article, we'd like to show you how it can help with the training experience on Windows.</p><h2>Walkthrough</h2><p>For reference, we'll set up two machines with the same spec on Azure, one running Windows and the other Linux, and then perform model training with the same code and dataset.</p><p>We use a very nice Azure resource called the Data Science Virtual Machine (DSVM). This is a handy VM image with a lot of machine learning tools preinstalled. 
At the time of writing, PyTorch 1.8.1 (Anaconda) is included in the DSVM image, which is what we'll use for this demonstration.</p><p>You can search directly for this resource:</p><img loading="lazy" alt="Create a resource" width="612" height="251" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Image-1.png"><p>You can also follow the normal VM creation process and choose the desired DSVM image:</p><img loading="lazy" alt="Instance details" width="785" height="248" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Image-2.png"><p>In this article, we use the size "Standard NC24s_v3", which puts four NVIDIA Tesla V100 GPUs at our disposal.</p><p>To better understand how DDP works, here are some basic concepts we need to learn first.</p><p>One important concept is the "process group", the fundamental tool that powers DDP. A process group is, as the name suggests, a group of processes, each responsible for the training workload of one dedicated GPU. Additionally, we need some method to coordinate the group of processes (more importantly, the GPUs behind them) so that they can communicate with each other. This is called the "backend" in PyTorch (the --dist-backend script parameter). In PyTorch 1.8, we will be using Gloo as the backend because the NCCL and MPI backends are currently not available on Windows. See the PyTorch documentation for <a href="https://pytorch.org/docs/stable/distributed.html" target="_blank" rel="noopener">more information about "backend"</a>. Finally, we need a place for the backend to exchange information. This is called the "store" in PyTorch (the --dist-url script parameter). See the PyTorch documentation to find out <a href="https://pytorch.org/docs/stable/distributed.html#torch.distributed.Store" target="_blank" rel="noopener">more about "store"</a>.</p><p>Two other concepts that might be a bit confusing are "world size" and "rank". 
World size is essentially the number of processes participating in the training job. As mentioned before, each process is responsible for one dedicated GPU, so the world size also equals the total number of GPUs used. Pretty straightforward, right? Now let's talk about "rank". Rank can be seen as the index number of each process, used to identify one specific process. Note that a process with rank 0 is always needed, because it acts as the "controller" that coordinates all the processes. If the process with rank 0 doesn't exist, the entire training is a no-go.</p><p>With the necessary knowledge in our backpack, let's get started with the actual training. We use a small subset of ImageNet 2012 as the dataset. Let's assume we have downloaded the dataset and placed it somewhere in the filesystem; we'll use &ldquo;D:\imagenet-small&rdquo; for this demonstration.</p><p>Obviously, we also need a training script. We use the ImageNet training script from the <a href="https://github.com/pytorch/examples/tree/master/imagenet" target="_blank" rel="noopener">PyTorch Examples repo</a> and ResNet50 as the target model. The training script here can be seen as a normal training script, plus the DDP capabilities provided by packages like &ldquo;torch.distributed&rdquo; and &ldquo;torch.multiprocessing&rdquo;. The script doesn't contain too much logic, and you can easily set up your own script based on it. You can also refer to this <a href="https://pytorch.org/tutorials/intermediate/ddp_tutorial.html" target="_blank" rel="noopener">Getting Started tutorial</a> for more inspiration.</p><p>On a single machine, we can simply use FileStore, which is easier to set up. The complete command looks like this:</p><pre>&gt; python main.py D:\imagenet-small --arch resnet50 --dist-url file:///D:\pg --dist-backend gloo --world-size 1 --multiprocessing-distributed --rank 0</pre><p>You probably noticed that we are using "world-size 1" and "rank 0". 
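To make the world-size and rank bookkeeping concrete, here is an illustrative Python sketch (our own simplified code, not taken from the example script) of how one process per GPU maps to a world size and a set of global ranks:

```python
def ddp_layout(node_rank: int, num_nodes: int, gpus_per_node: int):
    """Illustrative sketch: how world size and global ranks expand when
    one process is spawned per GPU (what --multiprocessing-distributed does)."""
    world_size = num_nodes * gpus_per_node  # total number of processes
    # Global rank of each process spawned on this node, one per local GPU.
    ranks = [node_rank * gpus_per_node + local_rank
             for local_rank in range(gpus_per_node)]
    return world_size, ranks

# Single machine with four GPUs (our --world-size 1 --rank 0 case):
# the launcher expands this to world size 4 and ranks 0..3.
print(ddp_layout(node_rank=0, num_nodes=1, gpus_per_node=4))  # (4, [0, 1, 2, 3])
```

With two such nodes, the second node (node_rank=1) would get ranks 4 through 7, and the rank-0 "controller" process lives on the first node.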
This is because the script calculates the actual world size and rank based on the available GPUs. Here the actual world size equals the number of GPUs available, which is four. The rank of each process is also automatically assigned the correct number, starting from zero.</p><p>If you're not a fan of command-line arguments, you can also use environment variables to initialize the DDP arguments, which can be helpful if you need to automate the deployment. More details can be found in the <a href="https://pytorch.org/docs/stable/distributed.html" target="_blank" rel="noopener">"Environment variable initialization" section</a> of the PyTorch documentation.</p><p>If everything goes well, the training job will start shortly after.</p><h2>Troubleshooting</h2><p>If something doesn't go well, here are some troubleshooting tips that might help:</p><ul><li>If you're using FileStore on Windows, make sure the file used is not locked by another process, which can happen if you forcefully kill the training processes. This can cause the DDP training process to freeze, because the script fails to initialize the FileStore. A workaround is to manually kill previous training processes and delete the file before starting the next training run.</li><li>If you're using TcpStore, make sure the network is accessible and the port is actually available. Otherwise, the training may freeze because the script fails to initialize the TcpStore. The process with rank zero binds and listens on the port you provided; other processes try to connect to that port. You can use network monitoring tools like "netstat" to help debug TCP connection issues.</li><li>You can use tools like nvidia-smi to monitor the GPU load while performing the training. Ideally, we want all the GPUs fully utilized and running at 100 percent usage. 
If you find that the GPU load is low, you may want to increase the batch size and/or the number of DataLoader workers.</li><li>Be aware that the number of GPUs used in DDP also affects the effective batch size. For example, suppose we use a batch size of 128 on a single GPU and then switch to DDP with two GPUs. We have two options: a) split the batch and use a batch size of 64 on each GPU; or b) use a batch size of 128 on each GPU, resulting in an effective batch size of 256. Aside from GPU memory limits, the choice is mostly up to you. You can tweak the script to go either way. Remember to also adjust the initial learning rate if you choose option b) and expect a similar training result.</li></ul><h2>Benchmark</h2><p>Back to our benchmarking mission. First, we performed the training without DDP to establish a baseline. We then tried the DDP setup with two GPUs, and finally with four GPUs. These are the results:</p><table width="444"><tbody><tr><td width="136"><strong>Duration</strong></td><td width="146"><strong>1 GPU (No DDP)</strong></td><td width="78"><strong>2 GPUs</strong></td><td width="84"><strong>4 GPUs</strong></td></tr><tr><td width="136"><strong>Linux</strong></td><td width="146">56m 58s</td><td width="78">31m 7s</td><td width="84">17m 20s</td></tr><tr><td width="136"><strong>Windows</strong></td><td width="146">58m 55s</td><td width="78">31m 55s</td><td width="84">19m 3s</td></tr></tbody></table><p>&nbsp;</p><p>To better visualize the results, we plot them in the chart below:</p><img loading="lazy" alt="Training Duration for 1GPU is slower than for 2 GPUs. 4 GPUs is fastest." width="603" height="318" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Image-3-1024x540.webp"><p>As we can see from the data, the acceleration from additional GPUs meets our overall expectations. Using two GPUs cuts training duration almost in half. 
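As a quick sanity check on the table above, the speedups on the Linux column work out to roughly 1.8x and 3.3x (a small calculation of ours, using the durations from the table):

```python
def speedup(baseline_s: int, duration_s: int) -> float:
    """Speedup of a run relative to the single-GPU baseline (both in seconds)."""
    return baseline_s / duration_s

# Linux training durations from the table above, converted to seconds.
one_gpu   = 56 * 60 + 58  # 56m 58s
two_gpus  = 31 * 60 + 7   # 31m 7s
four_gpus = 17 * 60 + 20  # 17m 20s

print(round(speedup(one_gpu, two_gpus), 2))   # 1.83, close to 2x with two GPUs
print(round(speedup(one_gpu, four_gpus), 2))  # 3.29, close to 4x with four GPUs
```

The Windows numbers scale similarly (about 1.85x and 3.09x), which is the main point of the comparison.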
And using four GPUs cuts it to nearly one-quarter.</p><p>In terms of accuracy, here's the loss curve we see on both Windows and Linux:</p><img loading="lazy" alt="Training with 4 GPUs reaches accuracy threshold much faster than with 2 GPUs or with 1 GPU (No DDP)" width="1024" height="304" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Image-4-1024x304.webp"><p>The loss curve shows that the shorter training time does not come at the cost of a worse training result; the model still converges as expected over time.</p><p>This is, of course, only a small demonstration of how DDP on Windows can bring users a performance boost comparable to the one on Linux, without compromising accuracy. We at Microsoft are working closely with the PyTorch team to keep improving the PyTorch experience on Windows. The support for DDP on Windows is a huge leap forward in training performance. We encourage you to try it, and we'd love to hear your feedback.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/04/introducing-distributed-data-parallel-support-on-pytorch-windows/">Introducing Distributed Data Parallel support on PyTorch Windows</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Microsoft expands support with The Eclipse Foundation</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/08/03/microsoft-expands-support-with-the-eclipse-foundation/</link>
		
		<dc:creator><![CDATA[Stephen Walli]]></dc:creator>
		<pubDate>Tue, 03 Aug 2021 15:00:42 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Java]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/08/03/microsoft-expands-support-with-the-eclipse-foundation/</guid>

					<description><![CDATA[<p>At Microsoft, our goal is to empower all developers to be successful building any application, using any language, on any platform. To do so, we are committed to building open, flexible technology, and to working together with the open source community to grow together as an industry. Microsoft has worked with the Eclipse community for<span><a class="read-more" aria-label="Read more about Microsoft expands support with The Eclipse Foundation" href="https://cloudblogs.microsoft.com/opensource/2021/08/03/microsoft-expands-support-with-the-eclipse-foundation/" data-bi-cn="Read more about Microsoft expands support with The Eclipse Foundation">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/03/microsoft-expands-support-with-the-eclipse-foundation/">Microsoft expands support with The Eclipse Foundation</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>At Microsoft, our goal is to empower all developers to be successful building any application, using any language, on any platform. To do so, we are committed to building open, flexible technology, and to working together with the open source community to grow together as an industry.</p><p>Microsoft has worked with the Eclipse community for many years and <a href="https://devblogs.microsoft.com/visualstudio/microsoft-joins-the-eclipse-foundation/" target="_blank" rel="noopener">we joined the Eclipse Foundation</a> in 2016. Today I'm excited to share Microsoft is continuing to advance our support of the Eclipse Foundation AISBL by expanding our participation to a Strategic Member. I will also be joining the Foundation's board of directors and I look forward to working alongside our industry partners.</p><p>The Eclipse Foundation has a long history of providing a strong, collaborative culture supporting open-source-licensed projects, and many of those projects are important to Microsoft, our partners, and our customers. It is important for Microsoft to support the organization that supports those projects, and to work within the organization towards those collective goals.</p><h2>Collaboration in community</h2><p>The Eclipse Foundation has deep expertise in vendor-neutral governance, infrastructure, marketing, community building, and developer advocacy work. The team showed initiative and forethought, and pivoted to become a European-based international non-profit organization to align with its membership. The Eclipse Foundation is a natural place for Microsoft to collaborate on new initiatives beginning with European partners.</p><p>The Eclipse Foundation remains a vital cornerstone of the Java ecosystem. Microsoft is committed to Java developers and the health of the Java ecosystem, actively participating in Eclipse Adoptium (formerly AdoptOpenJDK) and other projects. 
Expanding our involvement with the Foundation as a Strategic Member will help advance modern Java initiatives in the spirit of open source.</p><p>The Eclipse Foundation also has close ties with core parts of the Java community with the Eclipse IDE, Jakarta EE (the successor to Java EE), and MicroProfile projects hosted there. For Microsoft and its partners, the Eclipse Foundation was the logical choice for AdoptOpenJDK to continue that mission. As a vendor-neutral, multi-vendor initiative, Eclipse Adoptium continues to be a leading provider of fully compatible, high-quality distributions of Java runtimes based on OpenJDK source code.</p><p>The Eclipse Foundation is expanding its role through working groups and many of these working groups are important to Microsoft and its partners. Recent work around the Eclipse Dataspace Connector and Eclipse Tractus-X are examples of new work beginning at the Eclipse Foundation in working groups in which Microsoft has an interest in participating.</p><h2>Looking ahead</h2><p>Open source non-profits serve an important role in the community, providing the structure to enable projects to reach their next opportunity of growth. Companies support such non-profits because it supports their engagement in the non-profit's projects, supports relationships with collaborating partners, and supports the developer communities at large. Having a rich ecosystem of healthy non-profits supporting different groups of Open-Source-Initiative-licensed projects and their project ecosystems is a must. 
At Microsoft, we are committed to continuing to support and participate across the non-profit ecosystem, as well as engage in projects themselves.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/08/03/microsoft-expands-support-with-the-eclipse-foundation/">Microsoft expands support with The Eclipse Foundation</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Accelerate PyTorch training with torch-ort</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/</link>
		
		<dc:creator><![CDATA[Natalie Kershaw]]></dc:creator>
		<pubDate>Tue, 13 Jul 2021 16:00:30 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[ONNX Runtime]]></category>
		<category><![CDATA[PyTorch]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/</guid>

					<description><![CDATA[<p>With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice. Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.ORTModule, to accelerate distributed training of PyTorch models, reducing the time and resources needed<span><a class="read-more" aria-label="Read more about Accelerate PyTorch training with torch-ort" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/" data-bi-cn="Read more about Accelerate PyTorch training with torch-ort">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/">Accelerate PyTorch training with torch-ort</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice.</p><p>Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.ORTModule, to accelerate distributed training of PyTorch models, reducing the time and resources needed for training. To provide flexibility for the developer, torch-ort is available for both NVIDIA and AMD GPUs. The torch-ort package can be used with other deep learning optimizers like DeepSpeed to provide additional performance gains on training tasks.</p><p>Delivered via the torch-ort package from <a href="https://github.com/pytorch/ort" target="_blank" rel="noopener">https://github.com/pytorch/ort</a>, the ORTModule class is a simple wrapper for torch.nn.Module. ORTModule supports transformer models such as the GPT and BERT series, with support for other modalities to come soon. 
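To make the wrapper idea concrete, here is a toy, framework-free sketch (a stand-in for illustration only, not the real torch_ort implementation): the wrapped object keeps the same call interface as the module it wraps, and the one-time graph optimization happens lazily on the first forward call.

```python
class ToyORTModule:
    """Toy stand-in for torch_ort.ORTModule: wraps a 'module' (here, any
    callable) and builds an 'optimized graph' once, on the first call."""

    def __init__(self, module):
        self.module = module
        self._optimized = None   # built lazily, like ORTModule's first forward
        self.build_count = 0

    def _build_graph(self):
        # The real ORTModule exports optimized forward/backward ONNX graphs;
        # here we only record that the one-time build happened.
        self.build_count += 1
        return self.module

    def __call__(self, x):
        if self._optimized is None:
            self._optimized = self._build_graph()   # fixed cost, amortized over the run
        return self._optimized(x)


model = ToyORTModule(lambda x: 2 * x + 1)   # stand-in for a torch.nn.Module
outputs = [model(step) for step in range(3)]
print(outputs, model.build_count)           # prints: [1, 3, 5] 1  (graph built once)
```

With the real package, the change to a training script is simply wrapping the model, i.e. importing `ORTModule` from `torch_ort` and calling `model = ORTModule(model)`; the rest of the loop is untouched.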
Today, you can fine-tune the most popular language models with a labeled dataset for a target task, augment self-supervised training of a model with a specific corpus, or experiment with pre-training new models from scratch.</p><h2>Performance</h2><p>As well as using torch-ort for large workloads inside Microsoft, we have benchmarked fine-tuning the most popular HuggingFace models, showing up to a 37 percent improvement in throughput with ORTModule alone, and up to 86 percent when combined with DeepSpeed.</p><img loading="lazy" alt="chart, bar chart" width="900" height="562" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/DeBERTa-fine-tuning.png"><img loading="lazy" alt="chart, bar chart" width="902" height="564" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/GPT-2-fine-turning.png"><p>These experiments were run on Azure ND <a href="https://azure.microsoft.com/en-us/blog/azure-announces-general-availability-of-scaleup-scaleout-nvidia-a100-gpu-instances-claims-title-of-fastest-public-cloud-super/" target="_blank" rel="noopener">A100</a> v4 infrastructure, with its high bandwidth between GPUs within and across machines.</p><p>The graphs above show throughput in training samples per second. The actual time your training job takes depends on the number of training samples you have and the type of CPU/GPU you are using. Before training starts, ORTModule performs a one-time optimization of your model. This has a fixed cost that is amortized across the run.</p><p>The combination of ORTModule and DeepSpeed also enables <a href="https://askherefirst.com/" target="_blank" rel="noopener">Ask Here First</a> to train the 2.7B parameter GPT-Neo model for a custom natural language task, where previously this large model could not be trained on the available hardware. 
AskHereFirst, a Columbia University spin-off, uses powerful AI-based natural language query solutions for structured data stores that can dramatically simplify search across a wide range of industries such as finance, media, marketing, and sports.</p><img loading="lazy" alt="Ask me here logo" width="446" height="218" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Ask-Here-First-logo_.png"><blockquote><p><em>"Training GPT-NEO for our custom natural language task was not possible before we employed ORTModule and DeepSpeed. We have now produced fine-tuned 2.7B parameter GPT-NEO models to map natural language inputs into structured queries for a number of our applications."</em><br>Professor Vishal Misra, Columbia University and founder, Ask Here First</p></blockquote><h2>Hardware portability</h2><p>There are different hardware platform options for running distributed training workloads. torch_ort.ORTModule works with NVIDIA and AMD GPUs.</p><p>We are releasing the torch-ort package for NVIDIA using CUDA 10.2 or CUDA 11.1. This can be used to accelerate PyTorch training on NVIDIA GPUs, either on Azure or in a user's on-premises environment.</p><p>We are also releasing the preview package for torch-ort with ROCm 4.2 for use on AMD GPUs.</p><h2>Simple developer experience</h2><p>Getting started with ORTModule is simple. 
You download and install the <a href="https://pypi.org/project/torch-ort/" target="_blank" rel="noopener">torch-ort</a> package and wrap your model with ORTModule, as demonstrated in the following code example.</p><p>Your PyTorch training loop is unmodified except for wrapping the torch.nn.Module in ORTModule.</p><img loading="lazy" alt="lines of code" width="936" height="962" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/torch.ort_.png"><p>Because the PyTorch training loop is unmodified, ORTModule can be seamlessly integrated with other libraries in the PyTorch ecosystem, such as torch.autocast and NVIDIA apex.</p><h2>How does it work?</h2><p>On the first call to forward, two optimized computation graphs are generated: one for the forward prediction pass and one for the backward gradient calculation pass. All other parts of the training loop are executed by native PyTorch. The optimizations in these graphs, such as optimized kernels, subgraph operator fusion, and reduction of memory copies between CPU and GPU provide the speed up.</p><h2>For more information</h2><p>Go for a <a href="https://aka.ms/torch-ort-deepdive" target="_blank" rel="noopener">technical deep dive into ORTModule</a> and read about our <a href="https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/" target="_blank" rel="noopener">partnership with AMD</a>. Also, see <a href="https://github.com/pytorch/ort" target="_blank" rel="noopener">documentation, samples, or reach out</a> to the team.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/">Accelerate PyTorch training with torch-ort</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm™ Open Software Platform</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/</link>
		
		<dc:creator><![CDATA[Weixing Zhang and Suffian Khan]]></dc:creator>
		<pubDate>Tue, 13 Jul 2021 16:00:14 +0000</pubDate>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Azure Marketplace]]></category>
		<category><![CDATA[ONNX Runtime]]></category>
		<category><![CDATA[PyTorch]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/</guid>

					<description><![CDATA[<p>This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce a preview version of ONNX Runtime in<span><a class="read-more" aria-label="Read more about ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm Open Software Platform" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/" data-bi-cn="Read more about ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm Open Software Platform">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/">ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm™ Open Software Platform</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><em>This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD.</em></p><img loading="lazy" alt="ONNX Runtime and AMD logo side by side" width="812" height="159" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/ONNX-Runtime-AMD-ROCm_logo_update.png"><p><a href="https://microsoft.github.io/onnxruntime/" target="_blank" rel="noopener">ONNX Runtime</a> is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce a preview version of ONNX Runtime in release 1.8.1 featuring support for AMD Instinct&trade; GPUs facilitated by the AMD ROCm&trade; open software platform. Users can now use AMD Instinct&trade; GPUs with ONNX Runtime to accelerate distributed training for large-scale DNN models. AMD ROCm&trade; becomes the latest ONNX Runtime execution provider, continuing the Microsoft mission to endorse choice and versatility in targeting different compute devices and server platforms.</p><img loading="lazy" alt="Selection interface showing AMD GPU support" width="624" height="245" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Get-started-easily.png"><p><em>Figure 1: Selection interface showing <a href="https://www.onnxruntime.ai/" target="_blank" rel="noopener">AMD GPU support</a>.</em></p><h2>The ROCm Open Software Platform</h2><p>ROCm is AMD&rsquo;s open software platform for GPU-accelerated high-performance computing and machine learning workloads. 
Since the first ROCm release in 2016, the ROCm platform has evolved to support additional math, AI and machine learning, and communication libraries and tools, a wider set of Linux distributions, and a range of new GPUs. This includes the AMD Instinct&trade; MI100 GPU, the first AMD data center accelerator based on the compute-optimized AMD CDNA&trade; architecture.</p><p>The primary focus of ROCm has been high-performance computing at scale. The combined capabilities of ROCm and the AMD Instinct family of data center accelerators are well suited to accelerate AI/ML training using ONNX Runtime.</p><h2>Accelerated training with ONNX Runtime on AMD GPUs</h2><p>Large transformer models like GPT-2 have proven themselves state of the art in natural language processing (NLP) tasks such as language understanding, generation, and translation. They are also proving useful in applications like time-series prediction and computer vision. Due to their size, these models need to be trained in a large-scale, distributed GPU environment. ONNX Runtime, with support from AMD libraries (rocBLAS, MIOpen, hipRAND, and RCCL), enables users to train large transformer models in mixed precision in a distributed AMD GPU environment. Thus, ONNX Runtime on ROCm supports training state-of-the-art models like BERT, GPT-2, T5, BART, and more using AMD Instinct&trade; GPUs. Data scientists, researchers, students, and others in the community have an option to accelerate workloads using ONNX Runtime on AMD GPUs. 
This includes <a href="https://www.amd.com/en/products/server-accelerators/instinct-mi100" target="_blank" rel="noopener">AMD Instinct&trade; MI100</a>, AMD Radeon Instinct&trade; MI50, and AMD Radeon&trade; Pro VII GPUs.</p><p>Today, we are happy to announce the preview of Python packages supporting ONNX Runtime on ROCm, making it easy to get started with ROCm and ONNX Runtime.</p><h3>Training performance acceleration</h3><p>In this preview, we have demonstrated clear performance gains with ONNX Runtime using AMD GPUs for fine-tuning GPT-2 using <a href="https://github.com/huggingface/transformers" target="_blank" rel="noopener">HuggingFace</a> on eight AMD Instinct&trade; MI100 GPUs. We see an 18 percent performance gain in these experiments relative to standalone PyTorch, and we validated well-matched loss curves.</p><img loading="lazy" alt="Using ONNX runtime gets 18 percent perf gains over stand-alone PyTorch" width="1536" height="1055" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Throughput-samples_.png"><p><em>Figure 2: Using ONNX Runtime yields an 18 percent performance gain over standalone PyTorch. Configuration details are listed below.</em></p><img loading="lazy" alt="chart, line chart, histogram" width="767" height="306" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Hugging-Face-training-loss.png"><p><em>Figure 3: Training loss comparing the PyTorch-only and PyTorch + ONNX Runtime experiments.</em></p><p>In general, the preview ONNX Runtime-ROCm library can be used in multi-node MI100 AMD GPU configurations with high-speed interconnects for inter-GPU communications. 
As we proceed to our official release, we expect users to see excellent performance across a wide range of Transformer models and ML/AI workloads, offering a highly performant choice for their datacenter applications.</p><h3>Getting started with ONNX Runtime on AMD GPUs</h3><h4>With Python packages</h4><p>In a ROCm-enabled environment, users can get started quickly by installing the packages published at:</p><p><a href="https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.rocm42.html" target="_blank" rel="noopener">https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.rocm42.html</a></p><p>Plugging a PyTorch script into ONNX Runtime requires only* wrapping the model with ORTModule.</p><p>More details are available at <a href="https://github.com/pytorch/ort" target="_blank" rel="noopener">pytorch/ort: Accelerate PyTorch models with ONNX Runtime</a>.</p><p><em>*The PyTorch model should use standard PyTorch to support export to ONNX.</em></p><h4>With Dockerfiles</h4><p>Users can also take advantage of a simple Dockerfile to get pre-configured packages of the ROCm libraries, PyTorch, and ONNX Runtime.</p><p>The stable ONNX Runtime 1.8.1 release is now available at <a href="https://github.com/pytorch/ort/blob/main/docker/Dockerfile.ort-torch181-onnxruntime-nightly-rocm4.2-ubuntu18.04" target="_blank" rel="noopener">ort/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04 in the pytorch/ort repository</a>.</p><p>More details are available at <a href="https://github.com/pytorch/ort" target="_blank" rel="noopener">pytorch/ort</a>.</p><h2>More information about ONNX Runtime</h2><ul><li>Read our recent blog, &ldquo;<a href="https://cloudblogs.microsoft.com/opensource/2021/06/07/onnx-runtime-1-8-mobile-web-and-accelerated-training/" target="_blank" rel="noopener">ONNX Runtime 1.8: mobile, web, and accelerated training</a>,&rdquo; introducing the extended capabilities of ONNX Runtime release 1.8.1.</li><li>Check out examples demonstrating 
accelerating <a href="https://github.com/microsoft/onnxruntime-training-examples" target="_blank" rel="noopener">large transformer models using ONNX Runtime</a>.</li><li>View instructions for accelerating general PyTorch workloads using ONNX Runtime at <a href="https://github.com/pytorch/ort" target="_blank" rel="noopener">Accelerate PyTorch models with ONNX Runtime</a>.</li></ul><h2>More information about the ROCm&trade; Open Software Platform</h2><ul><li>A list of <a href="https://github.com/RadeonOpenCompute/ROCm" target="_blank" rel="noopener">ROCm&trade; supported GPUs and operating systems</a>.</li><li><a href="https://rocmdocs.amd.com/en/latest/" target="_blank" rel="noopener">General documentation</a> on the ROCm&trade; platform.</li><li><a href="https://developer.amd.com/resources/rocm-resources/rocm-learning-center/" target="_blank" rel="noopener">ROCm&trade; Learning Center</a>.</li><li>General information on <a href="https://amd.com/hpc" target="_blank" rel="noopener">AMD's offerings for HPC and machine learning</a>.</li></ul><h3>Configuration and performance benchmarking</h3><h4>Hardware setup</h4><ul><li>Server Type: HPE Apollo 6500 Gen10 Plus</li><li>8 x <a href="https://www.amd.com/en/products/server-accelerators/instinct-mi100" target="_blank" rel="noopener">AMD Instinct MI100</a> with 2nd Gen Infinity Fabric Link (4 GPUs/ring) and PCIe Gen4 (across rings)</li><li>GPU Memory: 32 GB</li><li>CPU: 2 x <a href="https://www.amd.com/en/products/cpu/amd-epyc-7662" target="_blank" rel="noopener">AMD EPYC&trade; 7662</a></li><li>Main Memory: 512 GB (HPE 32GB 2Rx4 PC4-3200AA-R)</li><li>SSD: HPE 1.92TB NVMe Read-Intensive Smart 
Carrier U.3 PE8010 SSD</li><li>Ethernet: Intel I350 1GbE 4-port BASE-T</li></ul><h4>HuggingFace configuration</h4><p><strong>Repository</strong></p><p><a href="https://github.com/microsoft/huggingface-transformers/tree/blog-commit" target="_blank" rel="noopener">HuggingFace Transformers</a> (branch <em>blog-commit</em>).</p><p><strong>Dockerfile</strong></p><p><a href="https://github.com/pytorch/ort/blob/main/docker/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04" target="_blank" rel="noopener">ort/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04</a></p><p><strong>HuggingFace GPT2</strong></p><p>A flag in the training script enables wrapping the model with ONNX Runtime; see the repository branch linked above for details.</p><hr><p>Author information: Jeff Daily is a Principal Member of Technical Staff, Deep Learning Software for AMD. Weixing Zhang is a Principal Software Engineer, AI Frameworks at Microsoft. Suffian Khan is a Software Engineer, AI Frameworks at Microsoft.</p><p>Their postings are their own opinions and may not represent AMD's or Microsoft's positions, strategies, or opinions. Links to third-party sites are provided for convenience and, unless explicitly stated, neither AMD nor Microsoft is responsible for the contents of such linked sites and no endorsement is implied.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/">ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm&trade; Open Software Platform</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Simple steps to create scalable processes to deploy ML models as microservices</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/07/09/simple-steps-to-create-scalable-processes-to-deploy-ml-models-as-microservices/</link>
		
		<dc:creator><![CDATA[Elena Neroslavskaya]]></dc:creator>
		<pubDate>Fri, 09 Jul 2021 16:00:18 +0000</pubDate>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Azure Kubernetes Service]]></category>
		<category><![CDATA[GPT-2]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[ONNX Runtime]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/07/09/simple-steps-to-create-scalable-processes-to-deploy-ml-models-as-microservices/</guid>

					<description><![CDATA[<p>This post was co-authored by Alejandro Saucedo, Director of Machine Learning Engineering at Seldon Technologies. About the co-author: Alejandro leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products with over five million installations. Alejandro is also the Chief Scientist at the Institute for Ethical AI<span><a class="read-more" aria-label="Read more about Simple steps to create scalable processes to deploy ML models as microservices" href="https://cloudblogs.microsoft.com/opensource/2021/07/09/simple-steps-to-create-scalable-processes-to-deploy-ml-models-as-microservices/" data-bi-cn="Read more about Simple steps to create scalable processes to deploy ML models as microservices">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/09/simple-steps-to-create-scalable-processes-to-deploy-ml-models-as-microservices/">Simple steps to create scalable processes to deploy ML models as microservices</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><strong><em>This post was co-authored by Alejandro Saucedo, Director of Machine Learning Engineering at Seldon Technologies.</em></strong></p><p><em>About the co-author: Alejandro leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products with over five million installations. Alejandro is also the Chief Scientist at the Institute for Ethical AI and Machine Learning, where he leads the development of industry standards on machine learning explainability, adversarial robustness, and differential privacy. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and has a strong track record building cross-functional teams of software engineers.</em></p><p>As organizations adopt machine learning in production, they face growing challenges that arise when the number of production machine learning models starts to increase. In this article, we provide a practical tutorial that will enable AI practitioners to leverage production-ready workflows to deploy their machine learning models at scale. More specifically, we will demonstrate the benefits of open source tools and frameworks like ONNX Runtime, Seldon Core, and HuggingFace, as well as how these can be integrated with Azure Kubernetes Service to achieve robust and scalable machine learning operations capabilities.</p><p>By the end of this blog post, you will have a simple, repeatable, and scalable process to deploy complex machine learning models. You will learn by example, deploying the OpenAI GPT-2 natural language processing (NLP) model as a fully-fledged microservice with real-time metrics and robust monitoring capabilities. 
Try out our <a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/triton_gpt2_example_azure.html" target="_blank" rel="noreferrer noopener">GPT-2 Azure AKS Deployment Notebook</a>, which demonstrates the full process.</p><p>The steps that will be carried out in this blog are outlined in the image below, and include the following:</p><ol><li>Fetch the pre-trained GPT-2 model using HuggingFace and export it to ONNX.</li><li>Set up the Kubernetes environment and upload the model artifact.</li><li>Deploy the ONNX model with Seldon Core to Azure Kubernetes Service.</li><li>Send inference requests to the Kubernetes-deployed GPT-2 model.</li><li>Visualize real-time monitoring metrics with Azure dashboards.</li></ol><img loading="lazy" alt="Placeholder" width="1024" height="162" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture1-60e459d1d317b-1024x162.webp"><p>Furthermore, the tools that we'll be using in this framework will be the following:</p><ul><li><a href="https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html" target="_blank" rel="noopener">Seldon Core</a>: A machine learning model deployment and monitoring framework for Kubernetes which will allow us to convert our model artifact into a scalable microservice with 
real-time metrics.</li><li><a href="https://www.onnxruntime.ai/" target="_blank" rel="noopener">ONNX Runtime</a>: An optimized runtime engine to improve the performance of model inference, which we'll be using to optimize and run our models.</li><li><a href="https://azure.microsoft.com/en-us/services/kubernetes-service/" target="_blank" rel="noopener">Azure Kubernetes Service (AKS)</a>: Azure's managed Kubernetes service, where we will be running the deployed machine learning models.</li><li><a href="https://docs.microsoft.com/en-us/azure/azure-monitor/overview" target="_blank" rel="noopener">Azure Monitor</a>: Azure's service for managed monitoring, where we will be able to visualize all the performance metrics.</li><li><a href="https://huggingface.co/gpt2" target="_blank" rel="noopener">HuggingFace</a>: An ecosystem for training and pre-trained transformer-based NLP models, which we will leverage to get access to the OpenAI GPT-2 model.</li></ul><p>Let's get started.</p><h2>1. Fetch the trained GPT-2 Model with HuggingFace and export to ONNX</h2><p><a href="https://openai.com/blog/better-language-models/" target="_blank" rel="noopener">GPT-2</a> is a popular NLP language model, trained on a huge dataset, that can generate human-like text. We will use Hugging Face's utilities to import the pre-trained GPT-2 tokenizer and model. First, we download the tokenizer. Next, we download the GPT-2 TensorFlow model and export it for deployment. Finally, we convert the exported model and optimize it for ONNX Runtime.</p><p>One of the main advantages of using the <a href="https://onnxruntime.ai/" target="_blank" rel="noopener">ONNX Runtime</a> is the high-performance inference capabilities and broad compatibility that it brings. The ONNX Runtime enables practitioners to use any machine learning framework of their choice, and convert it to the optimized Open Neural Network Exchange (ONNX) format. 
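The three steps just described can be sketched roughly as follows. This is an illustrative, non-runnable outline only; the exact commands live in the linked GPT-2 Azure AKS Deployment Notebook. It assumes the transformers and tf2onnx packages are installed, and the paths are placeholders.

```python
# Illustrative sketch only; see the linked Deployment Notebook for the exact commands.
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

# Step 1: fetch the pre-trained GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Step 2: fetch the TensorFlow GPT-2 model and export it as a SavedModel
model = TFGPT2LMHeadModel.from_pretrained("gpt2")
model.save_pretrained("./tfgpt2", saved_model=True)

# Step 3: convert the SavedModel to ONNX and optimize it for ONNX Runtime,
# e.g. with the tf2onnx converter (paths and opset are placeholders):
#   python -m tf2onnx.convert --saved-model ./tfgpt2/saved_model/1 \
#       --opset 11 --output model.onnx
```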
Once these models are converted, the ONNX Runtime can be used to deploy them to a variety of targets including desktop, IoT, mobile, and, in our case, Azure Kubernetes through Seldon Core.</p><img loading="lazy" alt="Placeholder" width="1024" height="474" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture9-1024x474.webp"><p>This framework has benefited from a broad range of rich benchmarks that showcase the high throughput and low latencies that can be achieved, which you can see in "<a href="https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333" target="_blank" rel="noopener">Accelerate your NLP pipelines using Hugging Face Transformers and ONNX Runtime</a>", as well as in "<a href="https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7" target="_blank" rel="noopener">Faster and smaller quantized NLP with Hugging Face and ONNX Runtime</a>".</p><img loading="lazy" alt="Placeholder" width="606" height="365" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture10.png"><h2>2. Set up the Kubernetes environment and upload the model artifact</h2><p><a href="https://docs.seldon.io/projects/seldon-core/en/latest/" target="_blank" rel="noopener">Seldon Core</a> is one of the leading open-source frameworks for machine-learning model deployment and monitoring at scale on Kubernetes. It allows machine learning practitioners to convert their trained model artifacts or machine learning model code into fully-fledged microservices. All models deployed with Seldon are enabled with advanced monitoring, robust promotion strategies, and scalable architectural patterns. 
We will be using Seldon Core to deploy our GPT-2 model.</p><img loading="lazy" alt="Placeholder" width="894" height="599" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture11.png"><p>Seldon provides a broad range of <a href="https://docs.seldon.io/projects/seldon-core/en/stable/servers/overview.html" target="_blank" rel="noopener">Pre-Packaged Inference Servers</a> out of the box to deploy model artifacts to TFServing, Triton, ONNX Runtime, etc. It also provides <a href="https://docs.seldon.io/projects/seldon-core/en/stable/workflow/overview.html#two-types-of-model-servers" target="_blank" rel="noopener">Custom Language Wrappers</a> to deploy custom Python, Java, C++, and more. In this blog post, we will be leveraging the Triton prepackaged server with the ONNX Runtime backend. To set up Seldon Core, you can follow the <a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Seldon-Core-Setup" target="_blank" rel="noopener">Seldon Core setup instructions</a>.</p><h3>Setting up the Azure Kubernetes environment</h3><p>The following diagram depicts our target architecture utilizing Azure Kubernetes Service (AKS), a fully managed Kubernetes service provided on Azure that removes the complexity of managing infrastructure and allows developers and data scientists to focus on machine learning models.</p><img loading="lazy" alt="diagram" width="2396" height="1334" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/seldon_.png"><p>We recommend creating an AKS cluster with three node pools and the CSI driver installed; refer to the following notebook for the scripts, which you can run yourself (<a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/triton_gpt2_example_azure_setup.html" target="_blank" rel="noopener">Azure Setup Notebook</a>):</p><ul><li><strong>CPU System Nodepool</strong>: running Kubernetes system components.</li><li><strong>CPU User Nodepool</strong>: running 
Seldon Operator and Istio components.</li><li><strong>GPU User Nodepool</strong>: running machine learning model inference with GPU hardware optimizations.</li><li><strong>Azure Blob CSI driver</strong>: mounting an Azure Storage Account for model hosting.</li></ul><h3>Upload model artifacts</h3><p>Seldon is able to automatically download your model artifacts from an object store, so we will start by uploading our model to Azure Blob storage. To abstract the storage connection details away from the SeldonDeployment, we use a PersistentVolume reference in our model manifest, which will be mounted in our model container. For details on setting up a PersistentVolume for Azure Blob with the Blob CSI driver, refer to our example <a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/triton_gpt2_example_azure.html" target="_blank" rel="noopener">here</a>. We can then upload the ONNX model file to Azure Blob following the default directory structure of the <a href="https://github.com/triton-inference-server/server/blob/master/docs/model_repository.md#onnx-models" target="_blank" rel="noopener">Triton model repository format</a>:</p><img loading="lazy" alt="Placeholder" width="1024" height="54" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture13-1024x54.webp"><h2>3. Deploy to Kubernetes (AKS) with Seldon Core</h2><p>Now, we are ready to deploy our model using <a href="https://docs.seldon.io/projects/seldon-core/en/latest/servers/triton.html?highlight=Triton#triton-inference-server" target="_blank" rel="noopener">Seldon Core's Triton pre-packaged server</a>. 
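A SeldonDeployment manifest along the following lines drives this kind of deployment; the deployment name, PVC name, and resource values here are hypothetical placeholders rather than the exact manifest from the notebook:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: gpt2
spec:
  protocol: kfserving            # v2 inference protocol
  predictors:
  - name: default
    graph:
      name: gpt2                 # model name inside the Triton repository
      implementation: TRITON_SERVER
      modelUri: pvc://seldon-models/onnx-gpt2   # hypothetical PVC and path
    componentSpecs:
    - spec:
        containers:
        - name: gpt2
          resources:
            limits:
              nvidia.com/gpu: 1  # schedule onto the GPU node pool
    replicas: 1
```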
For that, we need to define and apply a SeldonDeployment Kubernetes manifest for the prepackaged Triton server, as shown below:</p><img loading="lazy" alt="Placeholder" width="599" height="1024" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture14-599x1024.webp"><p>Some of the key attributes to notice:</p><ul><li>The 'implementation' field is set to 'TRITON_SERVER'.</li><li>The 'modelUri' field points to the PVC from which to download the model (&lt;pvc&gt;://&lt;name&gt;).</li><li>The 'name' field is set to GPT-2 to indicate which model to download.</li><li>'componentSpecs' overrides Pod spec fields such as limits/requests and tolerations to instruct the Kubernetes scheduler to run the Pods on GPU nodes.</li><li>The 'protocol' field selects the widely adopted <a href="https://github.com/kubeflow/kfserving/blob/master/docs/predict-api/v2/required_api.md" target="_blank" rel="noopener">inference protocol</a>.</li><li>Annotations direct Azure Monitor to scrape real-time metrics.</li></ul><p>Once you deploy it, you can verify the logs as follows:</p><h2>4. Run inference requests with your deployed model</h2><p>Now that we have deployed our model, we are able to perform text generation in real time. This can be done by sending REST requests directly against our productionized model; however, we'll have to carry out a few steps first: tokenizing our input, sending the request, and decoding the resulting tokens. This is shown in detail below.</p><ul><li>Tokenize the input sentence using the Hugging Face GPT-2 pre-trained tokenizer:</li></ul><ul><li>Now we can send the tokens by constructing the input payload:</li></ul><ul><li>Our GPT-2 model will return the probability distribution for the next token over the vocabulary for the input vector (<em>logits</em>). 
Following the "greedy" approach, we decode the response to a string and append it to the input sentence.</li></ul><ul><li>We repeat this to generate the full synthetic sentence:</li></ul><p>The full end-to-end implementation can be found in our <a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/triton_gpt2_example_azure.html" target="_blank" rel="noopener">GPT-2 Notebook</a>.</p><h2>5. Visualize monitoring metrics with Azure Monitor</h2><p>We are now able to visually monitor the real-time metrics generated by our Seldon model by enabling <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-overview" target="_blank" rel="noopener">Azure Monitor Container Insights</a> in the AKS cluster. We can navigate to the insights blade page, check whether the resources/limits configured for the SeldonDeployment are within healthy thresholds, and monitor changes in memory or CPU during model inference.</p><img loading="lazy" alt="Placeholder" width="1024" height="520" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture15-1024x520.webp"><p>In addition to container metrics, we can collect the specialized metrics generated by the Seldon Triton orchestrator. Azure Monitor Container Insights provides the out-of-the-box ability to scrape Prometheus metrics from declared endpoints, with no need to install and operate a Prometheus server. To learn more, see <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration" target="_blank" rel="noopener">Microsoft docs: Prometheus metrics with Container insights</a>. 
For our case, we can now visualize the real-time metrics with specialized dashboards as demonstrated in our <a href="https://docs.seldon.io/projects/seldon-core/en/latest/examples/triton_gpt2_example_azure.html" target="_blank" rel="noopener">GPT-2 Notebook</a> and the following example dashboard:</p><img loading="lazy" alt="Placeholder" width="1024" height="790" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/07/Picture16-1024x790.webp"><h2>Uncover repeatable ML deployment processes today</h2><p>In this blog, we covered a repeatable and scalable process to deploy a GPT-2 NLP model as a fully-fledged microservice with real-time metrics, enabling observability and monitoring capabilities at scale using Seldon Core in Azure.</p><p>More specifically, we were able to:</p><ul><li>Fetch the pre-trained GPT-2 model with Hugging Face and export to ONNX.</li><li>Set up the Kubernetes environment and upload the model artifact.</li><li>Deploy the ONNX model with Seldon Core to Azure Kubernetes Service.</li><li>Send requests to generate text with the deployed GPT-2 model.</li><li>Visualize monitoring metrics with Azure dashboards.</li></ul><p>These workflows are continuously being refined and evolved through the Seldon Core open source project, and advanced state-of-the-art algorithms on outlier detection, concept drift, explainability, and more are improving continuously. If you are interested in learning more or contributing, please feel free to reach out. 
If you are interested in further hands-on examples of scalable deployment strategies of machine learning models, you can check out:</p><ul><li><a href="https://towardsdatascience.com/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance-d9b1d02ac158" target="_blank" rel="noopener">Production machine learning monitoring</a>: Outliers, drift, explainers, and statistical performance.</li><li>Real-time machine learning at scale using <a href="https://towardsdatascience.com/real-time-stream-processing-for-machine-learning-at-scale-with-spacy-kafka-seldon-core-6360f2fedbe" target="_blank" rel="noopener">SpaCy, Kafka, and Seldon Core</a>.</li><li><a href="https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html" target="_blank" rel="noopener">Seldon Core quick-start documentation</a>.</li></ul><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/07/09/simple-steps-to-create-scalable-processes-to-deploy-ml-models-as-microservices/">Simple steps to create scalable processes to deploy ML models as microservices</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Journey to optimize large scale transformer model inference with ONNX Runtime</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/</link>
		
		<dc:creator><![CDATA[Xiaoyu Liu, Eric Lin and Emma Ning]]></dc:creator>
		<pubDate>Wed, 30 Jun 2021 19:00:47 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[IntelliCode]]></category>
		<category><![CDATA[ONNX Runtime]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Transformer]]></category>
		<category><![CDATA[Visual Studio]]></category>
		<category><![CDATA[Visual Studio Code]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/</guid>

					<description><![CDATA[<p>"With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer generative transformer model for code, a.k.a., GPT-C, to empower IntelliCode with the whole line of code completion suggestions in Visual Studio and Visual Studio Code." Large-scale transformer models, such as GPT-2 and GPT-3, are among the most<span><a class="read-more" aria-label="Read more about Journey to optimize large scale transformer model inference with ONNX Runtime" href="https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/" data-bi-cn="Read more about Journey to optimize large scale transformer model inference with ONNX Runtime">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/">Journey to optimize large scale transformer model inference with ONNX Runtime</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote><p><em>"With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer generative transformer model for code, a.k.a., GPT-C, to empower IntelliCode with the whole line of code completion suggestions in Visual Studio and Visual Studio Code."</em></p></blockquote><p>Large-scale transformer models, such as GPT-2 and GPT-3, are among the most useful self-supervised transformer language models for natural language processing tasks such as language translation, question answering, passage summarization, text generation, and so on. After successfully shipping <a href="https://devblogs.microsoft.com/visualstudio/the-making-of-intellicodes-first-deep-learning-model-a-research-journey/" target="_blank" rel="noopener">the first deep learning model for IntelliCode completion</a>, our recent research effort brings GPT-C, a multi-layer generative decoder transformer architecture that is part of our <a href="https://www.microsoft.com/en-us/research/project/microsoft-deepdev/" target="_blank" rel="noopener">DeepDev transformer platform</a> for code and text from the Developer Division (DevDiv) Data&amp;AI Applied Science team, to empower IntelliCode with whole-line code completion suggestions in Visual Studio and Visual Studio Code.</p><p>To meet the computing power needs of large-scale transformers, our initial aim was to deploy the GPT-C model in production by leveraging the <a href="https://azure.microsoft.com/en-us/services/machine-learning/" target="_blank" rel="noopener">Azure Machine Learning service</a> with a cluster of virtual machines powered by NVIDIA Tesla V100 GPUs. However, there were some limitations:</p><ul><li>Cloud-based deployment requires transmitting user code over the network for inference, which increases the risk of exposing sensitive data.</li><li>The service is not accessible in disconnected or offline mode. 
This limitation requires developers to stay connected to the internet during their work, which may not be an option for people who work in areas with poor internet connectivity.</li><li>Typical language models aim to generate full token sequences left-to-right using a beam search decoding algorithm to search for the best solutions in a batch-oriented manner. GPT-C is no exception. This scenario imposes a large memory overhead, resulting in high latency and serving costs. A 12-layer generative transformer model requires 374 MB of memory and takes around 80 ms of GPU time per inference call; the cost of scaling this to our large user base would make it impractical.</li></ul><p>With its resource-efficient and high-performance nature, ONNX Runtime can help address these limitations in GPT-C model production.</p><h2>Large scale transformer model with ONNX Runtime</h2><p><a href="https://github.com/onnx/onnx" target="_blank" rel="noopener">ONNX</a> (Open Neural Network Exchange) and <a href="https://github.com/microsoft/onnxruntime" target="_blank" rel="noopener">ONNX Runtime</a> play an important role in accelerating and simplifying transformer model inference in production. ONNX is an open standard format representing machine learning models. Models trained with various frameworks, e.g., PyTorch and TensorFlow, can be converted to ONNX. Built on the ONNX standard, ONNX Runtime is an optimized inference engine for efficiently running any model converted to the ONNX format across different hardware and operating systems with minimum effort. Thanks to this framework interoperability of ONNX, ONNX Runtime improves development efficiency from model training to inference. Through various optimization techniques, ONNX Runtime can run all kinds of models with optimal performance across hardware platforms.</p><p>To deliver the IntelliCode line completion experience at a low cost, we decided to deploy GPT-C on the client-side. 
This means that the GPT-C model needs to run efficiently on CPU across a wide range of client devices. Thanks to ONNX Runtime, our first attempt significantly reduced the memory usage from about 370 MB to 80 MB. ONNX Runtime enables <a href="https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333" target="_blank" rel="noopener">transformer optimizations</a> that achieve more than 2x performance speedup over PyTorch with a large sequence length on CPUs. PyTorch offers a built-in ONNX exporter for exporting a PyTorch model to ONNX. On top of that, ONNX Runtime builds the <a href="https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/convert_to_onnx.py" target="_blank" rel="noopener">GPT2 conversion tool</a> for simplifying the conversion experience for GPT-2 models with past states. Our GPT-C transformer model is easily converted from PyTorch to ONNX by leveraging this tool, and then runs on ONNX Runtime with good performance. In addition to the model itself, beam search is another important component in our deployment. In the initial version, beam search modules were implemented in managed code (C# and TypeScript). The beam search module scores and re-ranks the output tensors received from the previous ONNX Runtime model inference step. When the scoring and re-ranking are done, the model retrieves the output tensors from the beam search module and conducts another round of inference. 
Due to the inefficiency of the managed code implementation, the E2E client-side GPT-C inference suffered from a relatively poor response time of around 1 second of CPU time for each line-completion inference.</p><p>To improve the E2E performance of client-side GPT-C further, we extended the <a href="https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/convert_to_onnx.py" target="_blank" rel="noopener">GPT2 conversion tool</a> to support GPT-2 models with native one-step beam search. This was a collaborative work between the DevDiv Data&amp;AI Applied Science team, the <a href="https://msturing.org/about" target="_blank" rel="noopener">Microsoft Turing team</a>, and the <a href="https://www.onnxruntime.ai/" target="_blank" rel="noopener">ONNX Runtime team</a>. Consequently, we improved both the training and the deployment of GPT-2 models, making it simpler and more efficient for GPT-2 models with native one-step beam search to fully access hardware acceleration through ONNX Runtime.</p><h2>One-step beam search optimization through ONNX Runtime for large scale transformer model</h2><p>As shown in Figure 1, GPT-C leverages native one-step beam search in its compute graph. Specifically, one-step beam search is compiled as <a href="https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html" target="_blank" rel="noopener">TorchScript</a> code that serves as a bridge between the GPT-C beam search module and ONNX Runtime. Then the <a href="https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/convert_to_onnx.py" target="_blank" rel="noopener">GPT2 conversion tool</a> calls the ONNX conversion APIs to convert one-step beam search into ONNX operators and appends them to the end of the converted GPT-C transformer model's ONNX compute graph. 
After GPT-2 models with native one-step beam search are converted to the whole ONNX graph, <a href="https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7" target="_blank" rel="noopener">ONNX Runtime quantization</a> is applied to further reduce the size of the model. When deploying the GPT-C ONNX model, the IntelliCode client-side model service retrieves the output tensors from ONNX Runtime and sends them back for the next inference step until all beams reach the end of the line.</p><img loading="lazy" alt="Figure 1. How the GPT-C model is deployed in Visual Studio and Visual Studio Code" width="624" height="290" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/06/How-GPT-C-Model-deployed-in-Visual-Studio-and-Visual-Studio-Code.png"><p><em>Figure 1. How the GPT-C model is deployed in Visual Studio and Visual Studio Code</em></p><p>We measured the latency of the GPT-C ONNX model on both CPU and GPU configurations. CPU performance measurement was done on a laptop machine with an Intel Core i7-8650U CPU. Compared with the initial client-side GPT-C attempt, we measured performance gains of up to 4.0x, with around 300 ms per inference.</p><p>For GPU, we used one NVIDIA V100-PCIE-16GB GPU on an Azure Standard_NC12s_v3 VM and tested it in FP16 configuration. Compared with PyTorch, ONNX Runtime showed significant memory-efficiency and performance improvements of up to 5x and 4x, respectively.</p><h2>Technical insights about one-step beam search ONNX optimization</h2><p>Considering that beam search requires multiple steps with certain stop conditions while the ONNX graph is static, we standardized the interface by exporting only one step of the beam search to ONNX. To enable multi-step beam search, all we need is a simple loop with a proper stop condition. 
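As a sketch, that outer driver loop reduces to something like the following; here `one_step` stands in for an ONNX Runtime `session.run` call on the exported one-step graph, and the state names are illustrative rather than the actual GPT-C interface:

```python
def generate(one_step, state, max_steps=64):
    """Drive the exported one-step beam-search graph until all beams finish.

    `one_step` stands in for an ONNX Runtime session.run call on the exported
    graph; `state` bundles its inputs/outputs (beam tokens, scores, and a
    per-beam `finished` flag). All names here are hypothetical.
    """
    for _ in range(max_steps):
        state = one_step(state)
        if all(state["finished"]):  # the stop condition of the outer loop
            break
    return state
```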
Unfortunately, we ran into problems, as the beam search algorithm requires loop operations for selecting beams and a set to store finished beams, neither of which is natively supported in the ONNX spec yet. To overcome this, we use two matrices to store the beam indices and scores at each step. In addition, we use a vector of indicators to track whether the input beams are finished.</p><p><strong><img loading="lazy" alt="Figure 2. Input beams and candidate output beams in one-step beam search" width="1024" height="271" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/06/Figure2-1024x271.webp"></strong></p><p><em>Figure 2. Left: Input beams. Right: candidate output beams. Meaning of tuples: (index, score or probability, finished or unfinished). Here beam size (k) is 3 and index 2 is an &lt;end-of-text&gt; token. For the 3<sup>rd</sup> row, the input is finished (i.e., has reached &lt;end-of-text&gt;), so we construct and insert k "fake" candidates with the 1<sup>st</sup> beam carrying the same score as the input with a pad index (or arbitrary index). The 2<sup>nd</sup> and subsequent beams are given a score of -Inf, which will be dropped when finding the top-k (shadowed) from all candidates.</em></p><p>As shown in the example in Figure 2, input beams feed into the model to get a probability distribution of the next tokens. Since model inference is expensive, we only run the model on the unfinished beams, denoted as <em>k0</em> (<em>k0</em> = 2 in the example), and select the top-k (the beam size) candidates for each unfinished input beam, which results in a <em>k0 x k</em> table. Then the finished beams are constructed and inserted back into the candidate pool or table by using the ONNX scatter operator. Therefore, we end up with a table of <em>k x k</em> candidates. 
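The bookkeeping in Figure 2 can be sketched end-to-end in NumPy; the shapes, pad index, and log-softmax scoring here are illustrative of the idea rather than the exact GPT-C implementation (which expresses these steps as ONNX operators):

```python
import numpy as np

def one_step_candidates(scores, finished, logits, k, pad_id=0):
    """Build the k x k candidate table of Figure 2 and pick the next k beams.

    scores:   (k,)        running log-probabilities of the input beams
    finished: (k,) bool   True once a beam has emitted <end-of-text>
    logits:   (k0, vocab) model output for the k0 unfinished beams only
    """
    k0 = int((~finished).sum())
    assert logits.shape[0] == k0, "model is run only on unfinished beams"

    cand_scores = np.full((k, k), -np.inf)  # -inf slots are never selected
    cand_ids = np.full((k, k), pad_id, dtype=int)

    # Unfinished beams: score the top-k next tokens per beam (log-softmax).
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    top = np.argsort(-log_probs, axis=-1)[:, :k]  # (k0, k) token ids
    for r, row in enumerate(np.where(~finished)[0]):
        cand_ids[row] = top[r]
        cand_scores[row] = scores[row] + log_probs[r, top[r]]

    # Finished beams: one "fake" candidate keeps the input score at a pad
    # index; the remaining slots stay at -inf and are dropped by the top-k.
    for row in np.where(finished)[0]:
        cand_scores[row, 0] = scores[row]

    # Next k beams: global top-k over the flattened candidate table.
    best = np.argsort(-cand_scores, axis=None)[:k]
    beam, slot = np.unravel_index(best, cand_scores.shape)
    return cand_ids[beam, slot], cand_scores[beam, slot]
```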
For the next round of beam search, the next k beams are selected by the top-k operator, which automatically discards finished beams with a <em>-inf</em> score. This avoids the use of branching, which is not yet supported in ONNX, to handle finished and unfinished inputs separately.</p><p>By putting beam search into the ONNX graph, we benefit from ONNX Runtime's optimization and reduce the overhead of transforming data between ONNX Runtime and the scripting language, which helps reduce model inference latency. Another benefit is to help bridge the gap between model training and deployment. As we know, it is common that the programming language for the large-scale transformer model in production may not be the same as the one used to train the model. For example, we found that most models are trained and tested using Python packages such as PyTorch but deployed in C#, C++, or JavaScript packages. This means that the exact same beam search algorithm has to be implemented in different languages, which would cause inconsistency and maintenance issues. Given that the beam search algorithm is mostly standardized, exporting beam search directly to the ONNX graph avoids substantial code changes during deployment.</p><h2>Try it now</h2><p>We are delighted to offer this innovation to the public developer and data science community. You can now leverage high-performance inference with ONNX Runtime for a given GPT-2 model with one-step beam search using the following steps:</p><ol><li>Train a GPT-2 model or load a pre-trained one.</li><li>Convert the GPT-2 model with one-step beam search to ONNX format.</li><li>Run the converted model with ONNX Runtime on the target platform of your choice.</li></ol><p>
Check out this <a href="https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Inference_GPT2-OneStepSearch_OnnxRuntime_CPU.ipynb" target="_blank" rel="noopener">end-to-end tutorial</a>.</p><h2>Ongoing work</h2><p>We will continue optimizing the performance of the large-scale transformer model in ONNX Runtime. There are still opportunities for further improvements, such as integrating the multi-step beam search into the ONNX model.</p><p>We have completed the internal preview of IntelliCode line completion and released it to <a href="https://visualstudio.microsoft.com/vs/preview/vs2022/" target="_blank" rel="noopener">preview in Visual Studio 2022</a>.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/">Journey to optimize large scale transformer model inference with ONNX Runtime</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to migrate and modernize Linux workloads and open source databases to Azure</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/06/21/how-to-migrate-and-modernize-linux-workloads-and-open-source-databases-to-azure/</link>
		
		<dc:creator><![CDATA[Chhavi Nijhawan]]></dc:creator>
		<pubDate>Mon, 21 Jun 2021 15:00:36 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Azure Migrate]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/06/21/how-to-migrate-and-modernize-linux-workloads-and-open-source-databases-to-azure/</guid>

					<description><![CDATA[<p>With extensive support for all major Linux distributions including Red Hat, SUSE, Ubuntu, CentOS, Debian, and managed platform-as-a-service (PaaS) offerings for open source databases like Azure Database for MySQL, Azure Database for PostgreSQL, and Azure Database for MariaDB, it&#8217;s no surprise that Linux is the fastest growing platform on Azure. Furthermore, Azure Migrate makes the discovery,<span><a class="read-more" aria-label="Read more about How to migrate and modernize Linux workloads and open source databases to Azure" href="https://cloudblogs.microsoft.com/opensource/2021/06/21/how-to-migrate-and-modernize-linux-workloads-and-open-source-databases-to-azure/" data-bi-cn="Read more about How to migrate and modernize Linux workloads and open source databases to Azure">Read more</a></span></p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/21/how-to-migrate-and-modernize-linux-workloads-and-open-source-databases-to-azure/">How to migrate and modernize Linux workloads and open source databases to Azure</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>With extensive <a href="https://docs.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros" target="_blank" rel="noopener">support for all major Linux distributions</a> including Red Hat, SUSE, Ubuntu, CentOS, Debian, and managed platform-as-a-service (PaaS) offerings for open source databases like <a href="https://azure.microsoft.com/en-us/services/mysql/" target="_blank" rel="noopener">Azure Database for MySQL</a>, <a href="https://azure.microsoft.com/en-us/services/postgresql/" target="_blank" rel="noopener">Azure Database for PostgreSQL</a>, and <a href="https://azure.microsoft.com/en-us/services/mariadb/" target="_blank" rel="noopener">Azure Database for MariaDB</a>, it&rsquo;s no surprise that Linux is the fastest growing platform on Azure. Furthermore, <a href="https://azure.microsoft.com/en-us/services/azure-migrate/" target="_blank" rel="noopener">Azure Migrate</a> makes the discovery, assessment, migration, and modernization of apps, databases, and servers, both Linux and Windows, to Azure seamless. In this blog, we will show you how to migrate and modernize an open-source Java web application running on Linux, along with a MySQL database, to Azure using Azure Migrate.</p><h2>Easily migrate and modernize Linux and open source databases to Azure</h2><p>Azure Migrate is your one-stop shop in Azure for migrating and modernizing your virtual machines like <a href="https://azure.microsoft.com/en-us/campaigns/windows-server/" target="_blank" rel="noopener">Windows</a> or <a href="https://azure.com/Linux" target="_blank" rel="noopener">Linux Servers</a>, databases, data, <a href="https://azure.microsoft.com/en-us/migration/web-applications/" target="_blank" rel="noopener">web apps</a>, and virtual desktops. 
Azure Migrate offers free migration tools with features like agentless datacenter discovery, Azure readiness analysis, cost estimation, app modernization, and app dependency visualization, as well as popular migration tools from our ISV partners, to help you through the discovery, assessment, and migration phases of your migration and modernization journey in one central location with end-to-end visibility.</p><p>In the demo video below, we migrated and modernized an open-source Java app, <a href="https://github.com/airsonic/airsonic" target="_blank" rel="noopener">Airsonic</a>, and its backend MySQL database, both running on-premises on Linux virtual machines, to Azure. To modernize the MySQL database, we moved the data from the on-premises virtual machine into Azure Database for MySQL, using the <a href="https://azure.microsoft.com/en-us/services/database-migration/?OCID=AID2100131_SEM_895e2d51255414da242b4dec56ac84dc:G:s&amp;ef_id=895e2d51255414da242b4dec56ac84dc:G:s&amp;msclkid=895e2d51255414da242b4dec56ac84dc" target="_blank" rel="noopener">Azure Database Migration Service</a>. Azure Database for MySQL is a managed database, so once you have the data in Azure Database for MySQL, you don't have to worry about managing a virtual machine, and you get the benefits of built-in scalability, high availability, enterprise-grade SLAs, and cost optimization.</p><p>To modernize the app, we containerized it using the <a href="https://docs.microsoft.com/en-us/azure/migrate/tutorial-containerize-java-kubernetes" target="_blank" rel="noopener">Azure Migrate App Containerization tool</a>, so you can achieve faster application development cycles, easier deployment, and the quick scalability offered by containers, all without making any code changes to the app. 
Also, check out this <a href="https://aka.ms/migrate/mysql" target="_blank" rel="noopener">MySQL migration guide</a> for detailed step-by-step guidance on how to migrate MySQL workloads to Azure Database for MySQL.</p><p>To learn more, watch the <a href="https://www.youtube.com/watch?v=1iskhkEtFNk" target="_blank" rel="noopener">Microsoft Mechanics video</a> below, which shows you step-by-step how to migrate and modernize your Linux and open source databases to Azure.</p><h2>Get full support for Linux and high availability, industry-leading SLAs for open source databases</h2><p><a href="https://docs.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros" target="_blank" rel="noopener">Azure supports all major Linux distributions</a>, including Red Hat, SUSE, Ubuntu, CentOS, Debian, Oracle Linux, and Flatcar Linux, as well as open-source databases like MySQL, PostgreSQL, Cassandra, MariaDB, and more. More than 60 percent of Azure Marketplace solutions run on Linux. Beyond the workload level, Azure also contributes back to the upstream Linux and Kubernetes communities that many modern, cloud-native architectures rely on.</p><p>Microsoft has invested heavily in performance, reliability, manageability, and security to make Azure the best home for running any open source workload. Starting at the foundational level, Microsoft works with the leading Linux distros to optimize their kernels for Azure's hypervisors. Microsoft also works closely with Red Hat on managed services like Azure Red Hat OpenShift, and with SUSE on SAP enhancements. So when you bring your workloads to Azure, there is a benefit every step of the way, from onboarding to operation, and you gain more security than you might have had on-premises, in your private cloud, or in another cloud. 
And whether you are starting greenfield or bringing what you already have to Azure, we've got you covered.</p><p>Here is a summary of the key advantages of running Linux workloads and open source database services on Azure:</p><ul><li>Support for all major Linux distros like <a href="https://Azure.com/RedHat" target="_blank" rel="noopener">Red Hat</a>, <a href="https://Azure.com/SUSE" target="_blank" rel="noopener">SUSE</a>, <a href="https://Azure.com/Linux" target="_blank" rel="noopener">Ubuntu, Oracle Linux, Debian, CentOS, CoreOS, and OpenSUSE</a>.</li><li><a href="https://docs.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros#azure-tuned-kernels" target="_blank" rel="noopener">Azure-tuned kernels</a> that provide 25 percent faster network throughput.</li><li>A <a href="https://docs.microsoft.com/en-us/troubleshoot/azure/cloud-services/support-linux-open-source-technology" target="_blank" rel="noopener">unique and integrated support experience</a>: our support teams work with the Red Hat and SUSE support teams to triage your support cases together.</li><li>Minimal administration for <a href="https://azure.microsoft.com/en-us/services/mysql/" target="_blank" rel="noopener">MySQL</a>, <a href="https://azure.microsoft.com/en-us/services/postgresql/" target="_blank" rel="noopener">PostgreSQL</a>, and <a href="https://azure.microsoft.com/en-us/services/mariadb/" target="_blank" rel="noopener">MariaDB</a> with fully managed databases based on the latest community editions.</li><li>The best total cost of ownership, high availability, and built-in intelligence provided by managed databases.</li><li>Enterprise scalability with <a href="https://docs.microsoft.com/en-us/azure/postgresql/quickstart-create-hyperscale-portal" target="_blank" rel="noopener">Hyperscale (Citus)</a>, which scales PostgreSQL across multiple servers and parallelizes queries across them for faster responses.</li></ul><h2>Learn more</h2><p>Find just 
about everything related to <a href="https://azure.com/linux" target="_blank" rel="noopener">Linux running on Azure</a>. Once you're ready to migrate or modernize your apps and open source databases, use <a href="https://aka.ms/azuremigrate" target="_blank" rel="noopener">Azure Migrate</a> to find the tools to <a href="https://aka.ms/datamigration" target="_blank" rel="noopener">migrate your databases</a>. Get guidance on how to migrate and modernize your workloads, apps, and databases at the <a href="https://azure.microsoft.com/en-us/migration/migration-journey/" target="_blank" rel="noopener">Azure Migration Center</a>, and enroll in the <a href="https://azure.microsoft.com/en-us/migration/migration-program/" target="_blank" rel="noopener">Azure Migration Program</a> to get expert help. We also have plenty of learning content on <a href="https://docs.microsoft.com/en-us/learn/topics/azure-migration" target="_blank" rel="noopener">Microsoft Learn</a> to help you easily migrate and modernize your applications to Azure.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/21/how-to-migrate-and-modernize-linux-workloads-and-open-source-databases-to-azure/">How to migrate and modernize Linux workloads and open source databases to Azure</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Empowering you to achieve more with open source on Azure</title>
		<link>https://cloudblogs.microsoft.com/opensource/2021/06/16/empowering-you-to-achieve-more-with-open-source-on-azure/</link>
		
		<dc:creator><![CDATA[Katie Fritsch]]></dc:creator>
		<pubDate>Wed, 16 Jun 2021 16:00:43 +0000</pubDate>
				<category><![CDATA[Application Development]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Microsoft Azure]]></category>
		<guid isPermaLink="false">https://cloudblogs.microsoft.com/opensource/2021/06/16/empowering-you-to-achieve-more-with-open-source-on-azure/</guid>

					<description><![CDATA[<p>At Microsoft, we are taking cloud architecture to the next level, and our open cloud reduces the friction for developers to get applications up and running. We give developers the autonomy and control to flexibly choose their infrastructure, with options to build, migrate, and deploy across multiple environments, whether on-premises, in the cloud, or at the edge. Our philosophy is to give developers the best technology as quickly as possible.</p>
<p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/16/empowering-you-to-achieve-more-with-open-source-on-azure/">Empowering you to achieve more with open source on Azure</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Open source plays a critical role in your application development and modernization. At Microsoft, we are taking cloud architecture to the next level, and our <a href="https://azure.microsoft.com/en-us/overview/open-source/" target="_blank" rel="noopener">open cloud</a> reduces the friction for developers to get applications up and running. We give developers the autonomy and control to flexibly choose their infrastructure, with options to build, migrate, and deploy across multiple environments, whether on-premises, in the cloud, or at the edge. Our philosophy is to give developers the best technology as quickly as possible. We are a tools company at our core, and we are constantly innovating and pushing out the latest code so that you can take our best technology and build your best technology, fast.</p><p>To learn more about how Microsoft is making it easier to build scalable applications and contributing to open source projects, visit our most recent blog post, "<a href="https://azure.microsoft.com/en-us/blog/gain-flexibility-to-run-open-source-applications-your-way-with-microsoft-azure/" target="_blank" rel="noopener">Gain flexibility to run open source applications your way with Microsoft Azure</a>."</p><h2>Azure is the cloud for open source developers</h2><p>You have endless possibilities on your cloud-native journey, as new technologies and patterns emerge every day. Azure supports developers in building on their own terms, with integrated support for the open source tools, languages, and third-party integrations of their choice. We are constantly improving Azure for all. Our open cloud frees you from infrastructure management, without the fear of vendor lock-in, so that you can focus on delivering impact. 
Hear from developers around the globe about why they chose Azure in our <a href="https://azure.microsoft.com/en-us/resources/videos/open-source-on-azure-developers-tell-all/" target="_blank" rel="noopener">Developers Tell All video</a> below.</p><p><a href="https://azure.microsoft.com/en-us/resources/videos/open-source-on-azure-developers-tell-all/" target="_blank" rel="noopener"><img loading="lazy" alt="A video thumbnail with nine faces of developers smiling at the camera" width="624" height="354" src="https://cloudblogs.microsoft.com/uploads/prod/sites/37/2021/06/Developers-Tell-All-video.png"></a></p><p>At Microsoft Build, we shared new technologies that enable developers to develop flexibly and innovate quickly on Microsoft Azure. Check out our session, "<a href="https://mybuild.microsoft.com/sessions/b66c3a65-4d11-4c1b-9b29-4df873a8cf4d" target="_blank" rel="noopener">Run Open Source Applications your way with Microsoft Azure</a>," and our demos to see how we are empowering developers. Get the latest updates on the <a href="https://cloudblogs.microsoft.com/opensource/" target="_blank" rel="noopener">open source blog</a> to learn how our engineers are innovating with open source, and <a href="https://twitter.com/openatmicrosoft?lang=en" target="_blank" rel="noopener">follow us on Twitter</a>.</p><p>The post <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource/2021/06/16/empowering-you-to-achieve-more-with-open-source-on-azure/">Empowering you to achieve more with open source on Azure</a> appeared first on <a rel="nofollow" href="https://cloudblogs.microsoft.com/opensource">Microsoft Open Source Blog</a>.</p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>