Microsoft open sources breakthrough optimizations for transformer inference on GPU and CPU
This post is co-authored by Emma Ning, Azure Machine Learning; Nathan Yan, Azure Machine Learning; Jeffrey Zhu, Bing; Jason Li, Bing One of the most popular deep learning models used for natural language processing is BERT (Bidirectional Encoder Representations from Transformers). Due to the significant computation required, inferencing BERT at high scale can be extremely Read more