Microsoft showed off a new system called Brainwave, an FPGA (Field-Programmable Gate Array)-based system for ultra-low-latency deep learning in the cloud. The system allows developers to deploy machine learning models onto programmable silicon and achieve high performance.
Researchers demonstrated a Gated Recurrent Unit (GRU) model running on Intel's new Stratix 10 FPGA chip at a speed of 39.5 teraflops, without batching operations at all. The model Microsoft chose is larger than convolutional neural networks such as AlexNet and ResNet-50.
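For readers unfamiliar with the architecture, a GRU cell combines an update gate, a reset gate, and a candidate hidden state at each time step. The sketch below is a minimal, generic GRU step in NumPy; the dimensions and weights are made up for illustration and do not reflect the (much larger) model Microsoft ran.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of a standard GRU cell; bias terms omitted for brevity."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))    # candidate hidden state
    return (1 - z) * h + z * h_cand            # blended new hidden state

# Hypothetical toy sizes -- not the dimensions of Microsoft's model.
d_in, d_hid = 4, 8
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.standard_normal((d_hid, d_in)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((d_hid, d_hid)) for _ in range(3))

h = np.zeros(d_hid)
for _ in range(5):                             # run a short input sequence
    h = gru_step(rng.standard_normal(d_in), h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)
```

Because the same weight matrices are reused at every time step, the whole model can be pinned in on-chip memory, which is exactly the property Brainwave exploits.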
Low latency is important for deploying machine learning systems at scale. “We call it real-time AI because the idea here is that you send in a request, you want the answer back,” said Doug Burger, a distinguished engineer with Microsoft Research.
Previous results on hardware-accelerated machine learning have tended to optimize for throughput at the cost of latency. In Burger's view, what matters is how a machine learning accelerator performs without bundling requests into a batch and processing them all at once.
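The throughput-versus-latency trade-off behind batching can be seen with a bit of arithmetic. The numbers below are assumed for illustration only (they are not Microsoft's measurements): requests arrive at a fixed rate, and each accelerator pass has a fixed launch overhead plus a per-item cost.

```python
# All timings here are assumptions for illustration, not measured values.
ARRIVAL_GAP_MS = 2.0                         # a new request arrives every 2 ms

def batch_time_ms(batch_size):
    """Assumed accelerator pass time: fixed overhead + per-item cost."""
    return 5.0 + 1.0 * batch_size

def stats(batch_size):
    # The first request in a batch must wait for the rest to arrive,
    # then for the whole batch to be processed.
    wait_ms = ARRIVAL_GAP_MS * (batch_size - 1)
    latency_ms = wait_ms + batch_time_ms(batch_size)
    throughput_rps = batch_size / batch_time_ms(batch_size) * 1000.0
    return latency_ms, throughput_rps

for b in (1, 8, 64):
    lat, thr = stats(b)
    print(f"batch={b:3d}  worst-case latency={lat:6.1f} ms  throughput={thr:6.0f} req/s")
```

Larger batches raise throughput but also raise per-request latency, which is why a benchmark that reports only batched throughput says little about real-time serving.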
Microsoft already uses the Brainwave system across its FPGAs. Burger said Brainwave allows Microsoft services to support AI features more rapidly. Additionally, Microsoft is working to make Brainwave available to third-party customers through its Azure cloud platform.
FPGAs let programmers configure the hardware to execute particular functions, optimized prior to runtime. Microsoft has deployed thousands of FPGAs in its data centers on boards slotted into servers and connected to the network.
Brainwave loads a trained machine learning model into the FPGA hardware's memory, where it stays throughout the lifetime of a machine learning service. That hardware is then used to compute whatever insights the model is designed to generate. If a model is too big to run on a single FPGA, the software deploys and executes it across multiple hardware boards.
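The multi-board case amounts to partitioning a model's layers across FPGAs so that each board's slice fits in its on-chip memory. Microsoft has not described its partitioning algorithm here; the following is a simple greedy sketch under an assumed per-board memory budget, just to make the idea concrete.

```python
# Hypothetical sketch: greedily assign consecutive layers to boards by memory.
BOARD_MEM_MB = 64.0  # assumed on-chip memory budget per FPGA (illustrative)

def partition(layer_sizes_mb):
    """Split an ordered list of layer sizes into per-board groups."""
    boards, current, used = [], [], 0.0
    for size in layer_sizes_mb:
        if used + size > BOARD_MEM_MB and current:
            boards.append(current)          # close out the full board
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        boards.append(current)
    return boards

model = [20, 30, 25, 10, 40, 15]            # made-up per-layer sizes in MB
plan = partition(model)
print(len(plan), "boards:", plan)
```

Each board then executes its slice in a pipeline, passing activations to the next board over the data-center network.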
FPGAs are typically slower and less efficient than chips made specifically to execute machine learning operations. Burger said this performance milestone should show that programmable hardware can deliver high performance as well. With further improvements, Microsoft expects to hit 90 teraflops with the Intel Stratix 10.
Currently, Brainwave supports models trained using Microsoft's CNTK framework and Google's TensorFlow framework.
Microsoft isn't the only company investing in hardware designed to accelerate machine learning. Google announced the second revision of its Tensor Processing Unit earlier this year.
Microsoft hasn't revealed when Brainwave will be available to its customers, but it is working toward a future in which third parties will be able to bring any trained model and run it on Brainwave.