SAN JOSE, Calif. — A Facebook executive confirmed reports that the social networking giant is hiring chip engineers and designing at least one ASIC. The news came at the @Scale event here, where Facebook announced that five chip companies will support Glow, an open-source, deep-learning compiler that it backs.

Facebook “is absolutely bringing up a silicon team focused on working with silicon providers, and we have a chip we’re building, but it’s not our primary focus,” said Jason Taylor, vice president of infrastructure at Facebook. The chip is “not the equivalent of [Google’s] TPU” deep-learning accelerator, he added, declining to provide further details on its focus or time frame.

Working with the estimated 50 companies designing AI accelerators is one focus for the new Facebook chip group. “There will be a lot of [accelerator] chips in the market,” said Taylor at a press roundtable. “The big question is whether the workloads they are designed for are the important ones at the time.”

In a keynote, Taylor described Glow as a generic compiler to let developers target any of the emerging deep-learning accelerators for inference in the cloud or at the edge of the network. It does not target client systems such as smartphones.

“We expect that there will be hardware fragmentation [in inference accelerators]. Our work with Glow is to help machine-learning experts design neural nets and not have to do the work required to tune them” to each unique chip.

“We know that the fragmentation is coming because no one knows what combination of [hardware] resources [such as on-chip memory blocks and multiply-accumulate arrays] will win, so we’ll let developers focus on the high-level graphs without hand-coding for the specifics of hardware.”

Jason Taylor described Glow as a compiler for inference on cloud and edge networks. (Images: Facebook)

Glow takes an AI graph produced by a framework such as TensorFlow or Caffe2 and renders it into byte code for hardware accelerators, explained Taylor. The compiler’s tools include an instruction scheduler, a linear algebra optimizer, a memory allocator that generates efficient code for a chip’s specific memory configuration, and a CPU-based reference implementation for testing the accuracy of the hardware, according to a Facebook blog post.
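As a rough illustration of that flow, the toy Python sketch below orders a small graph, assigns each result a memory offset, and emits a flat instruction list. It is not Glow code: every name in it (Node, schedule, allocate, lower, compile_graph) and the byte sizes are hypothetical, and the simple bump allocator stands in for the far more sophisticated memory planning that a production compiler would perform.

```python
# Toy sketch only: none of these names or structures come from Glow itself.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    op: str            # e.g. "input", "matmul", "relu"
    inputs: tuple = ()  # names of the nodes this one consumes


def schedule(nodes):
    """Order nodes so that every input is computed before it is used."""
    by_name = {n.name: n for n in nodes}
    order, seen = [], set()

    def visit(node):
        if node.name in seen:
            return
        for dep in node.inputs:
            visit(by_name[dep])
        seen.add(node.name)
        order.append(node)

    for node in nodes:
        visit(node)
    return order


def allocate(order, sizes):
    """Bump allocator: give each node's output the next free byte offset."""
    offsets, cursor = {}, 0
    for node in order:
        offsets[node.name] = cursor
        cursor += sizes[node.name]
    return offsets, cursor


def lower(order, offsets):
    """Emit (op, output_offset, input_offsets) tuples, a stand-in for byte code."""
    return [
        (n.op, offsets[n.name], tuple(offsets[i] for i in n.inputs))
        for n in order
        if n.op != "input"
    ]


def compile_graph(nodes, sizes):
    order = schedule(nodes)
    offsets, total_bytes = allocate(order, sizes)
    return lower(order, offsets), total_bytes


# A tiny graph, deliberately listed out of order: out = relu(matmul(x, w))
nodes = [
    Node("out", "relu", ("y",)),
    Node("y", "matmul", ("x", "w")),
    Node("x", "input"),
    Node("w", "input"),
]
sizes = {"x": 16, "w": 32, "y": 8, "out": 8}  # hypothetical buffer sizes in bytes

program, total_bytes = compile_graph(nodes, sizes)
for instruction in program:
    print(instruction)
print("bytes needed:", total_bytes)
```

In this sketch the scheduler and allocator are independent of the graph’s source framework, which is the separation that lets a developer stay at the level of the graph while the compiler handles chip-specific layout.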

Cadence, Esperanto Technologies, Intel, Marvell, and Qualcomm said that they will support Glow on future chips. Taylor said that he expects to add others to the list. “That’s one of the benefits of it being open-source.”

One senior chip expert described Glow as a framework for deploying a neural network in production systems. Its input would be a graph created in a framework such as TensorFlow or Caffe2.

Some established chipmakers already supply similar software. For example, Nvidia’s TensorRT takes in a graph from a framework and outputs CUDA code for its GPUs.

Traditionally, compilers are tightly optimized for a specific chip. But “what a compiler is these days is quite a bit broader than in the past — the kinds of optimizations in Glow have to do with identifying large portions of a graph that can be rendered to a hardware accelerator,” said Taylor.
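The idea of carving out accelerator-friendly portions of a graph can be sketched in a few lines of Python. The example below is a simplification under stated assumptions: it treats the network as an already-ordered list of operations rather than a true graph, and the operator names and the supported set are invented for illustration, not taken from Glow or from any vendor’s chip.

```python
# Toy sketch only: a linear op list stands in for a real dataflow graph.
from itertools import groupby


def partition(ops, supported):
    """Group consecutive ops into runs that go to the accelerator or the CPU."""
    def target(op):
        return "accel" if op in supported else "cpu"

    return [(where, list(run)) for where, run in groupby(ops, key=target)]


# Hypothetical network and a hypothetical accelerator that lacks "topk".
ops = ["conv", "relu", "conv", "topk", "matmul", "softmax"]
supported = {"conv", "relu", "matmul", "softmax"}

segments = partition(ops, supported)
for where, run in segments:
    print(where, run)
```

Longer accelerator runs generally mean fewer hand-offs between the host CPU and the chip, which is one reason a compiler would try to identify large offloadable portions rather than dispatching operations one at a time.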

Glow is the latest example of an effort to plug the gap between software and hardware in the fast-moving world of deep learning. For example, Nvidia’s TensorRT is now in its fifth version, though it was first released just a year ago. Some accelerator startups express frustration at the level of work needed to support the wide variety of software frameworks and their changes.

Facebook, Microsoft, and others are backing ONNX, a standard way to express a graph with its weights. In December, the Khronos Group released NNEF, a hardware abstraction layer for deep-learning accelerators.

For its part, Glow is a single component of PyTorch 1.0, a collection of open-source projects that includes the merged Caffe2 and PyTorch frameworks. The first developer conference for PyTorch 1.0 is slated for October in San Francisco.

In a separate talk, Facebook engineering manager Kim Hazelwood rattled off a list of a dozen different deep-learning workloads that the social network uses, employing at least four different kinds of neural nets. Every day, the AI apps generate more than 200 trillion inferences, translate more than five billion texts, and automatically remove more than a million fake accounts.

Some of Facebook’s inference tasks require 100 times more compute than others, she said. Today, Facebook runs the jobs on a handful of CPU- and GPU-based server types that it has designed.

Moving from general-purpose to custom hardware would require tailoring chips to those still-changing workloads, Hazelwood told EE Times after her talk. She declined to give any insight into Facebook’s thinking on the use of custom AI accelerators.


Facebook alone uses at least five kinds of neural networks across at least a dozen deep-learning apps.

One observer speculated that Glow would be an ideal tool to enable the company to adopt a handful of accelerators suited to its various workloads. Its semiconductor team could help sift through the options to select a few chips and perhaps suggest customizations for some of them.

Separately, Facebook published a blog post describing a new software tool it created that uses deep learning to debug code. SapFix can automatically generate fixes for specific bugs and then propose them to engineers for approval and deployment to production, it said.

So far, Facebook has used SapFix to accelerate the process of shipping code updates to millions of devices using the Facebook Android app. Facebook said it will release a version of the tool but did not state when.

— Rick Merritt, Silicon Valley Bureau Chief, EE Times