SAN JOSE, Calif. — A startup will sample before June a 13W machine-learning accelerator for cars, robots and drones said to handily beat Nvidia GPUs in recognizing images. Visteon is considering using the chip in future automotive systems based on test results on an FPGA version of the device.

AlphaICs designed an instruction set architecture (ISA) optimized for deep learning, reinforcement learning and other machine learning tasks. The startup aims to produce a family of chips with 16-256 cores, roughly spanning 2W to 200W.

The market is already getting crowded with AI accelerators from startups and established companies, but money is still flowing into the space because AI represents a historic shift in computing. Rather than try to build large arrays of multiply-accumulate units as many early AI startups did, AlphaICs is part of an emerging group of startups that aims to look at a wider class of machine learning algorithms and ways to speed them up.

The startup was formed by Vinod Dham, a veteran of several x86 designs, along with technical and business co-founders based in India.

“We are on a quest to build a new type of compute engine…There has to be a better architecture for deep learning, reinforcement learning and new types of machine learning,” said Dham, who designed Pentium processors at Intel then formed processor startups NexGen and Silicon Spice, sold to AMD and Broadcom, respectively.

AlphaICs’ first product, the 13W RAP-E, does inference and some learning on devices at the network’s edge and should be in production late next year. A higher-end RAP-C will be a 100W chip using high bandwidth memory for building large neural network models in data centers and will be available in an FPGA version by June.

So far, the 25-person startup based in Bangalore has raised about $15 million, enough to tape out its RAP-E in a TSMC 16FF process. It aims to raise a Series B over the next nine months to fund work on a 7nm version of RAP-C.

The RAP chips include both a pool of homegrown processors and multiply-accumulate arrays on a crossbar switch. (Image: AlphaICs)

AlphaICs beats Volta in Visteon race

The FPGA version of RAP-E beat Nvidia’s Volta V100 on a detailed image-recognition test using videos and convolutional neural net algorithms created by Visteon and run by AlphaICs at its labs. RAP-E beat Nvidia in all metrics by margins ranging from 50% to 400%.

“We were more than pleasantly surprised to see how good the technology was…so we are on the verge of engaging them at a deeper level and considering incorporating the chip in our products” for autonomous driving and in-car infotainment, said Vijay Nadkarni, an AI and augmented reality specialist at Visteon, which mainly uses Nvidia chips for deep learning.

Visteon will evaluate some of the many other startups coming into the area but is unlikely to do technical testing of their parts. “AlphaICs has been good to work with, and I have to balance time testing with time developing our own products,” said Nadkarni.

The next step for Visteon is getting hands-on time with AlphaICs’ FPGA board and kicking the tires on its programming tools. “I want to get a sample board and have my engineers and see its flexibility and how difficult or easy it is to write our algorithms to its API,” he said.

Nadkarni likes the fact the RAP-E can handle reinforcement learning and LSTMs, algorithms used to make decisions in self-driving cars and recognize speech, respectively.

Typically, Visteon defines a low-level API in its products so it can quickly port to whatever chips its customers prefer. It uses chips from NXP, Qualcomm and Renesas for some deep learning tasks, sometimes based on preferences of car makers.

Visteon already has a working heads-up display for cars that overlays AR images on the road ahead.

“With a projection, the car indicates what it’s doing, giving a driver a sense of trust. That’s important because it can be unnerving when you don’t know what a self-driving car will do. This capability could go commercial in a matter of months,” he said.


Programming a whole new ISA

At the heart of the RAP chips is a pool of basic processor cores running an instruction set the startup calls single instruction multiple agents (SIMA).

“Today’s SIMD and multi-threading architectures don’t provide the best abstraction for AI tasks, so we developed a new ISA. SIMA offers instructions such as explore-all, interact and create-event that are then decoded into micro-operations,” said Nagendra Nagaraja, chief executive of AlphaICs, and a former lead designer of Nvidia Tegra and Qualcomm Snapdragon chips.

The hardware cores were developed from scratch but are fairly typical, non-RISC pipelined units. The agents they process are defined as groups of tensors packed together to bolster parallel processing.

Users can define agents for various neural network types including reinforcement learning. One instruction can pack multiple agents with each agent run on a single core.
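
AlphaICs has not published the SIMA instruction encoding, so the idea can only be illustrated with a toy model. The Python sketch below assumes an agent is simply a named group of tensors and that one issued instruction is applied to every packed agent, one agent per core. The names `Agent`, `scale_all` and `issue` are hypothetical and are not part of any AlphaICs API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Tensor = List[float]  # toy stand-in for a tensor; real tensors are multi-dimensional


@dataclass
class Agent:
    """Hypothetical agent: a named group of tensors packed together."""
    name: str
    tensors: List[Tensor]


def scale_all(agent: Agent, factor: float) -> Agent:
    """Toy 'instruction' (an assumption, not a SIMA opcode): scale every tensor."""
    return Agent(agent.name, [[x * factor for x in t] for t in agent.tensors])


def issue(instruction: Callable[..., Agent], agents: List[Agent],
          num_cores: int, **operands: float) -> Dict[int, Agent]:
    """Apply one instruction to every packed agent, assigning one agent per core."""
    if len(agents) > num_cores:
        raise ValueError("more agents packed than cores available")
    # The loop stands in for cores that would work in parallel on real hardware.
    return {core: instruction(agent, **operands)
            for core, agent in enumerate(agents)}


agents = [Agent("a0", [[1.0, 2.0]]), Agent("a1", [[3.0], [4.0, 5.0]])]
result = issue(scale_all, agents, num_cores=16, factor=2.0)
print(result[0].tensors)  # [[2.0, 4.0]]
print(result[1].tensors)  # [[6.0], [8.0, 10.0]]
```

The point of the sketch is only the shape of the abstraction: a single instruction, many agents, one core each. How the real hardware decodes instructions into micro-operations is not modeled here.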

The chip also includes multiply-accumulate arrays similar to Google’s TPU. Unlike the TPU, the arrays are linked on a crossbar switch “so no nearest-neighbor search is needed — the crossbar lets anything be available so the data path can be used more efficiently,” said Nagaraja.
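
For readers unfamiliar with the term, a multiply-accumulate (MAC) unit computes `acc + a * b` in one step, and an array of them evaluates the dot products at the core of neural network layers. The Python sketch below shows that arithmetic for a small matrix-vector product. It is a generic illustration, not AlphaICs' design: the function names are invented for this example, and the crossbar interconnect is not modeled.

```python
from typing import List


def mac(acc: float, a: float, b: float) -> float:
    """One multiply-accumulate step: add the product a * b to the accumulator."""
    return acc + a * b


def matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    """Matrix-vector product built from MAC steps, one accumulator per output row."""
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input vector length")
        acc = 0.0
        for w, xi in zip(row, x):
            acc = mac(acc, w, xi)  # hardware arrays run many of these in parallel
        out.append(acc)
    return out


y = matvec([[1.0, 2.0], [3.0, 4.0]], [10.0, 1.0])
print(y)  # [12.0, 34.0]
```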

He claims the use of AI agents as an abstraction makes programming the chip easier than devices that are accessed through an operating system’s kernel. Users define the agents in a one-time process on a separate host CPU, and the RAP chip then runs autonomously.

By contrast, “GPUs need a host to do the heavy lifting. Host/GPU interaction is a bottleneck now — it limits going beyond deep learning or having models of larger sizes,” said Dham.

The RAP chips have their own runtime environment, libraries and APIs. They can be programmed in C or through the TensorFlow framework. Eventually, AlphaICs aims to support other popular deep learning frameworks.

— Rick Merritt, Silicon Valley Bureau Chief, EE Times