SANTA CLARA, Calif. – Startup Wave Computing added IP for deep learning to its expanding business model of chips, systems and services. Its TritonAI 64 packages existing MIPS and dataflow blocks with a new tensor core unit, initially targeting inference jobs at the edge.

Observers expressed surprise that one of the first startups to design deep-learning accelerators would enter a market already well served by established IP players. Wave has yet to reveal specs, performance, and availability of its new products, leaving analysts unable to make meaningful comparisons with existing blocks from Cadence, Ceva, Nvidia, Synopsys, and others.

“It’s a busy sector, but they have MIPS now so they have expertise in licensing,” said Linley Gwennap of the Linley Group, referring to Wave’s acquisition in June of the processor IP vendor.

The WaveTensor block, the new element in Triton, is a matrix multiply unit, a standard feature of most deep learning accelerators since Google revealed its TPU in 2016. It is made up of multiple 4x4 and 8x8 kernels gathered into an array capable of up to 8 TOPS/watt and more than 10 TOPS/mm² in a 7nm process.
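Wave has not described how the kernel array operates internally, but the general idea of decomposing a large matrix multiply into small fixed-size kernels can be sketched in Python. The tiling scheme below is purely illustrative and is not Wave's implementation; the 4x4 tile size is borrowed from the kernel dimensions mentioned above.

```python
import numpy as np

def tiled_matmul(a, b, tile=4):
    """Multiply a (M x K) by b (K x N) by accumulating tile x tile
    sub-blocks. Each inner block product stands in for the unit of
    work a small hardware matmul kernel would perform."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % tile == 0 and n % tile == 0 and k % tile == 0
    out = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # One tile x tile partial product, accumulated in place.
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return out
```

In a hardware array, many such tile products would execute in parallel rather than in nested loops; the decomposition itself is what the kernel dimensions determine.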

One rival called the new unit a step backward into a more conventional AI architecture focused on convolutional neural networks (CNNs). In its first disclosures, Wave described a dataflow processor flexible across a wide range of neural net jobs.

The dataflow unit still exists in TritonAI next to the tensor core and up to six 64-bit MIPS cores to run Google’s TensorFlow framework. In practical applications, the tensor units will handle “80-90% of the computations” needed for CNNs, said Chris Nicol, chief technologist of Wave, in a talk presenting TritonAI at the Linley Spring Processor Conference here.

The MIPS, tensor, and dataflow blocks are synthesizable cores configurable for different array sizes and caches. Users will need performance data to make informed configuration choices, something Nicol said the startup’s benchmark team is developing.

Wave sketched out its software stack for TritonAI. (Source: Wave Computing)

A still-unbounded problem space for deep learning

Wave expects to detail later this year how it links the tensor and dataflow blocks using a novel approach for address generation and data movement without CPU intervention. The net result is a highly flexible architecture programmed in what Nicol called a “C-like language.”

Among its other goodies, Wave will offer an SDK as well as access to its internal quantization tool. It also expects to release SIMD extensions it made for MIPS blocks in TritonAI.
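Wave has not said how its quantization tool works. As a generic illustration of what such tools do, the sketch below shows symmetric per-tensor int8 quantization, one of the simplest post-training schemes; the function names and the scheme itself are assumptions for illustration, not Wave's method.

```python
import numpy as np

def quantize_int8(w):
    """Map float weights onto [-127, 127] with a single scale factor
    (symmetric per-tensor quantization)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and scale."""
    return q.astype(np.float32) * scale
```

Production tools typically go further, with per-channel scales, calibration data, and accuracy-aware rounding, but the core idea of trading mantissa precision for smaller, faster integer math is the same.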

Long term, Wave is exploring enhancements to MIPS for the bfloat16 format and for training at the edge. “It’s still very early days” for a project on edge training “spearheaded at Google…but it looks like exciting research, and we’re funding a project on it,” Nicol said.
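The appeal of bfloat16 is that it keeps float32's 8-bit exponent (and thus its dynamic range) while cutting the mantissa from 23 bits to 7. The sketch below shows the relationship by truncating float32 values to bfloat16 precision, the simplest rounding mode; it is a numeric illustration of the format, not anything Wave has disclosed.

```python
import numpy as np

def to_bfloat16(x):
    """Truncate float32 values to bfloat16 precision by zeroing the
    low 16 bits: the 8-bit exponent is kept, the mantissa is cut
    from 23 bits to 7."""
    a = np.asarray(x, dtype=np.float32).reshape(-1)
    bits = a.view(np.uint32) & np.uint32(0xFFFF0000)
    return bits.view(np.float32).reshape(np.shape(x))
```

Because only mantissa bits are dropped, values like 1.0 survive exactly, while 3.14159 rounds down to 3.140625, an error neural-net training tolerates well.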

Wave has yet to name any customers for its products, which include its initial dataflow chip and servers built around it. To date, the company has raised more than $200 million in venture capital. It has also set up a 40-person data science team to help users develop deep-learning software.

The good news is that the problem space for deep learning still seems unbounded. One data center operator that is a Wave stakeholder wants to be able to train neural networks with as many as a trillion parameters by 2021, said Nicol.

“Neither HMC nor HBM memory has the capacity for that challenge,” he said, noting that Wave’s initial chips aimed at data centers use HMC memory stacks.