Andes Technology has released the AndesCore 27-series CPU cores. The company claims they are the first licensable RISC-V cores with the RISC-V vector instruction extension (RVV) to be delivered to a production licensee. Andes has also re-architected the memory subsystem to sustain higher memory bandwidth and efficiency.

Andes has delivered the core to its first licensee, with production release slated for Q1 2020. Charlie Su, CTO and EVP of Andes Technology, told EE Times the first customer will use the new core in a datacenter AI engine. The customer plans to reveal itself, along with details of how it's using the Andes core, at the RISC-V Summit in San Jose, California this week.

The president of Andes Technology, Frankwell Lin, said, “The RVV extension boldly takes RISC-V beyond any licensable processor core technology into the hottest markets today, and our licensee’s confidence in the R&D team enables Andes to be the first to deliver on this ambitious vision. The team has worked together from specification to delivery in less than nine months.”

AI, AR/VR, computer vision, cryptography, and multimedia processing all require complex computation on large volumes of matrix data. Other vendors' advanced SIMD extensions have a performance range narrowly fixed by their architecture. The RVV specification, by contrast, envisions a powerful instruction set with scalable data sizes and flexible microarchitecture implementations, and it leaves memory subsystem decisions open for system-level optimization. Andes said that with the 27-series CPU cores it delivers this unprecedented performance and flexibility to the RISC-V community and, for the first time, enables RISC-V cores to fill a void in applications that other vendors have not been able to reach.

The NX27V contains a vector processing unit (VPU) which supports the RVV scalable vector instruction set, designed from the ground up to be a Cray-like full vectorization computation unit (Image: Andes Technology)

The first cores available in the 27-series are the 32-bit A27 and the 64-bit AX27 and NX27V. They build on the Andes 25-series cores, supporting the latest RISC-V specifications and subsystem-level components, along with ecosystem enablement drawn from Andes' 14 years of R&D. The A27 and AX27, tailored for applications running Linux, offer 50% higher memory bandwidth than their 25-series predecessors. The NX27V contains a vector processing unit (VPU) that supports the RVV scalable vector instruction set. It was designed from the ground up as a Cray-like, fully vectorized computation unit, in contrast to the incremental growth of SIMD instructions from which some advanced SIMD architectures evolved.

The VPU has a full vector register file (VRF) with a user-configurable number of elements per register. The vector register length (VLEN) ranges from as small as 64 bits to as large as 512 bits, and combining up to eight vector registers (LMUL) extends a single vector operand to as much as 4,096 bits. Integer, fixed-point, floating-point, and other AI-optimized representations can use element widths (SEW) from 4 to 32 bits. When the data length is not a multiple of the vector length, the leftover last elements are handled within the same loop. The 27-series VPU implements all of these capabilities and has multiple chainable functional units, each operating in its own independent pipeline to sustain the computational throughput needed in critical kernel functions.
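To make the VLEN, LMUL, and SEW terms concrete, here is a minimal Python sketch of the "strip-mining" pattern that RVV code uses to process an array of any length. It assumes a simplified `vsetvli` that grants vl = min(remaining elements, VLMAX), with VLMAX = VLEN × LMUL / SEW. That is one behavior the RVV specification permits, not necessarily what the NX27V does. The function names are hypothetical, and real hardware would execute each strip as vector instructions rather than a Python loop.

```python
def vlmax(vlen_bits: int, lmul: int, sew_bits: int) -> int:
    """Max elements a (grouped) vector register holds: VLEN * LMUL / SEW."""
    return (vlen_bits * lmul) // sew_bits


def setvl(avl: int, vlen_bits: int, lmul: int, sew_bits: int) -> int:
    """Simplified vsetvli (assumption): grant min(remaining elements, VLMAX)."""
    return min(avl, vlmax(vlen_bits, lmul, sew_bits))


def strip_mined_add(a, b, vlen_bits=512, lmul=8, sew_bits=32):
    """Element-wise a + b processed in vector-length strips.

    Returns the sums and the vl used for each strip, so the final
    (non-divisible) strip is visible.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out, strips = [], []
    i, n = 0, len(a)
    while i < n:
        # Ask for all remaining elements; the "hardware" grants at most VLMAX.
        vl = setvl(n - i, vlen_bits, lmul, sew_bits)
        # One vector instruction would cover these vl elements at once.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        strips.append(vl)
        i += vl
    return out, strips
```

With VLEN = 512 bits, LMUL = 8, and 32-bit elements, VLMAX is 128, so a 300-element add runs as strips of 128, 128, and 44. The last, non-divisible strip goes through the same loop with a smaller vl, with no separate scalar cleanup code.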

Fully configured, the VPU can achieve a speedup of more than 30x, as measured on the key functions of MobileNets, a popular convolutional neural network (CNN). Compared with the popular 128-bit SIMD solution, the NX27V VPU offers four times more raw processing power per cycle, with an additional advantage from the higher efficiency of vector instruction issue.
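The article does not say how the 4x figure is derived. One plausible reading, offered here as an assumption, is a plain width ratio: a 512-bit vector register against a 128-bit SIMD register at the same element width, with each processing one full register per cycle. The short Python sketch below checks that this ratio holds for every element width. Real throughput also depends on the number of datapath lanes and on issue rates, which the article does not give.

```python
SIMD_WIDTH_BITS = 128  # the 128-bit SIMD baseline named in the comparison
VLEN_BITS = 512        # the largest VLEN the article gives for the NX27V


def elements_per_register(width_bits: int, sew_bits: int) -> int:
    """How many SEW-bit elements fit in one register of the given width."""
    return width_bits // sew_bits


def width_ratio(sew_bits: int) -> int:
    """Elements per 512-bit vector register over elements per 128-bit SIMD register."""
    return (elements_per_register(VLEN_BITS, sew_bits)
            // elements_per_register(SIMD_WIDTH_BITS, sew_bits))


for sew in (4, 8, 16, 32):
    print(f"SEW={sew:2d}: SIMD {elements_per_register(SIMD_WIDTH_BITS, sew):3d} elements, "
          f"RVV {elements_per_register(VLEN_BITS, sew):3d} elements, ratio {width_ratio(sew)}x")
```

Because both widths scale identically with SEW, the ratio stays at 4x whether the elements are 4, 8, 16, or 32 bits wide.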

Charlie Su commented, “From the vector microarchitecture to the memory subsystem, and all the ecosystems required to enable our licensees, at whatever scale and scope the licensee deems appropriate, Andes has taken RISC-V users to the frontiers of these embedded applications.”

Andes expanded the 27-series memory subsystem to deliver the bandwidth required to sustain the VPU's computational rate, a change that will benefit all customers whether they use the VPU or not. The 27-series now supports multiple outstanding memory accesses in flight, so neither the scalar nor the vector units have to stall waiting for data during cache misses. In addition, cache prefetching brings data in ahead of the processor's needs, hiding potential cache misses. Finally, the Andes Custom Extension (ACE) interface has been expanded both to allow custom instructions that speed up the control path and to widen the data path into the core.
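Why multiple outstanding accesses matter can be seen with a deliberately crude toy model, sketched below in Python. It assumes independent misses that all have the same latency and ignores issue overhead and bandwidth limits. The miss count, the 100-cycle latency, and the limit of four in-flight misses are illustrative numbers only; the article does not state how many outstanding accesses the 27-series supports.

```python
import math


def miss_stall_cycles(num_misses: int, latency_cycles: int, max_outstanding: int) -> int:
    """Toy model: cycles spent waiting on independent cache misses.

    Assumes every miss has the same latency and up to max_outstanding
    misses can be in flight at once, so misses are serviced in batches.
    Issue overhead and bandwidth limits are ignored.
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be >= 1")
    batches = math.ceil(num_misses / max_outstanding)
    return batches * latency_cycles


# Hypothetical numbers: 16 independent misses, 100 cycles each.
blocking = miss_stall_cycles(16, 100, max_outstanding=1)    # one miss at a time
overlapped = miss_stall_cycles(16, 100, max_outstanding=4)  # four misses in flight
print(blocking, overlapped)
```

In this model a blocking cache pays for every miss in sequence, while allowing four misses in flight cuts the waiting time by a factor of four. Prefetching attacks the same latency from the other side, by starting the fetch before the load that needs it is issued.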