A hardware neural network accelerator that can be deployed in an SoC or standalone for automotive vision AI engines.
Hungary-based AImotive, a developer of software- and hardware-based automated driving technologies, has started shipping its aiWare3P neural network (NN) hardware inference engine intellectual property (IP) to its lead customers.
The aiWare3P IP core, announced last year, provides a hardware NN accelerator for high-resolution automotive vision applications and can serve as a component within subsystems certified to ISO 26262 ASIL A, B and above. The core, deployable within a system on chip (SoC) or as a standalone NN accelerator, is delivered as fully synthesizable RTL; its low-level microarchitecture is designed to consume far fewer host CPU and shared-memory resources than other hardware NN accelerators.
Speaking to EE Times Europe about how the AImotive offering differs from other solutions, Tony King-Smith, the company’s executive advisor, said most chip players talk in academic terms about accelerators based on GPUs and SoCs tested in a lab environment, which doesn’t translate well to the real world. “The crucial difference is that it’s necessary to understand the principles of neural networks rather than the accelerator. In our solution there are no DSPs, no NoCs (networks on chip). aiWare is only designed for automotive inference, hence we are able to provide low latency from input to output.” He added that improvements in the RTL output of the new core mean it frees up the main CPU subsystem, and the core can then be attached to any accelerator SoC.
The aiWare3P IP core incorporates features that deliver improved performance, lower power consumption, greater host CPU offload and simpler layout for larger chip designs. Each core offers up to 16 TMAC/s (>32 TOPS) at 2 GHz, with multi-core and multi-chip implementations capable of delivering more than 50 TMAC/s (>100 INT8 TOPS) – useful for multi-camera or heterogeneous sensor-rich applications. The core is designed for AEC-Q100 extended-temperature operation and includes features to enable users to achieve ASIL-B and above certification.
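The quoted figures follow from the usual convention that one multiply-accumulate (MAC) counts as two operations. A back-of-envelope sketch (the per-cycle MAC count below is an assumption implied by the quoted numbers, not a figure AImotive has published):

```python
# Back-of-envelope check of the quoted throughput figures.
# Assumption: 1 MAC = 2 ops (one multiply + one add), the usual TOPS convention.

MACS_PER_CYCLE = 8_000  # hypothetical MAC-array width implied by the figures
CLOCK_HZ = 2e9          # 2 GHz clock, per the article

tmacs = MACS_PER_CYCLE * CLOCK_HZ / 1e12  # tera-MACs per second
tops = tmacs * 2                          # INT8 TOPS at 2 ops per MAC

print(f"{tmacs:.0f} TMAC/s = {tops:.0f} TOPS")  # 16 TMAC/s = 32 TOPS
```

The same arithmetic scales linearly: a multi-core configuration sustaining just over 50 TMAC/s corresponds to the quoted >100 INT8 TOPS.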
The IP core’s performance scalability to more than 50 TMAC/s (>100 TOPS) per chip and its low-latency sustained inference result from its low-level microarchitecture. It uses a patented ground-up design for highly deterministic dataflow management, with a highly parallel memory-centric architecture featuring up to 100x more on-chip memory bandwidth than other hardware NN accelerators, ensuring up to 95% sustained efficiency for complex DNNs used with large inputs such as multiple HD cameras.
Supporting Khronos’ NNEF as well as the open-standard ONNX input format, the aiWare SDK compiles binaries directly, with no need for low-level programming of DSPs or MCUs. It includes automated tools for FP32-to-INT8 quantization with little or no loss of accuracy, alongside a growing portfolio of sophisticated DNN performance-analysis tools. The latter are designed to help software and AI engineers migrate and transform NNs trained in a lab into efficient real-time solutions executing on aiWare-powered production automotive hardware platforms.
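The article does not detail how the SDK performs its FP32-to-INT8 conversion, but the general technique is well established. The following is an illustrative sketch of symmetric per-tensor post-training quantization, not the aiWare SDK’s actual API:

```python
import numpy as np

# Illustrative post-training quantization (not the aiWare SDK's API):
# symmetric per-tensor INT8 quantization of a trained FP32 weight tensor.

def quantize_int8(w_fp32: np.ndarray):
    """Map FP32 weights to INT8 using a single scale factor."""
    scale = float(np.max(np.abs(w_fp32))) / 127.0  # largest magnitude -> +/-127
    w_int8 = np.clip(np.round(w_fp32 / scale), -128, 127).astype(np.int8)
    return w_int8, scale

def dequantize(w_int8: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return w_int8.astype(np.float32) * scale

# Worst-case reconstruction error is half a quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.max(np.abs(dequantize(q, s) - w)))
print(f"max abs error: {err:.6f} (one step = {s:.6f})")
```

Production toolchains typically refine this with per-channel scales and calibration data, which is what keeps the accuracy loss small for real networks.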
Marton Feher, senior vice president of hardware engineering for AImotive, said, “Our production-ready aiWare3P release brings together everything we know about accelerating neural networks for vision-based automotive AI inference applications. We now have one of the automotive industry’s most efficient and compelling NN acceleration solutions for volume production L2/L2+/L3 AI.”
The aiWare3P hardware IP is being deployed in a range of L2/L2+ production solutions and is also being adopted for studies of more advanced heterogeneous sensor applications. Customers include Nextchip, for its forthcoming Apache5 Imaging Edge Processor, and ON Semiconductor, for its collaborative project with AImotive to demonstrate advanced heterogeneous sensor fusion capabilities.
AImotive said it will release a full update to its public benchmark results in Q1 2020 based on the aiWare3P IP core. This is part of its commitment to open benchmarking using well-controlled benchmarks that reflect real applications, such as high-resolution camera inputs, rather than unrealistic public benchmarks using 224×224 inputs.
No host CPU intervention needed
New features of the aiWare3P hardware IP include support for a much larger portfolio of pre-optimized embedded activation and pooling functions, ensuring that most NNs execute entirely within the aiWare3P core without any host CPU intervention; real-time data compression, reducing external memory bandwidth requirements, especially for larger input sizes and deeper networks; and advanced cross-coupling between C-LAM convolution engines and F-LAM function engines, to increase overlapped and interleaved execution efficiency.
The physical tile-based microarchitecture enables easier physical implementation of large aiWare cores by minimizing difficult timing constraints on any process node, while logical tile-based data management enables efficient workload scalability up to the maximum 16 TMAC/s per core, without the caches, NoCs or other complex multi-core processor-based approaches that create bottlenecks, reduce determinism and consume more power and silicon area. The aiWare3P RTL will ship to all customers from January 2020, and an upgraded SDK includes an improved compiler and new performance-analysis tools for both offline estimation and real-time, fine-grained analysis on target hardware.