EEMBC's ADASMark simulates and evaluates performance in several real-life scenarios
Automotive chip companies talk about system-on-chips designed for Advanced Driving Assistance Systems all the time.
But how can the rest of us — reporters, analysts and most important, carmakers — tell one ADAS SoC from another?
Truth is, we can’t. The absence of scientific tools and benchmarks leaves little choice but to take the vendor’s word for it. Or we rely on such imperfect measures as trillions of operations per second (TOPS) to compare Intel/Mobileye’s EyeQ5 with Nvidia’s Xavier, which is probably a bum steer.
About a month ago, EEMBC, an industry consortium that develops benchmarks for embedded hardware, rolled out “ADASMark,” an autonomous driving benchmark suite, which is now available for licensing.
The new tool suite, according to EEMBC, is designed to help tier ones and carmakers to optimize their use of compute resources ranging from CPU to GPU and hardware accelerators when they design their own ADAS systems.
Mike Demler, a senior analyst at The Linley Group, welcomed ADASMark, noting, “It’s good to see that this is not just an abstract performance metric, but they used real workloads.” Demler said that participation from Au-Zone Technologies (a Calgary-based engineering design services company) and chip vendors such as NXP Semiconductors and Texas Instruments made EEMBC's test more meaningful than, for example, Baidu’s generic DeepBench.
It’s all about frameworks
EE Times caught up with Peter Torelli, EEMBC president and CTO, to ask about challenges automakers face as they set out to design highly automated vehicles.
There’s no question that more and more automotive embedded systems deploy multiple cores. However, as Torelli pointed out, “there are still very few frameworks that can utilize their asymmetric compute resources.” He added, “Without a framework, every instance of the compiled benchmark would vary dramatically depending on the hardware, and make comparisons across platforms extremely difficult. Frameworks facilitate portability with very little modification.”
Consider the ADASMark Pipeline below, he said.
Torelli said: “The baseline performance of this system might be using the same CPU for all stages in the pipeline. But what if a developer wanted to swap in a custom neural-net chip for the last stage? Or perhaps use a dedicated DSP for the color space conversion?”
This is where a framework comes in.
“Without a framework the developer would need to insert code to interface between the benchmark and the compute device (NN, DSP or GPU). This is time consuming, complicated, and error prone, and can easily disrupt the intent of the benchmark (or corrupt the results).”
A framework makes this retargeting of compute devices much easier, Torelli explained.
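To make the retargeting idea concrete, here is a minimal sketch in Python. It is not ADASMark code or any EEMBC API. The stage names, device labels, and helper functions are all hypothetical stand-ins. It shows only the principle Torelli describes: the pipeline definition stays fixed while a placement table decides which device runs each stage.

```python
from typing import Callable, Dict, List

# Hypothetical stage names, loosely following the pipeline described in the article.
STAGES = ["debayer", "dewarp", "stitch", "blur", "threshold", "classify"]

Kernel = Callable[[str], str]


def make_kernel(stage: str, device: str) -> Kernel:
    """Return a stand-in kernel that only records where the stage ran."""
    def kernel(frame: str) -> str:
        return f"{frame}>{stage}@{device}"
    return kernel


def build_pipeline(placement: Dict[str, str], default: str = "cpu") -> List[Kernel]:
    """Bind every stage to a device; stages not listed fall back to the default."""
    unknown = set(placement) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return [make_kernel(stage, placement.get(stage, default)) for stage in STAGES]


def run(pipeline: List[Kernel], frame: str) -> str:
    """Pass one frame through the stages in order."""
    for kernel in pipeline:
        frame = kernel(frame)
    return frame


# Baseline: every stage on the same CPU.
baseline = run(build_pipeline({}), "f0")

# Retargeted: a DSP for color space conversion, a neural-net chip for the last stage.
retargeted = run(build_pipeline({"debayer": "dsp", "classify": "nn"}), "f0")
```

In this sketch, moving a stage to another device changes one entry in the placement table and nothing in the pipeline itself, which is the portability argument in miniature.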
EEMBC initially examined options available on the market today. “AMP and OpenAMP attempt to address this, but they are specifications for symmetric multicore, and they don't really help us here,” said Torelli. “We also looked at OpenCV and OpenVX, but support was spotty among the landscape of manufacturers.”
That’s how EEMBC came to develop ADASMark based on a new framework with a more relevant workload.
Focus on imaging pipeline
Key features of the ADASMark Benchmark Suite, according to EEMBC, “include an OpenCL 1.2 Embedded Profile API to ensure consistency between compute implementations; application flows created by a series of micro-benchmarks that measure and report performance for SoCs handling computer vision, autonomous driving, and mobile imaging tasks; and a traffic sign recognition CNN inference engine created by Au-Zone Technologies.”
Because ADAS requires compute-intensive object-detection and visual classification capabilities, ADASMark’s focus is on the imaging pipeline. It looks to use “real-world workloads that represent highly parallel applications, such as surround view stitching, contour detection, and convolutional neural-net (CNN) traffic sign classification,” EEMBC explained.
How ADASMark works
So, how does ADASMark work?
With its focus on object recognition, ADASMark uses “a collection of visible-spectrum, wide-angle cameras placed around the vehicle, and an image processing system which prepares these images for classification by a trained CNN,” EEMBC explained. “The output of the classifier feeds additional decision-making logic such as the steering and braking systems. This arrangement requires a significant amount of compute power.”
Identifying the limits of the available resources and figuring out how efficiently they are utilized is no cakewalk.
To address this challenge, the ADASMark benchmark combines “application use cases with synthetic test collections into a series of microbenchmarks that measure and report performance and latency for SoCs handling computer vision, autonomous driving, and mobile imaging tasks,” the group explained.
More specifically, this is how EEMBC described how its ADASMark works:
…The front-end of the benchmark contains the image-processing functions for de-warping, colorspace conversion (Bayer), stitching, Gaussian blur, and Sobel threshold filtering—which identifies regions of interest (ROI) for the classifier.
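As a rough illustration of the Sobel threshold step, the sketch below computes a gradient magnitude for each interior pixel, thresholds it into a binary mask, and reports the bounding box of the flagged pixels as a region of interest. It is a plain-Python approximation written for this article, not the benchmark's OpenCL kernel. The function names, the |gx| + |gy| magnitude, and the threshold value are assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def sobel_threshold(img: Image, thresh: int) -> Image:
    """Return a binary mask marking pixels whose Sobel gradient exceeds thresh."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left at 0
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            # |gx| + |gy| is a common cheap stand-in for the Euclidean magnitude.
            if abs(gx) + abs(gy) > thresh:
                mask[y][x] = 1
    return mask


def bounding_box(mask: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of flagged pixels, or None if empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Synthetic 8x8 frame: a bright 2x2 patch on a dark background.
frame = [[0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (3, 4):
        frame[y][x] = 255

roi = bounding_box(sobel_threshold(frame, thresh=100))
```

The region comes out one pixel wider than the bright patch on every side, because the 3x3 Sobel window responds to edges from the neighboring pixels.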
The back-end image-classification portion of the benchmark executes a CNN trained to recognize traffic signs. An input video stream comprised of four HD surround-cameras is provided as part of the benchmark. Each frame of the video (one for each camera) passes through the directed acyclic graph (DAG).
At four nodes (blur, threshold, ROI, and classification) the framework validates the work of the pipeline for accuracy. If the accuracy is within the permitted threshold, the test passes.
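EEMBC does not spell out here how accuracy is measured at each checkpoint, so the following is only a sketch of what such a validation step could look like. The error metric (mean absolute error against a reference output), the tolerance values, and the sample data are assumptions, not the benchmark's actual rules.

```python
from typing import Dict, List, Sequence


def within_tolerance(output: Sequence[float], reference: Sequence[float],
                     max_mean_abs_err: float) -> bool:
    """Compare one node's output with its reference using mean absolute error."""
    if not reference or len(output) != len(reference):
        raise ValueError("output and reference must be non-empty and equal in length")
    err = sum(abs(o - r) for o, r in zip(output, reference)) / len(reference)
    return err <= max_mean_abs_err


def validate(outputs: Dict[str, List[float]], references: Dict[str, List[float]],
             tolerances: Dict[str, float]) -> Dict[str, bool]:
    """Check every validated node and report pass/fail per node."""
    return {node: within_tolerance(outputs[node], references[node], tolerances[node])
            for node in references}


# Hypothetical data for the four validated nodes named in the article.
references = {"blur": [10, 11, 14], "threshold": [0, 1, 1],
              "roi": [2, 2, 5, 5], "classification": [0.1, 0.9]}
outputs = {"blur": [10, 12, 14], "threshold": [0, 1, 1],
           "roi": [2, 2, 5, 6], "classification": [0.1, 0.9]}
tolerances = {"blur": 0.5, "threshold": 0.0, "roi": 0.0, "classification": 0.05}

results = validate(outputs, references, tolerances)
passed = all(results.values())  # the run passes only if every node passes
```

In this made-up run, the blur node passes because its small deviation is inside its tolerance, while the ROI node fails because its box is off by one pixel and its tolerance is zero.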
The performance of the platform is recorded as the amount of execution time and overhead for only the portions of the DAG associated with vision, meaning the benchmark time does not include the main-thread processing of the video file, or the overhead associated with splitting the data streams to different edges of the DAG. The overall performance is the inverse of the longest path in the DAG, which represents frames-per-second.
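The scoring rule can be sketched as a critical-path calculation. The example below is an illustration only: the DAG shape, the node names, and the per-node timings in milliseconds are invented, and the helper function is hypothetical. It finds the longest path through a small DAG and inverts it to get frames per second.

```python
from typing import Dict, Iterable


def longest_path_ms(preds: Dict[str, Iterable[str]], cost_ms: Dict[str, int]) -> int:
    """Return the longest cumulative cost along any path; assumes no cycles."""
    memo: Dict[str, int] = {}

    def finish(node: str) -> int:
        # A node finishes after its slowest predecessor plus its own cost.
        if node not in memo:
            slowest = max((finish(p) for p in preds.get(node, ())), default=0)
            memo[node] = slowest + cost_ms[node]
        return memo[node]

    return max(finish(node) for node in cost_ms)


# Invented timings for the vision portion only; video decoding and stream
# splitting are left out, as in the scoring rule described above.
cost_ms = {"cam0_dewarp": 4, "cam1_dewarp": 6, "stitch": 5,
           "blur": 3, "threshold": 2, "classify": 5}
preds = {"stitch": ["cam0_dewarp", "cam1_dewarp"], "blur": ["stitch"],
         "threshold": ["blur"], "classify": ["threshold"]}

critical_ms = longest_path_ms(preds, cost_ms)   # 6 + 5 + 3 + 2 + 5
fps = 1000 / critical_ms
```

Here the slower of the two camera branches sets the pace, so speeding up the faster branch would not change the score at all.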
Who came out on top?
So, in the table above, Device C shows the best performance. Torelli told us that Device C was his own cloud-based AWS Nvidia system used during development of the benchmark. Asked then whose chips Device A and Device B were, he told us, “Unfortunately, I cannot name the first two, since they have decided not to publish scores at this time.”
It’s important to note that ADASMark is designed to deal with just one part of Level 2 ADAS. As Demler pointed out, “This is not a criticism, but the ADASMark doesn’t address the higher-level functions required for self-driving. Recognizing traffic signs is important at all levels, so it’s useful and necessary, but it’s not sufficient for L3 to L5 cars.”
Further, as EEMBC stated, “the benchmark time does not include the main-thread processing of the video file, or the overhead associated with splitting the data streams to different edges of the DAG.”
At a time when a new generation of AI accelerators aimed at automotive applications is looking at graph computing (of some sort), EE Times asked if ADASMark’s scores in the future could differ once the benchmark starts considering the “main-thread processing of the video file,” or “the overhead associated with splitting the data streams.”
Torelli said, “Perhaps, but that is not what the benchmark intends to measure.”
He explained, “There is a tradeoff here because, in a deployed system, the kernels would be more tightly integrated. To mitigate this difference, certain stages of the pipeline are not part of the scoring. Although the time it takes for subsequent OpenCL devices on the same platform to swap kernels is included in the scoring, it is typically orders of magnitude smaller than the workload execution for that stage. We also count the copy-in/copy-out of memory buffers, as this is an important part of heterogeneous computation.”
‘It's a Catch-22’
Asked what surprised him most in developing ADASMark, Torelli told us, “Working with beta-testers, porting our benchmarks between SoC developers exposes the complexity in evaluating heterogeneous compute systems, and each new strategy requires a large step-function of effort from the development team.”
He added, “Most developers are entrenched in the environment that takes the least effort, which in turn walls them off from evaluating other designs. It's a Catch-22: clearly each vendor's customers demand they optimize their own stacks, but those silos leave little room for the developer to leave their local minima and find a potentially better solution.”
More significantly, Torelli observed, “From a hardware perspective, it's hard to judge good/bad at this point: compute needs are churning monthly with each new academic paper on machine learning.” The processor industry typically lags academic work, he said, “But that is not the case today with machine learning/deep learning. So, engineering can't respond quickly enough.”
Impact of ADAS SoCs on ADAS vehicle safety
EE Times recently reported on the initial test results by the Insurance Institute for Highway Safety (IIHS) on the safety performance of ADAS-equipped vehicles, in a story entitled “Not All ADAS Are Created Equal.”
Is there any correlation between the performance of ADAS SoCs and the safety performance of ADAS vehicles? We asked Torelli how much the capability of individual ADAS chips contributes to the variability in ADAS vehicle performance today.
From our story, Torelli singled out a quote by VSI Labs’ Phil Magney, who said, “A lot of performance variance is found on these systems because there are so many elements of the HW/SW configurations.” Torelli said, “That’s an understatement!”
He acknowledged, “In our case, the benchmark is not capturing the response time of the system above the vision pipeline, such as the decision-making logic.”
“With the exception of contour detection, every stage of the ADASMark pipeline has deterministic runtime in proportion to the input,” he noted. “The response time of both the decision-making logic and the physical systems, in my opinion, is the issue.”
Referring to the "mixed bag" EE Times noted in describing IIHS’s test results, Torelli said, “It appears to be related to decision-making systems themselves being at the mercy of the algorithms and not the hardware; the physical response systems (braking, steering, etc) have their own environmental variables that vary widely.”
He concluded that ADASMark is “a useful analysis tool for comparing (largely deterministic) compute behavior of asymmetric hardware, which itself is a complicated task that we sought to address.”
Meanwhile, when asked about the impact of ADAS SoC performance on the safety of ADAS vehicles, Linley Group’s Demler acknowledged that the IIHS report was also about a set of L2 tests.
But he added that it’s hard to point to just one hardware component that could cause that variability. “To start, I’d look at the sensors and sensor-processing software, before I looked at something like an EyeQ3 or DrivePX processor used in some of those cars.”
Benchmark sensor fusion?
EEMBC’s current ADASMark is focused on vision. How does EEMBC plan to benchmark the performance of SoCs for sensor fusion — integrating sensory data from radars, lidars, and others?
Torelli said, “Radar and Lidar are still vision (or vision-like) pipelines. I don't expect to explore those areas as I don't see much variation as a performance metric at this point.” However, he added, “sensor fusion and decision-making logic, that is definitely an area of interest, but I believe it crosses over into a different domain of machine learning. Whether or not that is covered by our ADAS group or our machine learning group remains to be seen.”
— Junko Yoshida, Global Co-Editor-In-Chief, AspenCore Media, Chief International Correspondent, EE Times