Embedded compilers originally designed for traditional applications must be modified to meet the high-level design requirements of ADAS.
Major automotive OEMs and software suppliers are committed to developing advanced driver assistance systems (ADAS). The demands that ADAS applications place on compilers and toolsets deserve equal attention: ADAS differs from traditional automotive applications, and current compilers need to be adapted to address its requirements.
To support the task of driving autonomously, vehicles need to be much more aware of their surroundings. Several new sensors (radar, lidar, cameras, etc.) can be used to detect road markings, other vehicles, obstacles, and other relevant environmental data with high resolution, as shown in Figure 1. In the past, automotive systems commonly processed only individual measurements from specific sensors (steering angle, pedal positions, various engine sensors, etc.) in real time.
Figure 1: Areas wherein different ADAS applications use sensor-based monitoring.
As is common with physical measurements, however, the environmental data acquired for ADAS applications are subject to noise (as shown in Figure 2) and measurement errors. The data therefore require electronic post-processing by hardware and software before they can be used for their ultimate purpose, i.e., to automatically take decisions off the driver's hands. This post-processing no longer deals only with individual measurements, as it did before. Quite frequently, data from different sources are consolidated (sensor fusion) to reduce susceptibility to errors.
For ADAS to make decisions automatically on the driver's behalf, it must process a tremendous amount of data in real time, and those data are complex. Traditionally, only isolated sensor data, consisting of a few integer or fixed-point numbers at data rates of 1 to 5 kbps, needed to be processed. Today, data are often provided as floating-point numbers (floats/doubles) at high rates. Camera images, for instance, arrive at approximately 340 kbps and radar data at around 1.5 Mbps.
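To make the idea of sensor fusion concrete, the following minimal sketch combines two noisy measurements of the same quantity by inverse-variance weighting, a textbook technique. The article does not say which fusion method ADAS systems actually use, so the method as well as the names `Measurement` and `fuse` are assumptions made purely for illustration.

```cpp
#include <cassert>

// Hypothetical measurement type: a value plus the variance of its noise.
struct Measurement {
    double value;
    double variance;  // must be greater than zero
};

// Fuse two independent measurements of the same quantity by
// inverse-variance weighting: the less noisy sensor gets more weight.
Measurement fuse(const Measurement& a, const Measurement& b) {
    assert(a.variance > 0.0 && b.variance > 0.0);
    const double wa = 1.0 / a.variance;
    const double wb = 1.0 / b.variance;
    Measurement fused;
    fused.value = (wa * a.value + wb * b.value) / (wa + wb);
    fused.variance = 1.0 / (wa + wb);  // never larger than either input variance
    return fused;
}
```

If both sensors are equally noisy, the fused value is simply their average and the fused variance is half the individual variance, which is the reduced error susceptibility the text refers to.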
Figure 2: An environmental sensor's noisy signal and its filtered result.
ADAS applications require far more processing power than traditional automotive applications, but it is currently very hard to predict which high-performance hardware architectures will prevail for these kinds of applications. Because ADAS applications must be produced in a reusable and cost-efficient manner, compiler support will be required for all architectures. This requirement mandates the use of abstract, portable design methodologies (e.g., C++11/14), model-based design, and additional technologies including parallel programming (e.g., OpenCL, Pthreads). Furthermore, highly optimised, certified libraries will be required to implement standard operations efficiently, safely, and with maximum hardware independence. Because ADAS applications intervene in the driving process, these applications and the hardware used to execute them must also adhere to the relevant safety standards (ISO 26262, ASIL-B or higher).
Finding the perfect match
For companies developing ADAS applications, the fact that no specific hardware architecture has prevailed so far creates a risk. In general, major hardware accelerators, including the Nvidia GPU derivatives (Drive PX), provide adequate computational power in the teraflops range for the data-parallel parts of ADAS applications. However, apart from lacking sufficient safety features, these devices are rather costly in terms of both power consumption and purchase price. On the other hand, typical architectures for safety-critical applications up to ASIL-D (including AURIX and RH850) have not yet exploited some hardware-based opportunities to achieve higher data rates, because these would be hard to certify according to ASIL-D.
OEMs and large suppliers of ADAS systems are therefore in danger of selecting an architecture that may fail in the market because it is too large, too expensive, or unable to meet the safety requirements. Conversely, an architecture that fully supports safety-critical applications may turn out to be too small for the more demanding computations. In that case, it might emerge during development that the envisioned application cannot be implemented for efficiency reasons.
Thus, the requirements of ADAS projects are quite complex. On the one hand, it is mandatory that developers create very efficient, target-specific code, meet all safety goals, and minimise the risks outlined above. On the other hand, portable and high-level design methods are necessary to enable cost-effective application development. These high-level design requirements mandate modifications of the embedded compilers that were originally designed for traditional embedded applications.
Code structure efficiency
One necessary new compiler feature is support for the typical code structures of ADAS applications, so that highly efficient code can be generated for them. The data structures used in an ADAS application, and the operations applied to them, differ fundamentally from those found in classical applications, and the code used for sensor fusion and for analysing sensor data is commonly generated using model-based tools like BASELABS. For instance, the speed-relevant part of ADAS code often involves arrays (vectors and matrices) of floats and doubles, which are subjected to linear algebraic operations such as matrix multiplication, inversion, singular value decomposition (SVD), and the like. Using these operations, the system combines the sensor data arrays to compute an abstract representation of the environment that is then used as the basis for decision making (e.g., for detecting objects in an image or for tracking and assigning the spatial position of an object).
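As an illustration of the code structure described above, here is a minimal sketch of a matrix-vector product over arrays of floats. The type aliases `Matrix` and `Vector` and the function `multiply` are hypothetical names chosen for this example. Code generated by a model-based tool will look different, but it typically contains the same kind of nested loop over floating-point data that the compiler has to optimise.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size types: R rows and C columns of floats.
template <std::size_t R, std::size_t C>
using Matrix = std::array<std::array<float, C>, R>;

template <std::size_t N>
using Vector = std::array<float, N>;

// Compute y = A * x with a plain nested loop. The inner loop is a
// multiply-accumulate over floats, the pattern a compiler for ADAS code
// needs to handle well (e.g., by unrolling or vectorising it).
template <std::size_t R, std::size_t C>
Vector<R> multiply(const Matrix<R, C>& a, const Vector<C>& x) {
    Vector<R> y{};
    for (std::size_t i = 0; i < R; ++i) {
        float sum = 0.0f;
        for (std::size_t j = 0; j < C; ++j) {
            sum += a[i][j] * x[j];
        }
        y[i] = sum;
    }
    return y;
}
```

Because the dimensions are template parameters, the loop bounds are known at compile time, which gives the optimiser more to work with than sizes that are only known at run time.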
Highly optimised libraries
This heavy use of arrays and floating-point data means that compilers require entirely new optimisations in order to produce efficient code. The most commonly used linear algebraic functions are typically provided by libraries that are highly optimised for the specific target architecture. All computations not covered by these libraries must therefore be well optimised by the compiler so that they do not become the bottleneck.
Many of the performance-critical computations in ADAS applications are based on a set of standard linear algebraic operations. The overhead resulting from porting such ADAS applications to different target architectures can be reduced dramatically if a standard interface is used for these standard operations. Libraries that support a standard interface and are highly optimised for the specific target architecture are available for the most relevant hardware platforms, including LAPACK from Tasking for embedded, cuBLAS from Nvidia, and Intel's Integrated Performance Primitives.
Quite often, these libraries are as much as an order of magnitude faster than open-source offerings or in-house implementations. Consequently, an application built on a standard interface can immediately achieve excellent efficiency even on new target platforms, without the designers themselves having to optimise and test the underlying, performance-critical computations in the target-specific library. Note, however, that not all libraries are adequately certified for safety-critical applications or suitable for embedded systems.
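The benefit of a standard interface can be sketched as follows. The namespace `linalg` and the function `axpy` are hypothetical and not taken from any of the libraries named above. The operation itself (y = alpha * x + y) is modelled on the classic BLAS "axpy" routine. The application only ever sees the declaration, while the implementation behind it can be exchanged per target.

```cpp
#include <cassert>
#include <cstddef>

namespace linalg {  // hypothetical namespace, not a real library

// Portable interface: y[i] = alpha * x[i] + y[i] for i in [0, n).
// Application code depends only on this signature.
void axpy(std::size_t n, float alpha, const float* x, float* y);

}  // namespace linalg

// Reference backend in plain C++. A target-specific build could replace
// this definition with one that forwards to a vendor-optimised routine,
// without any change to the calling code.
void linalg::axpy(std::size_t n, float alpha, const float* x, float* y) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Porting to a new architecture then means swapping the backend and re-running the tests, rather than rewriting and re-optimising the application's numerical code.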
An additional new capability required from embedded compilers is the support of current languages like C++11/C++14. The goal in ADAS design is to improve the code’s reusability and to achieve more with fewer lines of code, without giving up the efficiency provided by closeness to the hardware. C++ classes and inheritance are time-tested methods to write such code on a higher, more abstract level.
C++11 and later variants support these methods and offer significant advantages over the older C99 language standard. Furthermore, C++11 (and C11) finally provide the opportunity to write portable, parallel programs. Once all response-time requirements are taken into account, the computational load of many ADAS applications exceeds what sequential processing on a single core can deliver. Parallel and multicore processing is therefore a common ADAS system requirement.
Older standards like C99 do not acknowledge parallelism, so programmers using those languages must have excellent hardware and compiler knowledge to correctly write a parallel program. Programmers must, for instance, exclude specific data ranges and code sections from parallel accesses in order to ensure that no data updates are lost or incorrectly read during access. Programmers must also insert barriers (mutexes) into the code to keep critical sections from being executed by more than one core at a time.
However, the barrier insertion technique only works if the compiler is aware of these barriers. Without such awareness, the compiler may move code sections out of the protected parts during compiler or hardware optimisations. Before the advent of C11/C++11, there was no uniform way to notify the compiler of such barriers. So, programmers had to disable important optimisations altogether, resulting in significant efficiency degradations, or they misused attributes like 'volatile' to restrict compiler optimisations. It has now become generally accepted that using the 'volatile' attribute is not sufficient for writing correct and portable parallel code.
Mutexes and the like are now part of the C++ standard, however, so a C++11 compiler is aware of all barriers. The compiler can therefore suppress optimisations only where necessary, without incurring any unnecessary speed penalties. Instead of disabling optimisations, programmers use the atomic types introduced by C11/C++11 ('_Atomic' in C11, 'std::atomic' in C++11). With these, the compiler generates code that drives the hardware so as to produce the expected behaviour with minimal performance overhead. Programmers can then focus their efforts on their main task, i.e., the code’s functionality, instead of trying to generate specific code patterns via unsuitable means like 'volatile' and optimisation inhibition.
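The two standard mechanisms can be sketched side by side. In the following minimal example, several threads increment one counter inside a critical section guarded by `std::mutex` and a second counter declared as `std::atomic`. The names `SharedCounters` and `run_workers` are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state, for illustration only.
struct SharedCounters {
    std::mutex guard;                   // protects 'locked_count'
    long locked_count = 0;              // plain variable, only touched under 'guard'
    std::atomic<long> atomic_count{0};  // safe to update without a lock
};

// Start 'threads' workers that each increment both counters 'iterations' times.
void run_workers(SharedCounters& c, int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, iterations] {
            for (int i = 0; i < iterations; ++i) {
                {
                    // Critical section: the compiler knows about the lock
                    // and will not move the increment outside of it.
                    std::lock_guard<std::mutex> lock(c.guard);
                    ++c.locked_count;
                }
                // Atomic read-modify-write: no update can be lost.
                c.atomic_count.fetch_add(1);
            }
        });
    }
    for (auto& worker : pool) {
        worker.join();  // after join, all updates are visible to the caller
    }
}
```

With four workers and 10,000 iterations each, both counters end at exactly 40,000. A plain `long` incremented without the lock, even if declared 'volatile', would carry no such guarantee.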
Unfortunately, it is generally not possible to detect all the program sections and data that have incorrect protection from parallel access. Sometimes, programs with incorrect protection will not generate any compiler errors and thus appear to operate correctly. Yet these programs can spontaneously produce false results as a result of subtle timing issues. These errors generally appear only after very long testing times. Further, they are difficult to reproduce because they depend on relative execution times and time-related disturbances within the system.
Thus, writing correct parallel code is far from trivial, even when using C11/C++11. Self-written parallel code also carries the risk of being correct but only marginally faster (or even slower) than the functionally equivalent sequential code, which is easier to maintain. Fortunately, libraries like EMB² and LAPACK can be used with relatively little risk, as they were written by experts in this field. As an additional advantage, these libraries deliver a relatively large speed increase thanks to their parallelism and optimisation.