Designing any kind of system to be truly safe is a challenge. The first requirement is to define the term "safe" and consider the implications of it being unsafe. A system is a combination of hardware and software and these each contribute to the safety and integrity of the system as a whole. This article looks at the basic considerations for designing for safety. A number of industry segments demand systems that comply with specific safety requirements. Obvious examples are mil/aero and medical and, of course, one which we bump into every day: automotive.
A true story
Some years ago, this guy – we will call him "R" – got a new car. It is not useful to identify the make/model here, as things will have changed, but suffice it to say that it was an up-market, expensive vehicle, where quality would be expected. Soon after taking delivery of the car, R was driving at some speed on the freeway, when the vehicle suddenly started to pull hard to the right. He quickly realized that the brakes on one side of the car were being applied automatically! He got the car under control and drove it straight to the dealers who had sold it, demanding immediate attention. A day later, they returned the car to him, admitting that they had not found a fault, but had replaced a bunch of stuff and assured him that it should be fine now. It was not fine. Next time R drove at speed, the problem occurred again. Once again, he returned the car and said that he would not accept it back until they had unambiguously identified the fault and rectified it. It took a couple of days. They discovered that the ESP sensor was faulty and was triggering the system unnecessarily. This sensor is located under the driver's seat and essentially detects rotational movement of the car; the braking is intended to offset that motion to avoid the car going into a spin. R was not really the victim of a faulty car; any component can fail. He was the victim of a major design flaw, one so elementary that the manufacturers of the car should be deeply ashamed.
Garbage in, garbage out
In the world of data processing, there is an old jargon term: GIGO – garbage in, garbage out. This simply means that a system, however well implemented, will only produce valid results if it has valid input data. A typical safety critical embedded system has essentially two parts: sensors to gather data and software to process it. For a system to be safe, each of these must be given due attention.
Automotive systems
Modern cars are stuffed full of microprocessors and microcontrollers that perform a variety of functions, which range in importance from convenience to safety critical. It is interesting to consider how each of these different types of system is implemented. Systems in cars may be broadly divided into three categories, depending on their safety requirements:

  1. Convenience systems, which add to the comfort and pleasure of using the vehicle, but are only an inconvenience if they malfunction; an example is climate control.
  2. Non-critical safety systems, which add to the safety of the vehicle and do not render it unsafe if switched off, but may introduce problems if they malfunction; an example is an electronic stability program (ESP).
  3. Critical systems, the correct functioning of which is essential to the safe operation of the vehicle; the braking system is a good example.

All of these systems will have software control, but rely on sensors to determine their operation.
Sensors – data in
It is impossible to design a sensor that is guaranteed to be 100% reliable. The trick is to design systems that are reliable despite this unfortunate fact. Looking at car systems, in terms of sensors:

  1. A convenience system needs a single sensor. If it malfunctions and the user notices, they can switch off the system and have it repaired at their convenience.
  2. A non-critical safety system needs a pair of sensors. The system compares their signals and, if they agree within a defined margin, it responds normally. If the signals disagree, the system shuts down and displays a yellow indicator, which tells the driver that the system should be fixed sometime soon.
  3. A critical system needs three sensors (or a larger odd number). The system compares their signals and, if they all agree, it takes appropriate action. If one signal disagrees, the system takes action based on the other two, but illuminates a red indicator to advise the driver that urgent attention is required. If all three sensors disagree, the system inhibits further use of the vehicle immediately.

R's ESP system was a non-critical safety system, which had a single sensor instead of two, so it could not "fail safe". A critical system normally has three sensors. This is acceptable in automotive and industrial systems. For mil/aero, five sensors are likely to be required. For space systems, seven is more likely. The space shuttle orbiter, for example, had seven sensor/servos controlling each flying surface, each with its own computer, which was supplied and programmed by a different contractor. Is that overkill?
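The two-sensor and three-sensor schemes described above can be sketched in C. This is a minimal illustration under stated assumptions, not how any particular vehicle implements it: the names `vote_status_t`, `compare_pair` and `vote_triple` are hypothetical, readings are assumed to be plain `double` values in the same units, and the agreed value is taken as a simple average of the sensors that agree.

```c
#include <stdbool.h>

/* Hypothetical status codes, loosely mirroring the indicators in the text. */
typedef enum {
    VOTE_OK,        /* all sensors agree: respond normally               */
    VOTE_DEGRADED,  /* one sensor outvoted: act, but flag urgent repair  */
    VOTE_FAILED     /* no usable agreement: do not act on the data       */
} vote_status_t;

/* True if two readings lie within the defined margin of each other.
   A NaN reading never agrees with anything, so it fails safe. */
static bool agree(double a, double b, double margin)
{
    double diff = (a > b) ? (a - b) : (b - a);
    return diff <= margin;
}

/* Non-critical safety system: two sensors.
   On agreement, *out receives the average; on disagreement, *out is left
   untouched and the caller should shut the function down (yellow indicator). */
vote_status_t compare_pair(double s1, double s2, double margin, double *out)
{
    if (agree(s1, s2, margin)) {
        *out = (s1 + s2) / 2.0;
        return VOTE_OK;
    }
    return VOTE_FAILED;
}

/* Critical system: three sensors, two-out-of-three voting.
   Assumption: when only some pairs agree, the first agreeing pair found
   is used; a production design would need a more careful rule. */
vote_status_t vote_triple(double s1, double s2, double s3,
                          double margin, double *out)
{
    bool ab = agree(s1, s2, margin);
    bool bc = agree(s2, s3, margin);
    bool ac = agree(s1, s3, margin);

    if (ab && bc && ac) {
        *out = (s1 + s2 + s3) / 3.0;
        return VOTE_OK;
    }
    if (ab) { *out = (s1 + s2) / 2.0; return VOTE_DEGRADED; }
    if (bc) { *out = (s2 + s3) / 2.0; return VOTE_DEGRADED; }
    if (ac) { *out = (s1 + s3) / 2.0; return VOTE_DEGRADED; }

    return VOTE_FAILED;  /* all three disagree: inhibit further use */
}
```

A caller would act on `*out` only when the status is `VOTE_OK` or `VOTE_DEGRADED`. With a single sensor, as in R's car, there is nothing to compare against, so a faulty reading cannot be told apart from a real one.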
Software – data processing
Once the sensors are reliably providing valid data, the software needs to process it in an equally safe way. Producing safety critical software entails two phases: implementation of the software, using best practices for design and programming and including well-proven software IP; and certification of the software according to the requirements of the specific application.
Best practices for software implementation vary from one industry to another, but broadly aim to produce software that is well designed, with clean, well-defined interfaces, and readable code with documentation that makes functionality transparent. Commonly, questionable programming language elements and practices are explicitly excluded. In the automotive world this has led to the development of MISRA C, which defines a clear subset of the language that may be used. Compliance with standards is obviously key. In automotive, AUTOSAR is an open and standardized software architecture for automotive electronic control units, excluding infotainment (where GENIVI might be used). For real time systems, an operating system complying with the OSEK/VDX standard is likely.
After software implementation is complete, most safety critical systems will also require some form of certification. This typically entails a line by line analysis of the code, along with well documented testing procedures. The certification process normally requires access to the source code for the entire application (including any licensed software IP).
Developing a safety critical system requires an approach that addresses the system as a whole. Hardware and software developers need to cooperate in order to ensure that valid data is provided initially and that its processing is handled in a reliable fashion.
About the author
Colin Walls has over thirty years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor Embedded [the Mentor Graphics Embedded Software Division], and is based in the UK.