The oldest, tried-and-true debugging technique in embedded development is to pepper printf statements throughout the software in the hope of gleaning insight into system behaviour. Using printf is not always advisable, however, as it can have unforeseen real-time implications. Let's examine the fundamental issues with printf and then a few techniques that can be used to get the most performance from it.
The issues with printf
The use of printf comes with a few problems that developers often overlook. The first is that the technique requires a developer to bring a standard C library into the software, which will undoubtedly increase ROM and RAM usage. The second is that every time a printf statement is used, the system is blocked until all characters have been transmitted. This blocking can result in significant real-time performance degradation. Take, for example, the output of a simple string such as "Hello World!" printed through a UART at 9600 baud (still a very common setting). I performed a simple timing measurement on an STM32 and, as shown in Figure 1, it took 12.5ms for the string to be formatted and printed to the terminal. During this time the system can do nothing else.

Figure 1: Printing "Hello World!"

Adding any string formatting makes the situation even worse! Printing the system state to the terminal using printf("The system state is %d", State) results in a 21ms application delay as the string is formatted and transmitted. One might argue that running at 9600 baud is ridiculous, but even increasing to 115200 baud would still take 1.05ms and 1.75ms respectively to transmit these two messages. That's a lot of processor bandwidth and a potential real-time performance hit for minimally useful information. Now, on to how to address these problems.
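As a rough sanity check on the measured figures, the time a string occupies the serial line can be estimated from its length and the baud rate. The sketch below assumes 8N1 framing (10 bits on the wire per character), which the measurements above do not state explicitly; the function name uart_tx_time_us is hypothetical and the estimate ignores formatting time.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits on the wire per character for 8N1 framing: 1 start + 8 data + 1 stop.
 * 8N1 is an assumption made for this estimate. */
#define UART_BITS_PER_CHAR 10u

/* Estimate, in microseconds, how long `len` characters occupy the UART
 * line at `baud` bits per second. Formatting time is not included.
 * Returns 0 for a baud rate of 0 rather than dividing by zero. */
uint32_t uart_tx_time_us(size_t len, uint32_t baud)
{
    if (baud == 0u) {
        return 0u;
    }
    uint64_t bits = (uint64_t)len * UART_BITS_PER_CHAR;
    /* Scale to microseconds and round to the nearest whole microsecond. */
    return (uint32_t)((bits * 1000000u + baud / 2u) / baud);
}
```

For the 12 characters of "Hello World!", this gives 12500µs at 9600 baud and about 1042µs at 115200 baud, in line with the 12.5ms and 1.05ms quoted above.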
Performance Technique #1 – Create a non-blocking printf
Every printf version that I have ever encountered in the wild has been the blocking type. Once a call to printf is made the application stops execution until every character has been successfully transmitted. Amazingly inefficient! An alternative, then, is to create a non-blocking version. A non-blocking printf version will:

  • format the string
  • stuff the formatted string into a transmit buffer
  • initiate the transmission for the first character
  • let an interrupt service routine handle the remaining characters in the transmit buffer
  • continue executing code.
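The steps in this list can be sketched roughly as follows. This is a minimal illustration and not the code from the measurements: the names nb_printf, uart_tx_isr and UART_WRITE are hypothetical, and the UART data register is replaced by a plain variable (uart_tx_reg) so that the sketch builds on a host. On a real target, UART_WRITE would write the device's transmit data register and uart_tx_isr would be called from the UART transmit interrupt.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TX_BUF_SIZE 256u                 /* power of two so the mask below works */
#define TX_BUF_MASK (TX_BUF_SIZE - 1u)

/* Stand-in for the UART transmit data register (assumption: host build). */
volatile uint8_t uart_tx_reg;
#define UART_WRITE(b) (uart_tx_reg = (b))

static volatile uint8_t  tx_buf[TX_BUF_SIZE];
static volatile uint16_t tx_head;        /* next free slot, advanced by nb_printf */
static volatile uint16_t tx_tail;        /* next byte to send, advanced by the ISR */
volatile bool tx_busy;                   /* true while a transmission is in progress */

/* Call from the UART "transmit register empty" interrupt. */
void uart_tx_isr(void)
{
    if (tx_tail != tx_head) {
        uint8_t byte = tx_buf[tx_tail];
        /* Advance the tail before writing so a nested interrupt sees a
         * consistent buffer state. */
        tx_tail = (uint16_t)((tx_tail + 1u) & TX_BUF_MASK);
        UART_WRITE(byte);
    } else {
        /* Buffer drained. Real hardware would typically also disable the
         * transmit-empty interrupt here. */
        tx_busy = false;
    }
}

/* Formats the message, queues it and returns without waiting for the UART.
 * Returns the number of characters queued; characters that do not fit in
 * the buffer are dropped rather than blocking the caller. */
int nb_printf(const char *fmt, ...)
{
    char tmp[128];
    va_list args;

    va_start(args, fmt);
    int len = vsnprintf(tmp, sizeof tmp, fmt, args);   /* format the string */
    va_end(args);
    if (len < 0) {
        return len;
    }
    if ((size_t)len >= sizeof tmp) {
        len = (int)sizeof tmp - 1;                     /* output was truncated */
    }

    int queued = 0;
    for (; queued < len; queued++) {                   /* stuff the transmit buffer */
        uint16_t next = (uint16_t)((tx_head + 1u) & TX_BUF_MASK);
        if (next == tx_tail) {
            break;                                     /* buffer full */
        }
        tx_buf[tx_head] = (uint8_t)tmp[queued];
        tx_head = next;
    }

    /* Start transmission of the first character if the UART is idle.
     * A production version should mask the UART interrupt around this. */
    if (!tx_busy && queued > 0) {
        tx_busy = true;
        uart_tx_isr();
    }
    return queued;                                     /* the ISR sends the rest */
}
```

Dropping characters when the buffer is full is one possible design choice; it keeps the call non-blocking at the cost of losing output during bursts. A larger buffer or a higher baud rate reduces how often that happens.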

The big hit for a non-blocking printf is the setup time, which on the STM32 at 9600 baud I found to vary between 0.8 and 1.8ms. After the initial setup, a transmit interrupt occurs approximately once every millisecond. The routine then requires only 35µs to stuff the next character into the UART transmit register before getting back to doing useful work. Figure 2 shows the periodic interrupt and also the interrupt execution time. Keep in mind that the execution time does not include the interrupt overhead, which is less than 25 clock cycles in this case.

Figure 2: Non-blocking printf performance.

Performance Technique #2 – Increase the baud rate
It blows my mind that so many developers still default their UART to 9600 baud even though serial hardware today can handle rates of 1Mbit/s or better! Occasionally, I'll encounter someone bold enough to set the baud rate to 115200. But unless there is a potential electrical or hardware-related issue with running up the clock, there is nothing wrong with setting the baud rate to 1Mbit/s and getting debug messages out as fast as possible in order to minimise real-time performance issues. At 1Mbit/s the original blocking printf for "Hello World!" would only block for 120µs. That's far more acceptable than 12.5ms.
Performance Technique #3 – Use SWD
Modern microcontrollers were designed with printf performance issues in mind. For example, developers who take advantage of the ARM Cortex-M's debugging capabilities can skip the UART altogether and use the internal debug module to transmit printf messages back through the debugger to the IDE. Skipping the UART in this manner not only saves setup; the internal hardware mechanism also minimises software overhead. An internal buffer gets filled with the message and the debug hardware automatically handles transmission to the debug probe, which results in minimal impact on the application's real-time performance.
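One way to route messages through the debug module is to send each formatted character to a debug output function instead of the UART. The sketch below is an illustration under stated assumptions, not the code from the Keil project: swd_printf and DEBUG_PUTC are hypothetical names, and the default sink is a host-side capture buffer so the example builds anywhere. On a Cortex-M3 or Cortex-M4 target built with CMSIS, DEBUG_PUTC could be defined as ITM_SendChar, which has the same uint32_t-in, uint32_t-out signature, provided the debugger is configured to capture trace output.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifndef DEBUG_PUTC
/* Host-side stand-in for the debug channel (assumption for this sketch).
 * On target, define DEBUG_PUTC as ITM_SendChar before this point. */
char     dbg_capture[128];
uint32_t dbg_count;

static uint32_t dbg_host_putc(uint32_t ch)
{
    if (dbg_count < sizeof dbg_capture - 1u) {
        dbg_capture[dbg_count++] = (char)ch;   /* buffer stays NUL-terminated */
    }
    return ch;
}
#define DEBUG_PUTC dbg_host_putc
#endif

/* Formats a message and hands it, one character at a time, to the debug
 * channel. Returns the number of characters sent, or a negative value if
 * formatting failed. */
int swd_printf(const char *fmt, ...)
{
    char msg[128];
    va_list args;

    va_start(args, fmt);
    int len = vsnprintf(msg, sizeof msg, fmt, args);
    va_end(args);
    if (len < 0) {
        return len;
    }
    if ((size_t)len >= sizeof msg) {
        len = (int)sizeof msg - 1;             /* output was truncated */
    }

    for (int i = 0; i < len; i++) {
        (void)DEBUG_PUTC((uint32_t)(uint8_t)msg[i]);
    }
    return len;
}
```

The formatting cost of vsnprintf remains in this approach; what changes is that no UART has to be configured or serviced by the application.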
Few developers are going to toss out their favourite, tried-and-true printf debugging techniques. On modern microcontroller hardware, though, there are multiple options for improving the performance and efficiency of printf that minimise its impact on real-time performance. For developers looking to try these improvements themselves, I've put together a Keil project for the STM32 that demonstrates how to use these techniques.
About the author
Jacob Beningo is a Certified Software Development Professional (CSDP) whose expertise is in embedded software. He works with companies to decrease costs and time to market while maintaining a quality and robust product.