Taking the Mystery Out of Vision AI

Article by: Nitin Dahad

How do you actually get data from a camera, embed machine learning to carry out the inference algorithm, and enable something useful to be processed?

The benefits of adding vision to everyday products have attracted the attention of many industries and sectors. But how do you actually get data from a camera, embed machine learning on the device to carry out the inference algorithm, and enable something useful to be processed?

The answer, as in any system design, is having the right software, tools, libraries, compilers, and so on. It’s no surprise that such capabilities are generally beyond the grasp of non-engineers. But even among embedded systems developers, the knowledge and skills required for vision systems design are considered to be in short supply.

One of the big challenges is lack of awareness of the software and tools available for developing embedded vision systems, as Jeff Bier, founder of the Edge AI and Vision Alliance, explained in a briefing with EE Times ahead of the 2021 Embedded Vision Summit. While there has been significant investment and research on the algorithms and the silicon, the software tools that mediate between them (compilers, optimized function libraries, and so on) have been somewhat neglected, said Bier.

Using the right software tools, compilers, and libraries can yield a very efficient implementation of an algorithm for a particular microcontroller or processor. But “over the last 30 years, semiconductor companies have typically under-invested in software tools,” said Bier, a veteran of the embedded systems industry and signal processing in particular. “[Software] is often viewed as a necessary evil (it’s a cost center in a very cost-sensitive business) and it shows. As an embedded software developer, you might look at the tools available [to you] in comparison to what PC or cloud developers have and feel like the unloved stepchild.”

Taking this argument to the next level, he added, “You might have embedded systems expertise, but there is a chance that these developers may have never worked with image data or deep neural networks.”

Skills are a big challenge, said Bier. “We may have done some spreadsheet calculations, and said, ‘Yes, it is possible to run this kind of deep neural network at sufficient performance in our application.’ But do we know how to do that? Do we have the skills? For most organizations the answer is no, because they have not had the opportunity to use this technology in the past. Since it’s a relatively new technology in the commercial world, they don’t have the expertise. They don’t have a machine learning department or a computer vision department in their company.

“In the last couple of years, this has turned into a really big bottleneck with respect to commercial application of computer vision and deep neural networks — just the know-how.”

The technology is becoming more accessible, however, as companies have sought to address the skills gap over the past couple of years. “The knowledge and skills gap has been quite big but is getting smaller,” said Bier, adding that “a couple of companies,” one big and one small, “have led the charge on this.” The large company is Intel; the small one is Edge Impulse.

“Intel has often impressed me as bucking the trend and willing to make big investments in software tools in a number of ways,” said Bier. “They have, for example, the OpenVINO tool chain for edge computer vision and inference, and DevCloud for the Edge. Edge Impulse is also a cloud-based environment. To an embedded developer, that [the cloud environment] feels weird. Everything to them is often on their desk — the dev board, the workstation, the tools — and they don’t even need an internet connection. Everything is very local. So, it feels very strange to say, ‘Put your code in the cloud’ and run the tools in the cloud.”

The trend addresses time to deployment as well as the skills gap. A frequent frustration for embedded developers is getting access to boards and tools and getting them properly installed, Bier said. The timeline is “usually measured in weeks, sometimes in months, and that’s painful, especially if at the end of that you realize that’s not what you needed and you need to repeat the process with some other boards.” For example, you might find at the end of the process that “you need the next one up, with higher performance or a different set of I/O interfaces.”

But if the supplier “has all the development boards in the cloud connected to their machines and [can] access them at will, that offers tremendous convenience. Likewise, they have got the latest versions of the software tools, and they’ve sorted out all the dependencies between them.”

Paving the way for implementing vision
So how do you speed up the deployment of embedded vision to enable features such as object detection and analysis, whether for smart cities, factories, retail, or any other application?

Having recognized the pain points described by Bier, companies are addressing them. Some now offer tools such as cloud-based development systems that allow you to feed in your code or data and get evaluations in next to no time. Others provide reference designs that allow you simply to plug in your camera output and choose from libraries or apps that provide inference algorithms for common applications.

In the former camp, Intel DevCloud for the Edge and Edge Impulse offer cloud-based platforms that take most of the pain points away with easy access to the latest tools and software. In the latter, Xilinx and others have started offering complete systems-on-module with production-ready applications that can be deployed with tools at a higher level of abstraction, removing the need for some of the more specialist skills.

Prototype, benchmark, and test AI inference in the cloud
Intel DevCloud for the Edge lets users develop, prototype, benchmark, and test AI inference applications on a range of Intel hardware, including CPUs, integrated GPUs, FPGAs, and vision processing units (VPUs). With its Jupyter Notebook interface, the platform contains tutorials and examples preloaded with everything required to get up and running quickly. This includes pretrained models, sample data, and executable code from the latest version of the Intel Distribution of OpenVINO toolkit, as well as other tools for deep learning. All supported devices are configured for optimal performance and ready for inference execution.

Intel DevCloud for the Edge lets developers prototype computer vision solutions in Intel’s cloud environment and watch their code run on any combination of its available hardware resources. (Source: Intel)

The most significant benefit for the developer is that the platform does not require any hardware setup at the user side. The Jupyter Notebook’s browser-based development environment enables developers to run code from within their browser and to visualize results instantly. This lets them prototype computer vision solutions in Intel’s cloud environment and watch their code run on any combination of its available hardware resources.

There are three main benefits to this cloud-based offering. First, it addresses the issue of hardware choice paralysis. Developers can run AI applications remotely on a wide range of hardware to determine which is best for their solution based on factors such as inference execution time, power consumption, and cost.

Second, it offers immediate remote access to the latest Intel edge hardware. On the software side, it removes the need to deal with outdated software, since it provides instant access to the latest version of the Intel Distribution of OpenVINO toolkit and compatible edge hardware.

And third, it offers access to application-specific performance benchmarks in an easy-to-compare, side-by-side format.
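The comparison described above, weighing inference time, power consumption, and cost across several devices, can be sketched in a few lines of Python. This is only an illustration, not Intel's tooling: the device labels, every number in `results`, and the `pick_device` and `format_table` helpers are hypothetical placeholders standing in for measurements a developer would collect from their own benchmark runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One row of a side-by-side comparison (all values are made-up placeholders)."""
    device: str
    latency_ms: float  # inference time per frame
    power_w: float     # average power draw during inference
    unit_cost: float   # hardware cost per unit, arbitrary currency


def pick_device(results, max_latency_ms, weights=(1.0, 1.0, 1.0)):
    """Return the lowest-scoring device among those that meet the latency budget.

    Each metric is divided by the best (lowest) value among the candidates,
    so with unit weights a score of 3.0 would mean best-in-class on all three.
    """
    candidates = [r for r in results if r.latency_ms <= max_latency_ms]
    if not candidates:
        raise ValueError("no device meets the latency budget")
    best_latency = min(r.latency_ms for r in candidates)
    best_power = min(r.power_w for r in candidates)
    best_cost = min(r.unit_cost for r in candidates)
    w_lat, w_pow, w_cost = weights

    def score(r):
        return (w_lat * r.latency_ms / best_latency
                + w_pow * r.power_w / best_power
                + w_cost * r.unit_cost / best_cost)

    return min(candidates, key=score)


def format_table(results):
    """Render the results as a fixed-width, side-by-side text table."""
    header = f"{'device':<16}{'latency_ms':>12}{'power_w':>10}{'unit_cost':>11}"
    rows = [
        f"{r.device:<16}{r.latency_ms:>12.1f}{r.power_w:>10.1f}{r.unit_cost:>11.2f}"
        for r in results
    ]
    return "\n".join([header, *rows])


# Hypothetical measurements; replace with numbers from real benchmark runs.
results = [
    BenchmarkResult("cpu", 40.0, 15.0, 100.0),
    BenchmarkResult("integrated-gpu", 20.0, 20.0, 100.0),
    BenchmarkResult("vpu", 30.0, 2.0, 80.0),
    BenchmarkResult("fpga", 10.0, 25.0, 400.0),
]

print(format_table(results))
print("balanced choice:", pick_device(results, max_latency_ms=33.0).device)
```

Changing the `weights` tuple shifts the outcome: a latency-dominated weighting favors a different device than a balanced one, which is why side-by-side numbers are more useful than a single headline figure.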

(Intel offers a tutorial on running object detection models using Intel DevCloud for the Edge.)

Build a model in the cloud, see what happens live
Another approach is to feed data into a cloud platform to visualize and create training models and deploy them on embedded devices. Edge Impulse does just that, offering a cloud-based development environment that aims to make it simple to add machine learning on edge devices without requiring a Ph.D. in machine learning, according to the company.

Its platform enables users to import image data collected from the field, quickly build classifiers to interpret that data, and deploy models back to production low-power devices. A key to the Edge Impulse web platform is the ability to view and label all the acquired data, create pre-processing blocks to augment and transform data, visualize the image dataset, and classify and validate models on training data straight from the user interface.

Because it can be quite hard to build a computer vision model from scratch, Edge Impulse uses a process of transfer learning to make it easier and faster to train models. This involves piggybacking on a well-trained model and retraining only the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets. With the model designed, trained, and verified, it is then possible to deploy this model back to the device. The model can then run on the device without an internet connection, with the inherent benefits of lower latency and minimal power consumption. The complete model is packaged with pre-processing steps, neural network weights, and classification code in a single C++ library that can be included in the embedded software.
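The idea of retraining only the upper layers can be shown with a toy example. The sketch below is not Edge Impulse's pipeline: it assumes a tiny fixed linear-plus-ReLU layer (`FROZEN_BASE_WEIGHTS`, a hypothetical stand-in for a pretrained network) and trains only a small logistic-regression "head" on top of it, using a made-up four-sample dataset. All names and values are illustrative.

```python
import math

# Stand-in for a pretrained base network. These weights are frozen:
# training below never modifies them.
FROZEN_BASE_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Run the input through the frozen layer (linear transform followed by ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_BASE_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the classification head with per-sample gradient descent."""
    feats = [frozen_features(x) for x in samples]  # computed once; the base is fixed
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(x, w, b):
    """Classify an input using the frozen base plus the trained head."""
    f = frozen_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# Made-up dataset: label 1 when the first value exceeds the second.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]

w, b = train_head(samples, labels)
accuracy = sum(predict(x, w, b) == y for x, y in zip(samples, labels)) / len(samples)
print("training accuracy:", accuracy)
```

Because the base layer is fixed, its features are computed once and only three numbers (two weights and a bias) are learned, which mirrors why transfer learning needs less data and less training time than building a full model from scratch.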

A trained model showing predicted on-device performance estimations in Edge Impulse’s web-based user interface (Source: Edge Impulse)

Going to a higher level of abstraction
Another approach being offered by vendors is to reduce development time by offering module-based systems and enabling design at a higher level of abstraction. Xilinx said that its new systems-on-module (SOMs) approach can shave up to nine months off the development time for vision systems by addressing the rising complexity in vision AI as well as the challenges of implementing AI at the edge.

Xilinx recently announced the first product in its new portfolio of SOMs: the Kria K26 SOM, specifically targeting vision AI applications in smart cities and smart factories, along with an out-of-the-box ready, low-cost development kit, the Kria KV260 AI vision starter kit.

Chetan Khona, director of industrial, vision and healthcare at Xilinx, said at the press briefing to launch the new module family, “Production-ready systems are important for rapid deployment [of embedded vision AI]. Customers are able to save up to nine months in development time by using a module-based design rather than a device-based design.” He added that with the starter kit, users can get started within an hour, “with no FPGA experience needed.” Users connect the camera, cables and monitor, insert the programmed microSD card and power up the board, and then can select and run an accelerated application of their choice.

The Kria SOM portfolio couples the hardware and software platform with production-ready vision-accelerated applications. These turnkey applications eliminate all the FPGA hardware design work; software developers need only integrate their custom AI models and application code, and optionally modify the vision pipeline, using familiar design environments such as the TensorFlow, PyTorch, or Caffe frameworks as well as the C, C++, OpenCL, and Python programming languages.

Xilinx’s Kria systems-on-module provide pre-built hardware with helpful utilities to allow developers to drop in their differentiation using their preferred design environment. (Source: Xilinx)

The Kria SOMs also enable customization and optimization for embedded developers with support for standard Yocto-based PetaLinux. Xilinx said a collaboration with Canonical is also in progress to provide support for Ubuntu Linux, the highly popular Linux distribution used by AI developers. Ubuntu support would offer familiarity for AI developers and interoperability with existing applications. Customers can develop in either environment and take either approach to production. Both environments will come pre-built with a software infrastructure and helpful utilities.

We’ve highlighted three of the approaches that vendors are taking to address the skills and knowledge gap, as well as the deployment time, for embedded vision systems development. The cloud-based approaches offer tools that “democratize” the ability to create and train models and evaluate hardware for extremely rapid deployment onto embedded devices. And the approach that offers a module, or reference design, with an app library allows AI developers to use existing tools to create embedded vision systems quickly. These are all moving us to a different way of looking at development boards and tools. They take the mystery out of embedded vision by moving up the value chain, leaving the foundational-level work to the vendors’ tools and modules.

This article was originally published on Embedded.

Nitin Dahad is a correspondent for EE Times, EE Times Europe and also Editor-in-Chief of embedded.com. With 35 years in the electronics industry, he’s had many different roles: from engineer to journalist, and from entrepreneur to startup mentor and government advisor. He was part of the startup team that launched 32-bit microprocessor company ARC International in the US in the late 1990s and took it public, and co-founder of The Chilli, which influenced much of the tech startup scene in the early 2000s. He’s also worked with many of the big names – including National Semiconductor, GEC Plessey Semiconductors, Dialog Semiconductor and Marconi Instruments.
