Deep Learning Software Accelerator to Speed AI Deployments

Article By : Maurizio Di Paolo Emilio

DeepCube provides a software-based inference accelerator that enables efficient deployment of deep learning models on intelligent edge devices...

Today, there is a lot of talk about artificial intelligence (AI). Sectors from electronics and IT to entertainment are increasingly exploring it. In particular, the world's attention is focused on deep learning, a sub-field of AI built around artificial neural networks loosely inspired by the structure of the human brain.

DeepCube provides a software-based inference accelerator that enables efficient deployment of deep learning models on intelligent edge devices. DeepCube was co-founded by Eli David, a leading deep learning expert, and Yaron Eitan, a serial technology entrepreneur with over thirty years of experience (figure 1). David has published over 50 papers in leading AI publications and previously co-founded another deep learning company, Deep Instinct. The DeepCube team is made up of 15 researchers, most of whom are former master's and doctoral students. They have spent the last two years developing technology intended to have a major impact on real-world deep learning deployment.

According to these scientists, most automated activities, such as self-driving cars or medical image analysis, are based on deep learning. These sophisticated deep learning models turn out to be quite large, which translates into extremely heavy processing requirements and costs: running them calls for dedicated, expensive hardware with a large amount of memory. For these reasons, it is almost impossible to deploy these models directly on edge devices such as mobile phones, drones, cameras or autonomous cars.

One workaround is to offload the computation to third-party resources: external servers, the cloud, remote PCs and so on. In practice, however, that approach is not very workable. Besides the high latency and the lack of real-time response (aggravated by low bandwidth), the costs would be too high.

“Also, you don’t always have continuous connectivity with most edge devices. DeepCube is trying to fill that gap by creating a dedicated artificial intelligence accelerator. It achieves the same goal but 10-20 times faster, through software,” said Eli David, co-founder of DeepCube.

Through a series of innovations, DeepCube has made deep learning models about 10 times leaner. That brings direct advantages: the models run roughly 10 times faster and consume less memory, which in turn reduces energy consumption.

Figure 1: DeepCube leadership

Our brain as a reference

DeepCube’s technology takes its inspiration directly from the way the human brain works. The brain has the greatest number of connections between neurons not in adulthood, but in early childhood, between roughly two years of age and adolescence. Much of our learning happens not by adding new connections, but by removing unnecessary or redundant ones. In the same spirit, DeepCube removes more than 90% of the connections in any type of deep learning model during the training phase, while maintaining the same accuracy.
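DeepCube has not published the details of its method, but the general idea of removing redundant connections can be illustrated with a minimal magnitude-pruning sketch using PyTorch's built-in pruning utilities. The toy network, the layer choice and the 90% sparsity target below are illustrative assumptions, not DeepCube's algorithm.

```python
# Minimal sketch of magnitude-based weight pruning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy network standing in for a real vision or NLP model (assumption).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Zero out ~90% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the pruning permanent

# Report the resulting sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Sparsity: {zeros / total:.1%} of weights are zero")
```

In practice, exploiting that sparsity for real speed and memory gains also requires a runtime that skips the zeroed connections, which is where the proprietary part of such software lies.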

DeepCube’s proprietary software can run on any existing hardware platform. Whether the hardware is slow or fast, an Intel x86 or AMD CPU, an Arm processor or an Nvidia GPU, deep learning models run on average about 10 times faster. As the founding team pointed out, “accelerating inference is an easy way to measure improvement.”
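For context, inference speedups of this kind are typically measured by timing repeated forward passes of the same model before and after optimization. The sketch below, with a placeholder model and input shape, shows one common way to take such a measurement; it is not DeepCube's benchmarking tool.

```python
# Simple latency measurement for a model's forward pass (illustrative only).
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
x = torch.randn(1, 784)  # placeholder input

with torch.no_grad():
    for _ in range(10):          # warm-up runs
        model(x)
    start = time.perf_counter()
    for _ in range(100):         # timed runs
        model(x)
    latency_ms = (time.perf_counter() - start) / 100 * 1000

print(f"Average inference latency: {latency_ms:.3f} ms")
```

Running the same measurement on the original and the optimized model gives the speedup factor quoted throughout this article.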

DeepCube is ready

The technology is now mature and reliable (figure 2). Demonstrations and tests have been carried out successfully with some of the world’s leading semiconductor manufacturers, which have independently evaluated the technology on their own hardware. The next step is to turn these engagements into commercial partnerships so the technology can be distributed more broadly.

Figure 2: the DeepCube Pillars

Some sample tests

The main tests conducted thus far have focused on computer vision models, for example object recognition and facial recognition, as well as voice recognition. Another area of application is natural language processing, which includes BERT, one of the largest deep learning models, requiring several gigabytes of memory. In each category the technology was tested on different sub-models chosen by potential partners. The lowest acceleration measured was a factor of 5x, the best was around 20x, and the average is about 10x, with varying effects on accuracy.

Today, there are excellent models for autonomous driving, particularly for recognizing objects and pedestrians. But the real problem is deploying such models inside a car. Low-end hardware cannot be used, because it would be too slow to deliver quick responses on the road; when driving, even one second can be too long to wait for the vehicle to react to an object ahead. Some models also require a gigabyte of memory, which makes them impossible to fit on most edge devices. With everything made more than 10 times smaller, deployment on these smaller systems becomes a reality.

“There are many different ways in which companies are trying to deal with this problem, however, when most companies have tried to reduce the size of the model, they also lose accuracy,” said David. “Another approach that companies like Tesla are trying, is to create a dedicated chip that is capable of performing very complex processing inside cars,” he continued. The latter solution obviously offers little flexibility: whenever the model is improved, new chips must be produced.

The world will increasingly demand greater accuracy and precision, and the ability for machines to make important decisions on their own. It is not just a matter of speeding up processing, but of being able to fit advanced technology into devices with limited size and resources. Think, for example, of a security camera or a drone that must decide in real time how to respond to a given situation.

In the short term, DeepCube is focused on further maturing the technology that makes training deep learning models much faster. That is just one pillar of deep learning the company hopes to improve, with further breakthroughs expected over the next few years as its research and deployment work continue.
