Can an AV software driver sense, detect and predict objects that pop up around a vehicle?
Can an AV software driver sense, detect and predict objects that pop up around a vehicle? Can it grasp the significance of spontaneous events and devise a safe, decisive response as well as could a human driver?
That’s the $64 billion question for which none of us yet has the answer.
But it’s not as if there aren’t any clues. Most AV and tech companies, including Waymo, Cruise, Nvidia, Mobileye and a host of AV startups, have attempted to demonstrate the maturity of their AV software by posting city-driving video footage of their autonomous vehicles. The trouble with these videos is that, often, some scenes are either obviously edited out or cleverly sped up. The cities used as locales are mentioned, but the time of day and the day of the week when the filming took place often are not.
Among the AV/tech companies developing self-driving cars, Intel/Mobileye is unique. Mobileye has made public two unedited video clips — one in January during the Consumer Electronics Show, and another in May. The operative word here is “unedited.” (These videos are embedded below.)
Both were shot on busy streets in Jerusalem. Even better, each Mobileye video displays on the same screen three separate video elements: drone footage of a Mobileye AV driving in Jerusalem, a view of a safety driver (showing what he’s doing and his view of the street), and visualization software that renders the machine’s view as the AV tools around.
We at EE Times assumed that this video would offer insights into where in Jerusalem the AV was driving (drone footage), what its sensors (12 cameras, but no other sensors) were capturing (or not capturing), how Mobileye’s AV software was interpreting the world, and how the machine appears to understand the view rendered by the visualization software, which guides the AV’s actions. We also kept an eye on the safety driver to understand what he was seeing and whether any human discrepancies emerged compared to the visualization software’s outlook.
This exercise, done manually, was time-consuming. But when Mobileye announced that China’s Geely Auto Group will use Mobileye’s AV “full stack,” 360-degree camera-only ADAS solution to power Level 2+ electric vehicles starting next year, EE Times realized that our effort would gain significance.
This is because the Mobileye-Geely agreement suggests that Mobileye is importing software originally developed for L4 test AVs — “tried and proven” in Jerusalem — to consumer L2/L2+ vehicles in China. This is the first time AV software and hardware intended for self-driving vehicles are directly targeting consumer ADAS vehicles, to enable “hands-free ADAS.” Mobileye is promising this milestone in less than a year.
We wonder if this aggressive timeline represents the maturation of Mobileye’s AV software driver. Or is it just a sign of Mobileye’s confidence? Before starting its Jerusalem tests, the company is known to have done its homework, laying out upfront the rigorous design and engineering details that AV development demands.
In a separate interview done for EE Times’ Weekly Briefing podcast, Jack Weast, Intel’s senior principal engineer and Mobileye’s vice president for autonomous vehicle standards, recently stressed: “You don’t just code up a bunch of stuff and throw it on the road and see what happens” — a practice followed by many AV startups in hopes of attracting VC money. In Mobileye’s case, though, Weast said, “You think deeply about the design of the system. And you try to understand what the design looks like on paper and do formal verification of a design built on paper.”
An Intel spokesperson told us that Mobileye started developing and testing highway pilot in 2013 and has been adding different capabilities along the way, including its HD map (2015), Responsibility-Sensitive Safety (2017) and true redundancy (2018). “Extensive on-road testing with current configuration started in early 2018,” she added.
What we flagged
After viewing Mobileye’s video clips repeatedly, often frame by frame, we noticed a few things (we will list them later in this story) that prompted some questions:
Is the visualization software correctly sizing up an object (confusion over a truck vs. a bus)?
Is the AV software tracking an object? Some cars or people suddenly disappeared, as though the software completely forgot about them, but then, they reappear several seconds later. Why?
Should we be worried about these flickering objects?
Why didn’t the vision sensors detect certain objects, such as a motorcycle or the baby buggy in a Mobileye video posted in January? It looks as though the sensors hesitated, for several seconds, before perceiving these objects.
Did the AV get just lucky when it didn’t run over the baby buggy?
Some driving maneuvers by Mobileye’s AV software, like an unprotected left turn, struck us as a little too aggressive.
Just to be clear, no AV experts we talked to sounded alarms about what we noted as “potential problems.” We saw them as red flags but the experts did not concur.
“I want to make it abundantly clear — computer vision is a hard problem,” said one AV expert, who spoke on the condition of anonymity. In his opinion, from the sensor’s viewpoint, “things did appear differently. It’s possible there is something blocking visibility.” He cautioned, “There is a big difference between what sensors can actually see, and what we can sort of see from other perspectives.”
Phil Magney, founder and principal of VSI Labs, told us, “Regarding your remarks about the visualizer, it is true that some objects disappear and reemerge. This is not uncommon.” According to Magney, “When the confidence momentarily drops you will lose the object from the visualizer. This can happen when there is partial occlusion or when the orientation of the object changes. On some objects they would disappear only to return as the object gets closer. Under these conditions the confidence grows as the target enters a region of interest (ROI) that is critical for the vehicle.”
Asked about the baby buggy, not initially detected on the visualization software screen, Magney said, “Classification is a hard problem. It is based on the neural network and how well it has been trained.”
So, could the problem be that the AV software driver has never seen a baby buggy? Perhaps, but Magney noted that the software driver did not completely ignore the buggy. Look closely: “The subject is momentarily classified with enough confidence, but then a few frames later it disappears because the orientation of the object changes and there is less confidence for a moment; then it improves as you get closer and the subject reemerges again.”
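The confidence-driven dropout Magney describes can be sketched as a simple display filter: objects appear only once their detection confidence clears a threshold, and a brief confidence dip drops them from the screen. This is our own minimal illustration of the concept — the thresholds, function names and hysteresis scheme are hypothetical, not Mobileye’s actual pipeline.

```python
# Illustrative sketch (not Mobileye's code): a visualizer that only draws
# detections whose confidence clears a threshold, with hysteresis so an
# object already on screen survives a small confidence dip.

SHOW_THRESHOLD = 0.6   # confidence needed for an object to first appear
HIDE_THRESHOLD = 0.4   # confidence below which a displayed object is dropped

def visible_objects(frames):
    """frames: list of dicts mapping object_id -> confidence, one per frame.
    Returns, per frame, the set of object ids the visualizer would draw."""
    shown = set()
    result = []
    for frame in frames:
        for obj_id, conf in frame.items():
            if obj_id not in shown and conf >= SHOW_THRESHOLD:
                shown.add(obj_id)        # confident enough to appear
            elif obj_id in shown and conf < HIDE_THRESHOLD:
                shown.discard(obj_id)    # confidence collapsed: flickers out
        shown &= set(frame)              # objects absent this frame also drop
        result.append(set(shown))
    return result

# A stroller partially occluded mid-sequence: confidence dips, the icon
# vanishes, then returns as the target nears the region of interest.
frames = [{"stroller": 0.7}, {"stroller": 0.5}, {"stroller": 0.3},
          {"stroller": 0.5}, {"stroller": 0.8}]
print(visible_objects(frames))
# → [{'stroller'}, {'stroller'}, set(), set(), {'stroller'}]
```

Note how the middle frames go blank even though the detector never fully lost the target — the flicker is a property of the display threshold, which matches Magney’s point that the object “disappears only to return as it gets closer.”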
Work in progress
We got in touch with Mobileye to seek guidance.
Shai Shalev-Shwartz, CTO of Mobileye and senior fellow at Intel, acknowledged, “Our videos are showing work in progress and, as such, are not flawless.”
He stressed, “Our position is that transparency is a crucial ingredient in the process of building a self-driving system, hence showing work in progress is an important step forward.”
Before delving into the specific scenes that we flagged, Shalev-Shwartz laid out a few general comments:
The visualization only shows “facts” — things currently stably detected by the cameras. The driving policy has a “common sense” layer that includes logic like “things cannot vanish into thin air.” These non-visual guidelines are part of the car’s decision-making process.
Some objects are detected in 2D images but understanding their position and kinematic state in the 3D world poses a lot of uncertainty. These 2D images are not displayed in the visualization but are used in the context of RSS (Responsibility-Sensitive Safety), explained in the next bullet.
An important component of RSS is “know what you don’t know.” This means that at any time, for every area in the 3D space, we know one of the following: (1) it is known to be occupied by some road-user (2) it is known to be empty road (3) it is unknown. RSS logic behaves properly in each of these situations. We use the “unknown” mechanism also for objects detected in 2D, but there’s a lot of uncertainty about positioning them in the 3D world.
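The “know what you don’t know” principle in the third bullet lends itself to a tiny sketch: every region of space is in one of three states, and the planner treats “unknown” as cautiously as “occupied.” The enum and function below are our own construction for illustration, not Mobileye’s RSS implementation.

```python
# Illustrative sketch (our construction, not Mobileye's RSS code): each
# region of 3D space around the car is in one of three occupancy states,
# and the planner may only drive into space known to be empty.

from enum import Enum

class Occupancy(Enum):
    OCCUPIED = "known to be occupied by a road user"
    EMPTY = "known to be empty road"
    UNKNOWN = "unknown"

def may_enter(region_state: Occupancy) -> bool:
    """An object detected in 2D but poorly localized in 3D would mark its
    possible positions UNKNOWN rather than EMPTY, so the car stays out."""
    return region_state is Occupancy.EMPTY

assert may_enter(Occupancy.EMPTY)
assert not may_enter(Occupancy.UNKNOWN)   # "know what you don't know"
assert not may_enter(Occupancy.OCCUPIED)
```

The design choice worth noting is that “unknown” defaults to the conservative branch — which is why, per the second bullet, a 2D detection with uncertain 3D position can still constrain the car’s behavior without ever appearing in the visualization.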
Mobileye responds to scenes we flagged
Let’s start with the first video Mobileye posted in January.
The first segment of video we want to examine is the part where the baby buggy appears. It runs from 13:48 to 15:05.
A person pushing a baby stroller appears at 14:52. The image above is a single frame from the video at that moment. The computer seems unaware of the stroller until much later. Is this a case of an object the AI wasn’t trained on?
Shai Shalev-Shwartz: The pedestrian and the stroller are being detected the whole time. However, the visualization only shows “facts” – see bullet (1) above.
The next video segment we want to highlight runs from 16:40 to 17:20. The event we want to call your attention to occurs at around 16:47. The visualization software seems to be confused whether it is a bus or a truck coming from the left. And then it momentarily disappears.
Shai Shalev-Shwartz: Distinguishing between a truck and a bus is not imperative for the AV’s decision making, especially when it’s three cars away from you.
Then, at around 17:17, you suddenly see a red box at a time when there is no such object, as captured in the still frame from the video above. What creates such a ghost?
Shai Shalev-Shwartz: Same as above. Our sensing neural network is tuned to the safe side (favoring false positives over misses) because target misses are a safety issue, whereas false detections are a comfort issue. Nevertheless, the comfort of the drive is not compromised (decision making takes into account the number of frames for which the target is “alive” and tunes the braking profile accordingly to minimize jerkiness).
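The trade-off Shalev-Shwartz describes — tolerate occasional ghosts, but scale the response to how long a target has persisted — can be sketched as a ramp on braking authority. The thresholds and linear curve below are entirely our invention, for illustration only.

```python
# Hypothetical sketch of the idea described above: a brand-new detection
# (possibly a false positive, like the "ghost" red box) triggers only a
# gentle response; braking authority ramps up as the target stays "alive"
# across consecutive frames. Numbers are illustrative, not Mobileye's.

def braking_scale(frames_alive: int, full_confidence_frames: int = 5) -> float:
    """Fraction of the full braking profile applied, in [0, 1]."""
    if frames_alive <= 0:
        return 0.0
    return min(1.0, frames_alive / full_confidence_frames)

# A one-frame ghost gets a mild 20% response, avoiding a jerky stop;
# a target tracked for 5+ frames gets the full braking profile.
assert braking_scale(1) == 0.2
assert braking_scale(5) == 1.0
assert braking_scale(8) == 1.0
```

This shows why a safety-side bias need not make the ride uncomfortable: a false positive that lives for only a frame or two never commands hard braking, while a genuine, persistently tracked obstacle does.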
The next segment we want to call attention to runs from 21:10 to 21:55.
Stop at around 21:50. A motorcycle is visibly driving toward the car. It is in the view of the safety driver. The computer doesn’t seem to be aware of it until much later. Why?
Shai Shalev-Shwartz: The decision making takes into account uncertainty (not visible in the display). See bullets (1) and (2). This is not a problem.
Let’s take a look at the May video.
The first segment we want to highlight comes after the 3:00-minute mark. Why does the car come almost to a stop at about 3:05 while the left lane is clearly empty and it can easily drive around the truck?
Shai Shalev-Shwartz: We use different contextual cues to determine whether an object is an obstacle/ stationary object that we need to overtake or not (i.e., vehicle waiting in a traffic jam vs. double-parked vehicle). This mechanism is rapidly improving. Nevertheless, this is not a safety-related issue.
The next sequence we want to point out comes shortly after the 6:00-minute mark in the video; the still frame above is from 6:12. A bunch of cars are parked on the right side of the street. Any one of them could back out. The visualization software sees only one, however. What triggers such a flickering phenomenon?
Shai Shalev-Shwartz: The objects are being detected at the “low level” the whole time. It’s just a matter of visualization. See bullets (1) and (2). This isn’t a problem.
The next thing we would like to call attention to is the unprotected left turn. The sequence in the video begins around 10:45; the still frame above is from 10:58. The Mobileye vehicle inches out into the road, essentially blocking the cross traffic to create an opening for itself. There are other moments that strike us as aggressive driving. Is this considered “culturally different” behavior?
Shai Shalev-Shwartz: It’s completely normal here in Israel as well as in most western countries. Waiting idly for the perfect situation is not useful. Keep in mind that our RSS-based driving policy can be easily adjusted to match different driving styles (without compromising safety).
The next segment we wanted to ask about runs from 14:33 to 14:48. During the sequence, the Mobileye vehicle is waiting to make a turn. Why is the steering wheel moving so much? It appears that the computer is doing its own path planning. But what is causing the steering wheel to move so wildly?
Shai Shalev-Shwartz: This is one of the side effects of hacking a stock vehicle. This will not happen when using AV-ready platforms with a dedicated control stack.
The next question we had involves what happens after the Mobileye vehicle stops for a battery change. The sequence begins around 14:45; the still frame above is time-stamped 15:05. In the drone footage, on the right, we see several vehicles parked. But on the visualization software, the number of parked cars keeps changing. What causes this?
Shai Shalev-Shwartz: Again, see bullets (1) and (2).
Things are ‘constantly improving’
All said and done, the bottom line is that Mobileye’s AV software is constantly improving, according to Shalev-Shwartz.
Most AV experts and Mobileye agreed that the AV software’s confusion over bus vs. truck doesn’t matter. One industry observer said, “The size [of a vehicle] can be challenging. But when we’re just talking about detection and response, what we really care about is knowing that there’s a vehicle in front of you.”
Mobileye’s CTO explained, “This is only a display issue (not affecting driving comfort or safety). From a Computer Vision standpoint, this is a rather simple task. Since it is not a safety-critical element, it was not prioritized.” Nonetheless, he added, “Recent sensing versions demonstrate much better performance in this regard.”
Nonetheless, we did notice that the visualization software seems to have problems sizing up the object. We asked Mobileye if this is caused by the limitation of 2D imaging.
Shalev-Shwartz responded, “This is constantly improving and already at an unparalleled level of accuracy.” Mobileye is adding safeguards against limited visibility by using its RSS-based driving policy.
As non-experts watching the video, we often wondered why the computer sometimes seems to have a problem tracking an object. In one scene, it sees a car or a human, but in another scene a few seconds later the computer seems to have completely forgotten what it saw. The object disappears. The prospect of predicting what happens next seems, at best, worrisome.
Shalev-Shwartz made it clear, “We definitely track objects. We sometimes fail to track due to occlusion or partial visibility within a certain image, so we cannot ‘count’ on it and reassure existence by deep neural network (failures are typically due to occlusion or bad classification).”
He added, “Also, our RSS-based driving policy deals with occlusions and areas with limited visibility using novel methods and techniques. This is part of the reason why such ‘errors’ don’t affect the driving experience nor the safety of the ride. We don’t just get lucky, as you say.”
While we did everything manually, we trust that Mobileye has tools to automatically spot mistakes the computer might have made in each drive. We asked Shalev-Shwartz what that process looks like.
He explained, “We have a very structured procedure for this. Every test drive is manned with a safety driver and a co-pilot responsible for tagging events in real time. We then run the logs through an offline analysis that is done by a dedicated team. For large-scale validation, we use novel methods like comparing to the L/R system, offline ground truth using reference sensors, and others.”
At a time when there are no benchmarks to test the quality of AV software, measures like miles driven, disengagement numbers and conveniently produced YouTube video clips continue to be the guideposts left for the general public and the media to rely on in judging the safety of self-driving cars.
We’re grateful that Mobileye engaged in our public dialog. When more and more devices — including self-driving cars — are supposedly getting smarter than people, it becomes more and more urgent that we ask: “Show us how that works.”