When I first started driving, my parents would tell me to “drive safely” and then follow up with “it’s not you we’re worried about, it’s the other idiots on the road”. In effect, this problem – the problem of avoiding collision with other vehicles (not all idiots of course) – is one problem faced by companies developing self-driving vehicles.
How do self-driving cars “see” other vehicles?
Before a self-driving vehicle can predict where other vehicles (or bicycles or people) will be in the future, it needs to know where those vehicles are in the present and how they were moving in the past.
One way of doing this is to use a vehicle-mounted camera to provide video of the surroundings, and to analyse individual frames of that video in real time to detect vehicles. Such analysis can be performed using an artificial neural network (ANN). In general, ANNs are trained on large amounts of real-world data so that they can perform various tasks (see Alex Savin’s blog post on machine learning). For object detection, those ANNs are usually convolutional neural networks (CNNs). In a general sense, a CNN filters an image in a number of different ways to extract different “features” of the image. These features are passed into a fully connected neural network, which uses them to determine whether the image contains an object (see here for an interesting visualisation). Through a training process, the network learns the best filters to use and how to interpret the features produced by those filters.
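To make the idea of “filtering an image to extract features” more concrete, here is a minimal sketch in Python. It is an illustration only: the tiny image, the hand-picked filter values and the function name `apply_filter` are all assumptions made for this example. In a real CNN the filter values are learned during training, and there are many filters stacked in layers.

```python
# Illustrative sketch only: one hand-written filter applied to a tiny greyscale image.
# In a real CNN the filter values are learned during training, not chosen by hand.

def apply_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the patch under the filter by the filter values and sum.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# A 4x6 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# A filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = apply_filter(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is largest where the filter sits over the dark-to-bright boundary and zero over the uniform regions, so this particular filter “extracts” vertical edges. A different filter would pick out a different feature.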
The camera-based approach tends to be the direction car manufacturers are taking in developing their self-driving vehicles. Ride-sharing companies, on the other hand, have directed their development towards LIDAR-based solutions for detecting other vehicles, possibly because the increased expense of LIDAR is justified by the savings that self-driving vehicles provide by replacing paid drivers. LIDAR uses the reflection of laser light to measure the distances of objects from a self-driving vehicle, which means that, unlike images provided by a camera, LIDAR can be used to accurately (and directly) locate other vehicles in three-dimensional space.
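As a rough illustration of how a reflected laser pulse becomes a position in three-dimensional space, the sketch below assumes a simple pulsed (time-of-flight) sensor that reports the round-trip time of each pulse together with the direction in which the beam was pointing. The function names and the axis convention (x forward, y left, z up) are assumptions made for this example, not a description of any particular LIDAR unit.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def range_from_round_trip(round_trip_seconds):
    """Distance to the reflecting object: the pulse travels out and back, so halve it."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0

def to_cartesian(range_m, azimuth_deg, elevation_deg):
    """Convert a range and beam direction into (x, y, z) metres.

    Assumed convention: x forward, y left, z up; azimuth is measured
    anticlockwise from straight ahead, elevation upwards from horizontal.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    horizontal = range_m * math.cos(elevation)
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = range_m * math.sin(elevation)
    return (x, y, z)

# A pulse that returns after 200 nanoseconds, with the beam pointing directly left.
distance = range_from_round_trip(200e-9)
point = to_cartesian(distance, azimuth_deg=90.0, elevation_deg=0.0)
print(f"range: {distance:.2f} m, position: {point}")
```

A 200 nanosecond round trip corresponds to an object roughly 30 metres away. A camera image, by contrast, does not record distance directly, so depth has to be inferred.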
How do you predict the seemingly unpredictable?
One difficulty with tracking other vehicles is that, from the point of view of the self-driving vehicle, the appearance of another vehicle will change as that vehicle gets closer to or farther away from the self-driving vehicle, makes a turn, or is partially obscured by another vehicle on the road. This inconsistency in shape and size can make it difficult to track a vehicle over successive frames.
To address this issue, some systems construct a bird’s eye view of the surrounding environment. A vehicle’s size and shape remain consistent when viewed from above (such as when it turns), which can mitigate some of the issues discussed above with respect to tracking vehicles in a frontal view. As an additional benefit, a bird’s eye view can easily be overlaid with road mapping data, which can make it easier to incorporate road markings, traffic light data, footpaths, etc. into the system.
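One simple way to picture a bird’s eye view is as a grid laid flat on the ground around the self-driving vehicle, with each cell marked if any sensed point falls inside it. The sketch below is a simplified assumption of how such a grid might be built from three-dimensional points; the function name `to_birds_eye_grid`, the grid size and the axis convention (x forward, y left, z up) are all hypothetical, and real systems use much finer grids and richer per-cell data.

```python
def to_birds_eye_grid(points, extent_m=4.0, cell_m=1.0):
    """Mark each ground-plane cell that contains at least one point.

    The self-driving vehicle sits at the centre of the grid. Row index grows
    with x (forward) and column index grows with y (left); height is dropped.
    """
    size = int(round(2 * extent_m / cell_m))
    grid = [[0] * size for _ in range(size)]
    for x, y, _height in points:
        # Ignore anything outside the area covered by the grid.
        if -extent_m <= x < extent_m and -extent_m <= y < extent_m:
            row = int((x + extent_m) // cell_m)
            col = int((y + extent_m) // cell_m)
            grid[row][col] = 1
    return grid

# Made-up points: three returns from a nearby vehicle, one far outside the grid.
points = [
    (2.5, 1.5, 0.3),
    (2.5, 1.5, 1.2),   # same spot on the ground, higher up the vehicle
    (3.2, 1.5, 0.5),
    (10.0, 0.0, 0.5),  # too far ahead to appear in this small grid
]

grid = to_birds_eye_grid(points)
occupied_cells = sum(sum(row) for row in grid)
print(f"{occupied_cells} occupied cells")
```

Because height is discarded, two points at the same ground position fall in the same cell, and the footprint of a vehicle keeps the same dimensions wherever it sits in the grid. Mapping data such as lane markings could be drawn onto a grid using the same coordinates.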
Thus, data representing real-life driving situations can be captured (e.g. using LIDAR) and converted to bird’s eye view images, which are combined with additional mapping data and with data describing the movement of various actors. This data, which usually involves thousands of driving scenarios recorded by a fleet of vehicles, can in turn be used to train a neural network (e.g. a CNN). Once trained, the neural network can be presented with the bird’s eye views (and the additional data) and, in response, can make a prediction about the most likely future trajectory of each nearby vehicle.
In real-world driving, however, it is not only the most likely trajectory that is important. Rather, a driver will usually consider multiple possible future trajectories of a nearby car and adjust their driving accordingly. As an example, a car on the opposite side of an intersection to you may be most likely to turn left at that intersection, but it is also useful to be aware that it could, alternatively, make a right turn (so as to cross your lane).
To account for this, some proposed systems are multimodal: they identify multiple separate trajectories and assign each a corresponding probability of that trajectory occurring. One such system is Uber’s Advanced Technology Group’s recently proposed self-driving system (see https://arxiv.org/pdf/2006.02000.pdf), which predicts multiple future trajectories of other vehicles, bicycles, and people in the vicinity of a self-driving vehicle, so as to allow the self-driving vehicle to react to those predicted trajectories. In particular, Uber’s proposed system identifies three trajectories and three associated probabilities for each vehicle (representing the vehicle turning left or right, or continuing straight). This allows the system to predict trajectories that would be missed by other (unimodal) systems, so a self-driving vehicle incorporating such a system may be able to react to scenarios in which another vehicle on the road takes a path that is somewhat unexpected (or at least less expected).
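The difference between a unimodal and a multimodal prediction can be sketched in a few lines of Python. This is not Uber’s implementation: the raw scores below are made up, the softmax function is simply one common way of turning a network’s scores into probabilities, and the 0.2 threshold is an arbitrary assumption chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    highest = max(scores)
    exponentials = [math.exp(s - highest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]

# Three candidate manoeuvres for one nearby vehicle, with made-up raw scores
# standing in for the output of a trained network.
modes = ["left", "straight", "right"]
raw_scores = [2.0, 0.5, 1.0]

probabilities = softmax(raw_scores)
ranked = sorted(zip(modes, probabilities), key=lambda pair: pair[1], reverse=True)

# A unimodal system keeps only the single most likely manoeuvre.
unimodal_view = ranked[0][0]

# A multimodal system keeps every manoeuvre that is plausible enough to matter.
multimodal_view = [mode for mode, probability in ranked if probability >= 0.2]

for mode, probability in ranked:
    print(f"{mode}: {probability:.2f}")
print("unimodal:", unimodal_view)
print("multimodal:", multimodal_view)
```

With these made-up scores, the unimodal view only reports the left turn, while the multimodal view also keeps the less likely right turn, which is the case the intersection example above is concerned with.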
Of course, this is only one way in which this problem is being solved, but it highlights the complexity of addressing just one of a huge number of problems that must be overcome so that self-driving vehicles can operate safely in a wide variety of environments. Nevertheless, it seems it’s only a matter of time before these problems are addressed and parents no longer have to be concerned about those “other idiots on the road”.