Driving a vehicle requires an understanding of the world (the reality surrounding us). This is the fundamental statement, which has to be clearly and deeply understood. Once we understand it, along with the limitations of computers, the solution I want to propose will become obvious and inevitable.
Our problems in this area arise mainly from the misuse of terms. We talk about (artificial) intelligence and recognition (of pictures), among others. But both these terms carry a fundamental meaning which we all (un)consciously assume. Their origin is human. They are used to describe a human-like mind: a self-conscious, reasoning, free-willed intelligence capable of understanding reality. This is important: understanding is possible only in the case of a self-conscious, free-willed intelligence. Otherwise it is simply non-existent. All artificial, computer-based solutions lack this fundamental capability. And it is a must-have. If a system lacks it, then by definition, by the very fact of lacking this (killer) feature, it simply cannot achieve the level of comprehension of the situation (that is, of reality) required for certain human-specific activities, like safe, smooth car driving. In fact, even speaking of 'comprehension' in the case of a computer is pointless. There is no comprehension at all. Computers comprehend nothing. Their level of comprehension is the same as a typewriter's. People can perceive computers as 'intelligent', able to 'understand', only if they do not know how computers work, which I show elsewhere.
The conclusion is clear: artificial, pre-programmed, task-designed systems are unable to deal with the infinitely complex reality. Because our reality is infinitely complex. Nothing ever happens twice in exactly the same way. Nothing ever looks exactly the same twice. These are the basic conclusions of physics, and of ancient philosophy (panta rhei). Unless we create something completely different from contemporary computers, there is no hope for Autonomous Vehicles to become ubiquitous.
My answer is: 'Yes. Driverless cars are inevitable. But not in the Autonomous Vehicle flavor. That is a dead end.' I realized this two years ago, when I first read about (nearly) the entire Google AV team leaving the company because they were paid too much. At least that was the only explanation the authors of such articles could come up with. Then I thought to myself: what incredible bullsh..., I mean, I rather thought: what an obviously incorrect answer. Yes. That's what I thought.
Anyway, mere deduction can reveal it, dear Watson. The AV is probably the coolest project of the decade for the software industry. It's the coolest in many respects:
- big everyday life impact, which will stay with us forever,
- revolution that attracts attention and imagination,
- hard to achieve: not everyone is good enough to take part in it,
- it may bring lots of fame and money to its creators; it may even put their names into history books and encyclopedias.
And they quit, because they got some money!?!
Imagine an architect who would resign from designing the tallest building in the world because he had already received a five-digit check. I cannot.
Imagine a biologist who would quit a cancer-cure project because she had already earned half a million...
Every professional in the world dreams of such an opportunity: to work and succeed on a project that will change the lives of millions (and make him or her a multi-millionaire). A project like the AV. And they quit because they had already earned 'too much'? It is really unbelievable.
What could the real reason be, then? Maybe the firm could not afford to finance the project as required? Google could not afford something?! Impossible. Or perhaps it was an unfriendly, even toxic work environment? Dumb managers, mobbing, etc.? But where? At Google?! Impossible.
The only sensible answer was: the project must have been doomed for objective (technical) reasons. No one wants to take part in a failure. The bigger and more famous the project, the bigger and more famous the failure. This makes their otherwise unreasonable behavior perfectly understandable.
So I searched the Internet. I found a Tesla autonomous-car demonstration video. The car avoided accidents pretty well by nearly stopping before everything 'dangerous'. That meant slowing down to 2-4 mph every minute, sometimes more often, sometimes less often. And it was traveling along a road we could describe as nearly empty. How much better can it get? I'll return to this later. Anyway, this project was doomed. True success is impossible here; only a partial one is feasible. And they must have known it, or at least felt it.
But why do the big companies still work on the AV? Why do they keep claiming that they will introduce driverless cars to our streets? The answer is simple, again. Because it sells. It sells to investors, stock markets, and prospective employees. It attracts attention; it makes great PR. On the other hand, admitting that they spent years and millions of dollars to reach a dead end? Unimaginable. So it goes on. Like the dot-com bubble. Like all the other bubbles.
Now I'll go into the details. I'll describe the situation step by step, starting from animal examples, which will help us understand the requirements for human-like car driving. Then I'll show the limitations of contemporary computers in this area. Finally, I shall present the feasible answer, which is the main topic of this text.
First, let's define the difference between seeing the world and understanding what we see. A good starting point is, for example, a horse. Horses see everything around them quite well. They can move at speeds comparable to cars. Their size and mass also make them nearly comparable. Could a horse's brain, then, drive a car? Obviously, the answer is no. It could not. And it is not about the inability to read road signs. It is because horses do not understand what they see. And understanding is about classifying, interpreting and, most importantly, foreseeing: prediction.
Let's take, for instance, shrubs swaying in the wind. The first reaction of a horse ten yards away is to run away. The shrubs sway and make noise: a danger. The horse does not understand that they are just shrubs swayed by the wind. It knows no difference between shrubs swayed by the wind and shrubs swayed by a bear scrambling through. Perhaps the horse even considers the shrub a big animal itself? In any case, the horse wants to run away. What happens if a rider makes his horse come closer to the shrubs? One meter away, the horse will try to eat the leaves. It is afraid no more. Up close, the horse sees the shrubs as something to eat. Why? Because the horse does not understand what a shrub is. For a horse, a shrub is just an object. Something. Unknown. A horse might have seen hundreds of different shrubs in its life, yet it is unable to form the idea of 'a shrub'. It is unable to distinguish a shrub from anything else. Well, perhaps it can distinguish a shrub from a horse. But again, not at a longer distance. And longer means just 20-50 yards.
Another example. I saw a woman who wanted to acquaint her horse with her little dog. She came close to the horse with the dog in her arms. And... no reaction. Her horse was perfectly calm, as usual with her. She said: "Look, I thought it was going to be afraid, and it just accepts my dog perfectly. What a relief." Yet after half a minute or so, the dog in her arms moved a bit more than before, or made a sound. And her horse shuddered. With fear in its eyes, it moved away. Only then did the horse realize that there was something alien on her.
Everyone who deals with horses knows one cannot take off a jacket near a horse without scaring it. Horses do not understand the idea of 'a jacket', or of clothes at all. For a horse it is just something alien and strange that suddenly appears. Yet horses do see the world pretty well. They have a much broader field of view than humans. But without understanding, they can be easily deceived, and so their reactions may easily get them hurt. One could say: "OK. But cats and dogs are much more intelligent." Yes, they are. But they still lack the ability to understand, to reliably foresee the outcome of their actions. Of course, they can learn the outcome of some actions. Sometimes they can even learn how to manipulate their human owners, just as they learn to manipulate other animals while hunting or in a herd hierarchy. But they are unable to learn what to expect from a paper or foil bag moved by the wind. They cannot distinguish between a paper pellet and a rock, simply because they do not understand the idea of 'a material' and its properties.
We, as humans, are so used to understanding all the different things that surround us that we consider it obvious, and therefore easy. For us, seeing is understanding. I see a rock, a man, a trash bin, a cardboard box. I know pretty well what to expect from each of them in different situations, for instance situations that happen on the road. And there is much more! I can distinguish between hundreds of normal behaviors and abnormal ones. For example, I understand that a box moving without a gust of wind is something abnormal, and that there must be something inside it. Even on a windy day I can distinguish pretty well between a box moved by a gust of wind and one moved by something inside it. All this is possible because we have this 'something' called a mind: a reflective self-consciousness able to understand the world. Without this ability, all tasks that rely on it cannot be performed.
Or, to be more precise: they cannot be performed without a much bigger margin of error than is expected when one has this ability. And these errors are inevitable. Situations in the real world differ infinitely. One never sees the same picture of a running child twice, even when working in a kindergarten. Every child is different. Every child runs differently. The scenery is never the same. There are infinitely many points of observation. And no one ever does anything exactly the same way twice. Each of a child's many runs is unique. It is really astonishing, one could even say a miracle, that during our lives we see a (nearly) infinite number of infinitely different pictures, yet we are able to correctly classify every element of this infinite chain of pictures in real time, with an error level that is negligibly small.
This is the killer issue for computers: we have to feed autonomous driving systems an infinite chain of different images. We can only hope for two of them being identical if we lack resolution or color depth. No algorithm can properly deal with such a stream of data, no matter whether it is 'pre-programmed', 'learning on the fly', or anything in between. Only the human mind can successfully deal with this infinity. How it does so is still a mystery.
All this can be better understood if we look at it at the basic level. From the computer's perspective, an image is just a two-dimensional array of numbers. A Full HD picture is just 1080 rows of integers (from 0 to 16,777,215, that is, 2 to the power of 24 possible values), each row consisting of 1920 numbers. And this is what an algorithm creator (a programmer) starts with. OK, I have these 2,073,600 non-negative integers. They form a two-dimensional array. What now?
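To make this concrete, here is a minimal sketch in pure Python. The random pixel values are, of course, just a stand-in for real camera data; the point is what the algorithm actually receives.

```python
# From the computer's perspective, a Full HD frame is just 1080 rows
# of 1920 integers, each packing 8-bit R, G and B channels into a
# single value in the range 0 .. 2**24 - 1.
import random

random.seed(0)
HEIGHT, WIDTH = 1080, 1920

# Simulate one camera frame as a list of rows of pixel values.
frame = [[random.randrange(2**24) for _ in range(WIDTH)]
         for _ in range(HEIGHT)]

total = sum(len(row) for row in frame)
print(total)  # 2073600 numbers -- and this is everything the algorithm "sees"
```

There is no cat, no road, no child in this structure; just numbers, exactly as for warehouse stock data.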
We have to realize that even the simplest task, like distinguishing basic figures, for instance triangles, rectangles and circles, is really, I mean really, hard to achieve. I know that over the years plenty of algorithms and techniques have been invented. Currently, we are pretty well prepared to solve the problem of basic figure recognition. But we still do not know what is in the picture. We have those arrays of numbers, and we have to decide based on neighboring numbers: if neighboring numbers are close in value, then do this; otherwise do something else. But there are billions of billions of possibilities, and we have to distinguish between all of them just by comparing numbers. In fact, this is all we can do. We can try to compare clusters of numbers, or do other fancy things, but it does not change the main point: we cannot recognize the picture's elements the way humans do. We deal with a big array of numbers, an array no different from warehouse stock data. This is our picture from the computer's perspective.
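The "if neighboring numbers are close in value, do this, otherwise do something else" principle can be shown in a toy form: a naive edge detector that flags a pixel when it differs from its right-hand neighbor by more than a threshold. This is a sketch of the principle only, not a production algorithm, and the threshold value is an arbitrary assumption.

```python
# Toy "neighboring numbers" decision rule: mark a pixel as an edge when
# it differs from its right-hand neighbor by more than `threshold`.
def horizontal_edges(image, threshold=50):
    """Return a same-shaped grid of 0/1 edge flags (last column is 0)."""
    edges = []
    for row in image:
        flags = [1 if abs(row[x] - row[x + 1]) > threshold else 0
                 for x in range(len(row) - 1)]
        flags.append(0)  # the last pixel has no right-hand neighbor
        edges.append(flags)
    return edges

# A tiny 8-bit grayscale "image": bright area on the left, dark on the right.
img = [[200, 200, 10, 10],
       [200, 200, 10, 10]]
print(horizontal_edges(img))  # [[0, 1, 0, 0], [0, 1, 0, 0]]
```

Even this trivial rule illustrates the gap: it finds a brightness jump, but it has no notion whatsoever of what the jump *is*.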
This is crucial to understand. If we want a computer to recognize a cat, we need to feed a 'self-learning' computer system tens of thousands of images: this color cat, that color cat, a cat in a tree, a cat in the shrubs, a cat climbing down a tree, a jumping cat, a lying cat, a cat cleaning its leg, a bigger cat, a smaller cat, a kitten, etc. And we would still have a system that gives correct answers 98% or 99.8% of the time. And this is a 'cat-only' system, while we can encounter millions of different objects on the road. Even a 0.01% recognition failure rate means a failure roughly every 17 minutes if we encounter 10 objects every second while driving. How many failures are needed to cause an accident? 20? 50? 200? No matter how much effort we put into improving such a system, we cannot hope to use it in crowded urban areas.
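The failure-interval arithmetic can be checked in a few lines (the 0.01% rate and the 10 objects per second are the figures assumed in the text):

```python
# One misclassification per 10,000 objects, 10 objects seen per second.
failure_rate = 0.0001          # 0.01% per object
objects_per_second = 10

failures_per_second = failure_rate * objects_per_second
seconds_per_failure = 1 / failures_per_second    # 1000 seconds
minutes_per_failure = seconds_per_failure / 60   # ~16.7 minutes
print(minutes_per_failure)
```

That is one misrecognized object roughly every quarter of an hour of driving, from a system that is already right 99.99% of the time.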
As I've already mentioned, two years ago I saw a video of a Tesla car's journey. It avoided accidents pretty well by nearly stopping before everything 'dangerous'. That meant slowing down to 2-4 mph every minute, sometimes more often, sometimes less often. And it was traveling along a road we could describe as nearly empty. How much better can it get? They spent millions of dollars, working for years, using every available technique to make it better. There is hardly any room for improvement left. A layperson may think that computer systems can be improved infinitely. Basically, that is true. But from some point on, the progress becomes asymptotic. The effort that once gave a 10% improvement now gives only 1%. Later on, we need to spend twice as much time and effort to get a 0.1% better system. And it may still get worse.
At some point, trying to make the system work better in one situation may backfire in others. Just like the Uber car killing a cyclist because she was crossing the street instead of riding along it. And this is not something that can be avoided. There are millions of possible situations for which our AVs will be unprepared. We simply cannot write code that correctly recognizes every picture in the infinite chain of different pictures that the car's cameras acquire. And given our experience with neural networks and similar 'artificial self-learning systems', there is no hope we could ever 'teach' such systems to correctly classify such an infinite chain of pictures. I want to reiterate: we can make such systems achieve pretty good results, even results that seem astonishing to laypeople. Yet they will still be systems that are several orders of magnitude worse than what we would consider satisfactory. No national regulator will ever agree to allow such cars in cities. If one did, it would have to reconsider soon. We cannot deceive reality.
Obviously, such systems may be enough to drive cars on well-fenced highways, or in similarly sterile environments. But this is not what we expect. It is not enough. And even adding additional sensors will not fix this flaw. Lasers, radars and heat sensors will add cost and complexity, but the overall improvement will remain unsatisfactory, simply because we face the problem I mentioned in the beginning. There is no understanding. No way to consciously react to the situation as it unfolds. That would require a human-like self-conscious, intelligent, free-willed mind. And there is no way to mimic this behavior. It is binary: either we have it or we do not, nothing in between. It may seem we are 'really close' to achieving the human level. But it will only seem so, until the first unpredicted situation.
How can we overcome the problems described in the previous chapter? What can we do to push the error threshold from ca. 0.1%-1% down to, let's say, 0.001%, that is, by three orders of magnitude? How can we prevent the autonomous vehicle (AV) from slowing down (or even stopping) near every moving object, as Teslas do? More importantly: why does the Tesla car need to slow down? What for? There is no need to slow down near every dog, pedestrian or runner moving along the road. What is the reason, then? The answer is simple: they buy time. Precious time needed to properly identify the object near the road. To gather more data, to be able to classify it as 'normal', as not causing danger, as not moving on a potentially colliding course. Every additional second of observation provides valuable data. Data that allows for correct classification (that is, recognition) not 98% to 99% of the time, but 99.9% and higher. If we let the existing algorithms gather (and process) data for a minute, we could achieve a satisfactory level of correct classification (recognition).
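A back-of-the-envelope sketch shows why buying observation time helps so much. Assume (this figure is an illustration, not a measured value) that each independent glimpse of an object is classified correctly with probability 98%; combining several glimpses by majority vote drives the error down rapidly.

```python
# Probability that a majority vote over n independent glimpses is
# correct, when each glimpse alone is correct with probability p.
from math import comb

def majority_vote_accuracy(p, n):
    """P(more than half of n independent observations are correct)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.98  # assumed single-observation accuracy
for n in (1, 5, 11):  # odd n, so a strict majority always exists
    print(n, majority_vote_accuracy(p, n))
```

With just five independent looks, the combined error already drops well below one in a thousand; the cost is exactly what the Tesla pays by slowing down: time.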
And what if we could 'observe' the area for hours before a vehicle passes it? Clearly, we could then distinguish, for instance, a sitting cat from a rock. Hardly any cat sits in one place for hours without moving. Knowing (nearly 100%) that this particular 'thing' is something non-living, like a rock, a molehill, or any other terrain feature, makes a big difference. We can then be sure (practically 100%) that it will not jump onto the road just in front of a passing car. More importantly, we can easily distinguish a two-year-old child squatting to observe a ladybug in the grass from a garden gnome (or a similar object), simply because we registered the child walking two minutes earlier and then squatting in this very place. Such pieces of information are invaluable for really safe autonomous driving.
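The "hours of observation" idea can be sketched as a tiny tracker. All names and thresholds here are hypothetical illustrations: an object whose tracked position has not changed within a long window is almost certainly inanimate, while one that was walking minutes ago is not.

```python
# Hypothetical sketch: classify tracked objects as stationary (rock-like)
# or not, based on their recorded position history.
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    history: list = field(default_factory=list)  # (timestamp_s, x, y)

    def observe(self, t, x, y):
        self.history.append((t, x, y))

    def is_stationary(self, window_s=3600, tolerance=0.5):
        """True if the object has not moved within the last `window_s` seconds."""
        if not self.history:
            return False
        t_last = self.history[-1][0]
        recent = [(x, y) for t, x, y in self.history if t >= t_last - window_s]
        xs = [x for x, _ in recent]
        ys = [y for _, y in recent]
        return (max(xs) - min(xs) <= tolerance and
                max(ys) - min(ys) <= tolerance)

rock = TrackedObject()
child = TrackedObject()
for t in range(0, 7200, 60):          # two hours, one observation a minute
    rock.observe(t, 10.0, 5.0)        # never moves
    child.observe(t, 10.0 + t / 60, 5.0)  # keeps walking

print(rock.is_stationary())   # True
print(child.is_stationary())  # False
```

No per-frame image recognition, however good, can produce this distinction; it falls out of history alone.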
How can we acquire them in an AV? Well, we can't. And that is why an AV that is really autonomous, that is, one using only the data available through sensors installed in the AV itself, cannot drive in our cities and suburbs without risking dreadful accidents. Because slowing down to 3 mph, or even stopping, near every 'strange' object, as Teslas do, could drive even the most patient passengers crazy. It is simply unacceptable.
How can we solve this problem, then? We need to introduce an autonomous driving system (ADS): a system that continuously gathers all available data from all accessible sources: vehicles, street cameras, additional sensors. Each observed object has its own history in the ADS. For example, we would know whether a pedestrian showed any symptoms of abnormal behavior a minute ago, 5 minutes ago, or even earlier. Instead of letting every vehicle 'drive on its own', using only its own very limited data sources, we send all the data to a central (cloud-based) system, where we can use all the sophisticated algorithms to make vehicle movement safe, ecological and optimized for any factor we can think of. The ADS sends driving instructions to every vehicle in the area it commands. Simple and easy.
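The data flow can be sketched in a few dataclasses. Every class and field name here is an illustrative assumption, not a real protocol: vehicles and street sensors upload observations, the ADS keeps per-object histories, and driving instructions flow back.

```python
# Hypothetical sketch of the central ADS data flow.
from dataclasses import dataclass

@dataclass
class Observation:
    source_id: str      # vehicle or street camera that saw the object
    object_id: str
    timestamp_s: float
    x: float
    y: float

@dataclass
class Instruction:
    vehicle_id: str
    max_speed_mph: float

class ADS:
    def __init__(self):
        self.histories = {}  # object_id -> list of Observation

    def ingest(self, obs):
        self.histories.setdefault(obs.object_id, []).append(obs)

    def instruct(self, vehicle_id):
        # Toy policy: slow down only if some tracked object moved recently.
        def moved(hist):
            return (len(hist) >= 2 and
                    (hist[-1].x, hist[-1].y) != (hist[-2].x, hist[-2].y))
        risky = any(moved(h) for h in self.histories.values())
        return Instruction(vehicle_id, 15.0 if risky else 30.0)

ads = ADS()
ads.ingest(Observation("cam-1", "obj-7", 0.0, 10.0, 5.0))
ads.ingest(Observation("cam-1", "obj-7", 1.0, 10.0, 5.0))
print(ads.instruct("car-42"))  # object never moved, so no slowdown needed
```

The real policy would of course be vastly more sophisticated; the point is the architecture: observations in, per-object histories in the middle, instructions out.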
Of course, the ADS still lacks understanding, which is required to drive autonomously the way humans do. Nothing has changed here. What has changed is the approach. We no longer pretend that we can mimic the human level of world comprehension using a computer. We simply provide the otherwise dumb, 'nothing-understanding' algorithms with such an amount of data that they manage to do what we want. It is like with the airplane. As long as we tried to copy nature and build a flying machine that flapped its wings, we failed. We succeeded when we changed our approach to one suitable for machines. The same applies here: we need to change the approach to one suitable for contemporary machines, that is, computers.
And this is also where 5G comes in. The cell phone industry has a big problem with its newest technology: how to sell it? 4G's killer feature was LTE, fast Internet access for smartphones. But how do you sell 5G, which according to the specs is going to be 10-100 times faster, with latency counted in milliseconds? From the consumer's perspective these gains are not very useful. Downloading a full HD movie to a smartphone in a few seconds? Who needs it? 5G demands a widespread application to thrive, and the ADS seems like the perfect answer. We would need to send data from every moving vehicle to the ADS, in real time. Since we need to send back driving instructions, the latency must be at the lowest possible level, just as 5G promises. In fact, the ADS is the answer to the 5G specification: it requires tons of data to be uploaded and latency counted in milliseconds.
In the ADS, all the city traffic is managed centrally. That means all the traffic optimizations we have dreamed of for years could finally be possible. To better understand what an enormous change this brings, I'll enumerate some of the benefits (apart from making truly driverless cars ubiquitous). Here is a handful of examples:
And many other situations which we are unable to predict and classify using software solutions, but which do happen in real life. There could be one country-wide center providing support for many ADS installations, making it a cheap solution thanks to the economy of scale. It is also worth stressing that the ADS does not have to wait until a vehicle drives near. On the contrary, we want to react as soon as we detect something 'strange', using for instance street cameras and/or other sensors. This is an especially important feature, and one hard to add to an AV: it would require a human driver in each AV. But then, what is the benefit of the AV if we still need a driver present and ready?
There is still one more issue: weather. We see AV rides in perfect weather conditions. How would they behave in rainy, stormy weather? When a 50 mph wind throws leaves, pieces of paper and heavy rain at the vehicle? What about snow flurries or fog? Is any of the contemporary AVs able to deal with such an environment? I doubt it. I suppose the strong wind alone could be enough to 'eliminate' many AVs in neighborhoods where shrubs or similar elements are omnipresent.
The advantage of the ADS here is that we could use additional sensors: ones too expensive to mount in every vehicle, or ones that do not give good results when installed on a single (moving) platform. Not to mention that the ADS 'knows' its neighborhood. It is 'tied' to the area in which the vehicles are 'guests'. This feature alone makes everything much easier.