Artificial Intelligence has taken on significance beyond academic debate in the past couple of years. Google and Facebook have claimed that they now have face recognition systems based on Artificial Intelligence that can beat humans at the task. There are reports that many text chats are now staffed by Artificial Intelligence systems without the user’s knowledge, which would meet the Turing test criterion. Proponents of Artificial Intelligence like Ray Kurzweil have been predicting that within the next 30 years, AI will enable immortality through a concept known as the Singularity, where we will be able to upload our brains to a cloud so that our thoughts live on forever. At the other end of the spectrum, people like Stephen Hawking predict that Artificial Intelligence could spell the end of human civilization, with computer systems eventually overpowering humans. So it would be interesting to explore objectively where a computer artificial intelligence system stands vis-à-vis the human brain.
One of the biggest advantages a computer system has over a human is processing speed. A 1 GHz processor can perform a single operation in 1 nanosecond. Even assuming that it takes about 40 CPU cycles to transfer the data from memory before processing, a computer can do a single operation, including the data fetch, in about 40 nanoseconds. Compare this to a human neuron, which collects inputs from its synapses, processes them and passes the result to the next neuron in about 5 milliseconds. By this measure, a computer system is 125,000 times faster than a human neuron. Yet if you show 100 different objects to a human, ask them to identify each within five seconds, and repeat the same exercise with a computer, the human would fare far better.
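The arithmetic behind the 125,000 figure can be reproduced in a few lines. This is only a sketch using the numbers quoted above; the constant names are illustrative and do not come from any library.

```python
# Back-of-the-envelope speed comparison using the figures quoted in the text.
CLOCK_HZ = 1e9               # 1 GHz processor: one cycle per nanosecond
CYCLES_PER_OPERATION = 40    # assumed cycles per operation, including the memory fetch
NEURON_LATENCY_S = 5e-3      # about 5 milliseconds per neuron

cpu_latency_s = CYCLES_PER_OPERATION / CLOCK_HZ
speed_ratio = NEURON_LATENCY_S / cpu_latency_s

print(f"CPU operation including fetch: {cpu_latency_s * 1e9:.0f} ns")
print(f"CPU is about {speed_ratio:,.0f} times faster than one neuron")
```

The ratio compares one processor operation with one neuron, which is exactly why it says little about the speed of a whole recognition task.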
To understand why there is such an anomaly, we have to look in detail at how a single task, say object classification, works in a computer vis-à-vis a human. The two systems start differing at the sensor node itself. Human visual sensory nodes are rods and cones, which detect light and color respectively. The 100 million rods and 6 million cones convert the data contained in the light to sensory impulses and transfer them to the neurons in parallel. A computer system, however, is more or less a serial system. After the light is converted to electrical signals in the camera, the signals are scanned and sent serially. Even so, at a lane rate of 5.7 Gbps across four lanes, we can transmit about 170 UHD frames per second. This is still much faster than the persistence of vision of the eye. So the problem lies elsewhere.
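As a rough check of the 170 frames figure, the sketch below divides the aggregate link rate by the size of one UHD frame. The text does not state a pixel depth, so 16 bits per pixel is an assumption here (it is the depth at which the quoted figure works out), and protocol overhead is ignored. The function name is hypothetical.

```python
# Rough link-budget check for the serial camera interface described above.
LANES = 4
LANE_RATE_BPS = 5.7e9                 # 5.7 Gbps per lane, as quoted in the text
UHD_WIDTH, UHD_HEIGHT = 3840, 2160    # UHD resolution in pixels


def frames_per_second(bits_per_pixel):
    """Frames per second the link can carry, ignoring protocol overhead."""
    link_rate_bps = LANES * LANE_RATE_BPS
    bits_per_frame = UHD_WIDTH * UHD_HEIGHT * bits_per_pixel
    return link_rate_bps / bits_per_frame


# Assumption: 16 bits per pixel. A 24-bit depth is shown for comparison.
fps_16 = frames_per_second(16)
fps_24 = frames_per_second(24)
print(f"16 bits/pixel: {fps_16:.0f} frames per second")
print(f"24 bits/pixel: {fps_24:.0f} frames per second")
```

Under either assumption the link delivers far more frames per second than the eye can tell apart, which supports the point that the bottleneck is not the serial transfer.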
The basic difference between a human system and a computer system is that the human neural system takes far fewer steps to convert a visual stimulus into an inference. If a single neuron takes about 5 milliseconds to process the data and pass it to the next neuron, the whole chain of inference must pass through no more than 100 neurons in sequence if we are to make a correct inference in 500 milliseconds. Compare this to a computer vision system. Each image is subject to the broad steps of image pre-processing, edge detection, feature extraction and object recognition. For image pre-processing and edge detection, one pixel is processed at a time. Each pixel is subject to at least 10 to 15 computations, while feature extraction and object recognition could be performed using either conventional image recognition systems based on statistical methods or neural networks with multiple hidden layers. In any case, the number of computations required to reach an inference is easily in the order of millions of CPU cycles. As the number of objects increases, the number of hidden neuron layers also increases, making such a system much slower than a human. The parallel architecture offered by current GPUs helps to make the process faster. However, the sheer number of computations required to reach an inference, coupled with the need for these computational stages to interact with each other, makes an AI system look primitive when compared to the human brain, even though it has a clear advantage in speed.
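The two budgets in this paragraph can be put side by side in a short sketch. The 640 x 480 image size is a hypothetical choice for illustration, since the text does not name a resolution; the other numbers are the ones quoted above.

```python
# Human side: how many neurons can fire one after another within the time budget?
NEURON_LATENCY_MS = 5
INFERENCE_BUDGET_MS = 500
max_sequential_neurons = INFERENCE_BUDGET_MS // NEURON_LATENCY_MS

# Computer side: per-pixel work for pre-processing and edge detection alone.
IMAGE_WIDTH, IMAGE_HEIGHT = 640, 480   # assumption: a modest VGA-sized image
MIN_OPS_PER_PIXEL, MAX_OPS_PER_PIXEL = 10, 15

pixels = IMAGE_WIDTH * IMAGE_HEIGHT
min_ops = pixels * MIN_OPS_PER_PIXEL
max_ops = pixels * MAX_OPS_PER_PIXEL

print(f"Sequential neuron steps available: {max_sequential_neurons}")
print(f"Per-pixel operations for one image: {min_ops:,} to {max_ops:,}")
```

Even for this small image, the early stages alone need a few million operations, before feature extraction and recognition begin, while the brain has room for only about a hundred sequential steps.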
So why do neurons require fewer stages of processing? What additional capabilities do they possess when compared to a computer system? The answers are not clear, but there are some indications:
- Aggregation: The human inference system does not work step by step, looking at one pixel at a time, pre-processing it, detecting an edge, then detecting a higher-order feature and finally recognizing or classifying an object. Many of these steps are aggregated. For example, if we see a rose plant with red roses on it from a distance, we are able to infer that it is a rose even if the details of the flower are not visible. Some argue that this is because the human system is look-up based: the image of a rose is compared to what is present in memory without many steps of intermediate processing, so we arrive at the inference much faster. However, this view is an oversimplification. If our field of view has multiple objects, we are able to segregate every object and classify each one separately at the same time. So segregation is still a function performed by the human brain, just like edge detection. The aggregation is done dynamically, i.e., sometimes our attention zooms in on a bee sitting on the flower and sometimes we miss it entirely, even from the same viewing distance. We are able to focus on different objects of interest even while aggregating, a capability that has not yet evolved in computer systems.
- Association: Consider a person who goes to Japan during cherry blossom season. He has never in his life seen a cherry tree, let alone its flower. However, if he sees a lot of trees in full bloom, the very knowledge that it is cherry blossom season will make him associate them with cherry trees. Computers are still bad at this. A deep learning system has to be trained with labeled examples of each item it has to classify. If it is not trained with a cherry tree, it will not figure out the identity from prior knowledge of the season and country.
- Filtering Noise: Let us suppose we are given a drawing that has been scribbled all over by a child with different sketch pens. The human brain is still capable of filtering out the noise and inferring the image on the paper. Image processing systems use methods like smoothing to remove Gaussian noise and morphological operations to eliminate noise with a pattern. But if there is intentional noise with characteristics similar to those of the image itself, like a child’s scribble, image processing systems fail to eliminate it. The auditory ability to filter noise is even more amazing. Even in a highly noisy environment where the ambient noise is several decibels louder, we are still able to pick up what a friend is saying to us. Audio systems have frequency-based filters that mask out ambient noise. But the unique ability of human sensory function is that we can dynamically focus on one voice even if there are many human voices in the background.
- Abstraction: There are amazing levels of abstraction that the human mind can work with. Let us, again, take the rose example. We do not think much about the number of petals on the rose, but aggregate it into a single red rose. A computer vision system also strives to bring in some level of abstraction by removing high degrees of correlation. This is done by computing the principal components (eigenvectors and eigenvalues) of a picture. But the very process of computing eigenvalues takes close to a millisecond on modern GPUs. Also, this is more a reductionist technique than an abstraction technique, intended to make statistical correlation with the required object simpler. However, the human mind does not stop here. We associate adjectives like “beautiful” with the rose. We could then go off on a higher-order trajectory of language and think of the Shakespearean verse, “A rose by any other name would smell as sweet.” We could associate feelings of love with the rose, and so on and so forth. An idle mind is indeed a devil’s workshop. In fact, the progress of the human race owes heavily to our ability to abstract. We create simpler models to explain very complex systems and then work with these models. A physicist who works on matter and energy hands over the system to a chemist once we start looking at the molecular level. The chemist in turn hands it over to a biologist when we begin looking at genes, cells and simple organisms. The biologist passes the baton to a doctor when the system grows in size to form a human system like the nervous system. The doctor then gives it to a psychologist when the abstraction level becomes as large as the mind. Then come the layers of philosophy, political science, economics, arts, belief systems and so on. And amazingly, the same human being can use these different levels of abstraction as the situation demands. Da Vinci was a scientist and a painter at the same time.
- Rapid reduction of the need for focused attention through training: All of us remember the first time we learned to cycle. It was virtually impossible to keep our center of gravity over the wheels of the cycle to keep from falling. Even after struggling through this first step, when we had to turn a corner or take some higher-order decision, like braking, we would lose our balance and fall. Gradually, however, all these tasks became a no-brainer. After a few months of cycling, the cyclist’s mind is preoccupied with unrelated thoughts most of the time, and the only time they focus back on the act of cycling is when they have to take a higher-order decision like braking. Computer systems, however, take the same amount of resources to perform a task, no matter how long they are trained. Deep learning systems become better and better over time and reduce the output error. But the resources taken remain the same and are therefore not freed up for some other task. In fact, the human system has a huge number of completely autonomic tasks which we perform without cognition. Our heart pumps blood many times a minute, our digestive system processes food, and our respiratory system takes in oxygen and gives out carbon dioxide, none of which involves our active cognition. Computer systems have comparable helpers, such as DMA engines, which are used to offload the CPU. However, all of them require regular housekeeping by the CPU.
- Handling diverse sensory functions and correlating inferences from them in parallel: The human brain can see a rose, smell it, feel its soft petals and hear a bee buzzing around it, all while chewing a candy. Computers can also perform multiple functions independently, but correlating all of them to evoke higher-order feelings like happiness has still not evolved. There are algorithms that perform sensor fusion, but these are immature compared to human capability.
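The noise-filtering point in the list above can be illustrated with a toy example. The sketch below applies a one-dimensional morphological opening (erosion followed by dilation) to a binary signal. The signal, the window width and the function names are all assumptions made for illustration, and a real image pipeline would work in two dimensions with a library rather than hand-written loops.

```python
def erode(signal, width=3):
    """Each sample becomes the minimum of its neighborhood (window clipped at the ends)."""
    half = width // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width=3):
    """Each sample becomes the maximum of its neighborhood (window clipped at the ends)."""
    half = width // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, width=3):
    """Morphological opening: removes features narrower than the window."""
    return dilate(erode(signal, width), width)


# Hypothetical data: the "drawing" is a stroke five samples wide.
clean = [0] * 5 + [1] * 5 + [0] * 10
noisy = list(clean)
noisy[2] = 1                  # isolated speckle, one sample wide
noisy[13:16] = [1, 1, 1]      # "scribble": a stroke as wide as the filter window

opened = opening(noisy)
print("noisy :", noisy)
print("opened:", opened)
```

The opening removes the one-sample speckle, but the scribble survives because it is as wide as the filter window, so the filter has no way to tell it apart from a real stroke. Widening the window until the scribble disappears would also erase genuine strokes of similar width.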
The mechanisms by which the human brain achieves many of the above are still unknown. Key research is ongoing to understand the human brain better and to emulate these mechanisms in computer systems. Until such research matures, we might not achieve what we can classify as a truly artificially intelligent system. Individual functions like auto-chat may meet the Turing test criterion. But the human brain is such a wonderfully complex system that emulating all these functions together may be a couple of generations away. Or so let us hope, unless we want computer systems to overpower humans and cause our decline while we are still alive.