Claire Matuka

Introduction to Computer Vision

In a quest to fully understand computer vision, I am starting a "Computer Vision" blog series. It's a series and not just one blog post because it is meant to educate both non-technical and technical individuals.



The first part of the series will consist of blog posts that scratch the surface of what computer vision is, its history and its applications. If you have never written a line of code in your life, this part is for you (yes, you will also be armed with information to make you sound smart in conversations; you're very much welcome).


The second part of the series is for the techies, or anyone who simply wants a deeper dive into Computer Vision. Here we will learn about computer vision algorithms and neural networks (including AlexNet), building computer vision applications, the tools used, and deployment and MLOps for computer vision. All of this will be done in the Python programming language.


Stay tuned and let's get learning.



What is Computer Vision

Human vision is fascinating, to say the least. If I were to show you an image of a bowl of fruit, you would easily be able to identify the mangoes, bananas, grapes and oranges, and even the bowl itself. This is probably because you have come across these fruits many times in your life and can now recognize them with little to no thought. A computer, on the other hand, does not "see" or understand images. Thanks to Artificial Intelligence (AI), however, it can be trained to interpret them.


Computer Vision is a field of AI that deals with processing images, videos and other visual data. In essence, it enables a computer to see and understand visual input.
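To make "seeing" a little more concrete: to a computer, a grayscale image is just a grid of numbers, one brightness value per pixel. The tiny 5×5 "image" below is made up for illustration and assumes 8-bit pixels (0 for black, 255 for white). It contains a bright vertical bar, and the code finds it using nothing but arithmetic on those numbers.

```python
# A made-up 5x5 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white, assuming 8-bit pixel values).
image = [
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
]

height = len(image)
width = len(image[0])
print(f"Image size: {width}x{height} pixels")

# The computer never "sees" a bar; it only has numbers.
# A simple rule such as "columns whose average brightness is above half of 255"
# is a first, very crude step toward extracting meaning from them.
bright_columns = [
    col
    for col in range(width)
    if sum(image[row][col] for row in range(height)) / height > 127
]
print("Bright columns:", bright_columns)  # prints: Bright columns: [2]
```

Real CV systems learn far richer rules than this from data, but they start from exactly this kind of numeric grid.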



Why Computer Vision


According to Statista, as of 2021, 6.378 billion people owned a smartphone, which is over 80% of the world's population. Since smartphones come with cameras, we can only imagine the amount of image and video data being produced every day.


Social media platforms that allow image and video sharing have taken the world by storm. Here are a few statistics that illustrate this:


  • Images and videos make up around 71.2% and 16.6% respectively, of all content on Facebook.

  • 100% of the content on Instagram consists of visual input data such as images, videos or carousel.

  • More than 500 hours of video content are uploaded to YouTube every minute. That comes to 30,000 hours of video per hour and 720,000 hours per day. In other words, watching non-stop, it would take you approximately 82 years to get through the content uploaded to YouTube in a single day.
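The YouTube figures above follow from simple arithmetic, which the short sketch below reproduces. The only input is the 500-hours-per-minute figure quoted in the post; ignoring leap years (365 days per year) is an assumption of this sketch.

```python
# Reproduce the YouTube upload arithmetic.
HOURS_UPLOADED_PER_MINUTE = 500  # figure quoted in the post

per_hour = HOURS_UPLOADED_PER_MINUTE * 60  # 60 minutes in an hour
per_day = per_hour * 24                    # 24 hours in a day

# Watching non-stop: how many years would one day of uploads take?
HOURS_PER_YEAR = 24 * 365  # assumption: ignoring leap years
years_to_watch = per_day / HOURS_PER_YEAR

print(f"{per_hour:,} hours uploaded per hour")                    # 30,000
print(f"{per_day:,} hours uploaded per day")                      # 720,000
print(f"{years_to_watch:.1f} years to watch one day of uploads")  # 82.2
```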

It is clear that we are producing tonnes of visual data. As the world becomes more data-driven, it is important to harness this information through the power of Computer Vision. Benefits include:

  • Speed - Computer Vision systems can comb through and process massive amounts of data in a short amount of time.

  • Accuracy - Given the right algorithms, Computer Vision systems are able to produce accurate results in object recognition, image searching, optical character recognition and many other applications.

  • Reliability - Unlike humans, Computer Vision systems do not get tired, so they can run continuously for hours, days, weeks or even months.



History of Computer Vision

Please note: I will refer to Computer Vision as CV, from this point onwards.


As I was studying, I couldn't help but wonder who even came up with the idea of analyzing images. Like everything else in life, CV was the result of several gradual developments rather than a single stroke of genius. Below are some of the most important ones.


Hubel and Wiesel

So get this: the foundations of CV came from the study of neurological activity in the brains of cats in the 1950s and 1960s, by neurophysiologists David Hubel and Torsten Wiesel. Part of their work investigated the short- and long-term effects of depriving kittens of vision in one eye, and their findings laid the foundation for understanding how the brain processes visual images, and hence for CV.


Hubel and Wiesel showed the cats different patterns on a screen while recording the activity of individual neurons. They found that the neurons are organized hierarchically: cells in the early stages of the visual cortex respond to simple patterns such as edges, while cells in later stages respond to more complex patterns and shapes, which together build up the overall visual representation.
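The idea of early cells that respond to edges lives on in CV. As a rough illustration (an analogy, not a model of the cat's brain), the plain-Python sketch below slides a small 3×3 filter over a made-up image that is dark on the left and bright on the right. The filter is an assumption: a Sobel-style vertical-edge kernel, a classic hand-designed edge detector. The helper name `convolve2d` is our own. The filter produces large values only where its window straddles the dark-to-bright boundary.

```python
# A made-up 6x6 grayscale image: dark (0) on the left, bright (255) on the right.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]

# Sobel-style kernel that responds to vertical edges
# (brightness changing from left to right).
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def convolve2d(img, k):
    """Slide kernel k over img (no padding, stride 1) and return the response map.

    Strictly speaking this is cross-correlation (the kernel is not flipped),
    which is also what CNN libraries compute under the name "convolution".
    """
    kh, kw = len(k), len(k[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += img[i + di][j + dj] * k[di][dj]
            row.append(total)
        out.append(row)
    return out


response = convolve2d(image, kernel)
for row in response:
    print(row)  # each row prints [0, 1020, 1020, 0]
```

The response is zero over flat regions and large only around the edge, which is roughly the behaviour Hubel and Wiesel observed in edge-sensitive cells. CNNs learn filters like this from data instead of having them designed by hand.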


Neocognitron

Neocognitron is a hierarchical artificial neural network that was created by Kunihiko Fukushima in 1979, for Japanese handwritten character recognition. The neocognitron was inspired by the discoveries of Hubel and Wiesel and marked the origin of convolutional neural networks (CNNs).


Image recognition with CNNs

In 1989, Yann LeCun developed machine learning models that could be used for handwritten zip code recognition and optical character recognition (OCR). These models involved the use of CNNs as well as back-propagation.


LeNet

After several years of improving on CNNs, Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner created LeNet-5, which was used to read characters on bank cheques. It consisted of 7 layers built from the basic units of CNNs: convolutional layers, pooling layers and fully connected layers. However, it did not become widely popular, partly because of hardware constraints such as the lack of sufficiently powerful GPUs.
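To make "7 layers" concrete, here is a small sketch that traces the size of the data through LeNet-5. It uses the standard output-size formula for a sliding window, (input - kernel + 2 × padding) / stride + 1, rounded down. The layer sizes follow the commonly cited LeNet-5 layout for a 32×32 input. The helper names are hypothetical, and the sketch only computes shapes. In particular, the original subsampling layers were trainable averaging layers, which this does not model.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or pooling window."""
    return (size - kernel + 2 * padding) // stride + 1


# LeNet-5 as commonly described: (name, kind, feature maps or units, kernel, stride).
LENET5 = [
    ("C1", "conv", 6, 5, 1),
    ("S2", "pool", 6, 2, 2),      # subsampling (pooling)
    ("C3", "conv", 16, 5, 1),
    ("S4", "pool", 16, 2, 2),     # subsampling (pooling)
    ("C5", "conv", 120, 5, 1),
    ("F6", "fc", 84, None, None),
    ("Output", "fc", 10, None, None),  # one unit per digit class
]


def trace_shapes(input_size=32):
    """Return (layer name, output shape) for each of the 7 layers."""
    size = input_size
    shapes = []
    for name, kind, maps, kernel, stride in LENET5:
        if kind in ("conv", "pool"):
            size = conv_output_size(size, kernel, stride)
            shapes.append((name, f"{maps}@{size}x{size}"))
        else:
            shapes.append((name, f"{maps} units"))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:>6}: {shape}")
```

Running it shows the feature maps shrinking from 28×28 to 14×14, 10×10, 5×5 and finally 1×1 before the fully connected layers take over.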


AlexNet

AlexNet is widely credited with sparking the current boom in the CV world. It is a large, deep convolutional neural network developed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It was trained to classify over a million high-resolution images into 1,000 different classes. The network achieved a top-5 error of 15.3%, meaning the correct class was missing from its five best guesses for 15.3% of test images. This was far ahead of the second-best entry, which scored 26.2%.
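Top-5 error counts a prediction as correct if the true label appears anywhere among the model's five highest-scoring classes. The top-5 error is the fraction of images where it does not. The minimal sketch below computes it; the class names and scores are made up for illustration, and `top_k_error` is a hypothetical helper name.

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest scores.

    scores: one dict per image, mapping class name -> model score.
    true_labels: the correct class name for each image.
    """
    misses = 0
    for image_scores, label in zip(scores, true_labels):
        top_k = sorted(image_scores, key=image_scores.get, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)


classes = ["cat", "dog", "fox", "wolf", "lynx", "bear", "owl"]
scores = [
    # Image 1: true label "fox" is ranked 3rd, so it counts as correct for top-5.
    dict(zip(classes, [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02])),
    # Image 2: true label "owl" is ranked last, so it is a top-5 miss.
    dict(zip(classes, [0.40, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02])),
]
labels = ["fox", "owl"]

print(top_k_error(scores, labels, k=5))  # 0.5
print(top_k_error(scores, labels, k=1))  # 1.0 (top-1 is much stricter)
```

Top-5 error is a forgiving metric for a 1,000-class problem, where many classes (for example, different dog breeds) look alike.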


It consisted of 8 layers: the first 5 were convolutional layers, some followed by max-pooling layers, and the last 3 were fully connected layers. AlexNet was trained on GPUs, which made it practical to train such a large network. The field of CV has advanced considerably since then.
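A similar shape trace shows how AlexNet's 5 convolutional and 3 fully connected layers fit together. Some assumptions are worth flagging. The input is taken as 227×227, because the paper states 224×224 but 227 is the size commonly used so that the first layer yields 55×55 feature maps. The original's split across two GPUs is ignored, and the helper names are hypothetical. Pooling layers are listed but not counted among the 8 layers, since they have no learned weights.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or pooling window."""
    return (size - kernel + 2 * padding) // stride + 1


# AlexNet-style stack: (name, kind, channels or units, kernel, stride, padding).
ALEXNET = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", 96, 3, 2, 0),   # overlapping max-pooling
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", 256, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", 256, 3, 2, 0),
    ("fc6", "fc", 4096, None, None, None),
    ("fc7", "fc", 4096, None, None, None),
    ("fc8", "fc", 1000, None, None, None),  # one output per ImageNet class
]

size = 227  # assumed input: a 227x227 RGB image
for name, kind, n, kernel, stride, padding in ALEXNET:
    if kind == "fc":
        print(f"{name}: {n} units")
    else:
        size = out_size(size, kernel, stride, padding)
        print(f"{name}: {n} x {size} x {size}")

# Only layers with learned weights are counted.
conv_layers = sum(1 for layer in ALEXNET if layer[1] == "conv")
fc_layers = sum(1 for layer in ALEXNET if layer[1] == "fc")
print(f"{conv_layers} conv + {fc_layers} fully connected = {conv_layers + fc_layers} layers")
```

The convolutional part shrinks the image to 256 feature maps of 6×6 (9,216 values), which the three fully connected layers then turn into scores for the 1,000 classes.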


**********


Now that we all have a brief understanding of what Computer Vision is, why it is important and the history behind it, we can move on to understanding its modern day applications in various industries across the globe.


Hope you enjoyed the read :)





