Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.
What computer vision consists of:
Areas of artificial intelligence deal with autonomous planning or deliberation for robotic systems to navigate through an environment. A detailed understanding of these environments is required to navigate through them. Information about the environment could be provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot.
Artificial intelligence and computer vision share other topics such as pattern recognition and learning techniques. Consequently, computer vision is sometimes seen as a part of the artificial intelligence field or the computer science field in general.
Computer vision is often considered to be part of information engineering.
A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to the processing of visual stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems operate in order to solve certain vision-related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behavior of biological systems, at different levels of complexity. Also, some of the learning-based methods developed within computer vision (e.g. neural network and deep-learning-based image and feature analysis and classification) have their background in biology.
Yet another field related to computer vision is signal processing. Many methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. However, because of the specific nature of images there are many methods developed within computer vision which have no counterpart in processing of one-variable signals. Together with the multi-dimensionality of the signal, this defines a subfield in signal processing as a part of computer vision.
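The 1D-to-2D extension described above can be made concrete with a minimal sketch (function names are my own, not a standard API): a moving-average filter on a one-variable signal extends naturally to an image because a box filter is separable, so the 1D filter can simply be applied along each axis in turn.

```python
import numpy as np

def smooth_1d(signal, k=3):
    """Moving-average filter for a one-variable (e.g. temporal) signal."""
    kernel = np.ones(k) / k
    return np.convolve(signal, kernel, mode="same")

def smooth_2d(image, k=3):
    """The same averaging idea extended to a two-variable signal (an image).
    A k x k box filter is separable, so we apply the 1D filter along
    columns first, then along rows."""
    out = np.apply_along_axis(smooth_1d, 0, image, k)
    return np.apply_along_axis(smooth_1d, 1, out, k)

noisy = np.random.default_rng(0).random((8, 8))
smoothed = smooth_2d(noisy)
print(smoothed.shape)  # (8, 8)
```

Many image-specific methods (e.g. edge detectors that exploit 2D geometry) have no such 1D counterpart, which is exactly the point made above.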
The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. Different varieties of the recognition problem are described in the literature:
- Object recognition (also called object classification) – one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. Blippar, Google Goggles and LikeThat provide stand-alone programs that illustrate this functionality.
- Identification – an individual instance of an object is recognized. Examples include identification of a specific person’s face or fingerprint, identification of handwritten digits, or identification of a specific vehicle.
- Detection – the image data are scanned for a specific condition. Examples include detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
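The two-stage idea in the Detection bullet (a cheap computation proposing regions of interest for more expensive analysis) can be sketched as follows. This is a toy thresholding pass on a synthetic image; the function name and threshold are illustrative, not any library's API.

```python
import numpy as np

def find_roi(image, threshold=200):
    """Cheap first-pass detection: threshold the image and return the
    bounding box (top, left, bottom, right) of pixels above the
    threshold, or None if nothing matches.  The box could then be
    handed to a more computationally demanding classifier."""
    ys, xs = np.nonzero(image > threshold)
    if ys.size == 0:
        return None
    return tuple(int(v) for v in (ys.min(), xs.min(), ys.max(), xs.max()))

# Synthetic 100x100 "image" with a bright 10x10 patch at (40, 60)
img = np.zeros((100, 100), dtype=np.uint8)
img[40:50, 60:70] = 255
print(find_roi(img))  # (40, 60, 49, 69)
```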
Currently, the best algorithms for such tasks are based on convolutional neural networks. An illustration of their capabilities is given by the ImageNet Large Scale Visual Recognition Challenge; this is a benchmark in object classification and detection, with millions of images and hundreds of object classes. Performance of convolutional neural networks, on the ImageNet tests, is now close to that of humans. The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters (an increasingly common phenomenon with modern digital cameras). By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other issues. For example, they are not good at classifying objects into fine-grained classes, such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this with ease.
Several specialized tasks based on recognition exist, such as:
- Content-based image retrieval – finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, were taken during winter, and have no cars in them).
- People counting – computer vision for counting people in public places, malls, and shopping centres.
- Pose estimation – estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation or picking parts from a bin.
- Optical character recognition (OCR) – identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).
- 2D code reading – reading of 2D codes such as data matrix and QR codes.
- Facial recognition
- Shape Recognition Technology (SRT) in people-counter systems, differentiating human beings (head-and-shoulder patterns) from objects.
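One classic, if simplistic, technique underlying recognition tasks like 2D-code reading is template matching. Below is a brute-force sum-of-squared-differences sketch, for illustration only; real OCR and code readers use far more robust pipelines.

```python
import numpy as np

def match_template(image, template):
    """Brute-force template matching: slide the template over every
    window of the image and return the (row, col) whose window has
    the smallest sum of squared differences."""
    th, tw = template.shape
    H, W = image.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            ssd = np.sum((image[r:r+th, c:c+tw] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

img = np.zeros((20, 20))
img[5:8, 9:12] = 1.0          # plant a 3x3 "glyph" in a blank image
tmpl = np.ones((3, 3))
print(match_template(img, tmpl))  # (5, 9)
```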
Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:
- Egomotion – determining the 3D rigid motion (rotation and translation) of the camera from an image sequence produced by the camera.
- Tracking – following the movements of a (usually) smaller set of interest points or objects (e.g., vehicles, humans or other organisms) in the image sequence.
- Optical flow – to determine, for each point in the image, how that point is moving relative to the image plane, i.e., its apparent motion. This motion is a result both of how the corresponding 3D point is moving in the scene and how the camera is moving relative to the scene.
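A minimal sketch of the brightness-constancy idea behind optical flow, in the spirit of the Lucas–Kanade method, is shown below. It assumes a single global translation over the whole patch; real implementations solve this per window, over image pyramids, at selected corner points.

```python
import numpy as np

def lucas_kanade(frame1, frame2):
    """Estimate one (u, v) flow vector for the whole patch by
    least-squares on the brightness-constancy equation
    Ix*u + Iy*v = -It."""
    Ix = np.gradient(frame1, axis=1)   # spatial derivative, x direction
    Iy = np.gradient(frame1, axis=0)   # spatial derivative, y direction
    It = frame2 - frame1               # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# A bright Gaussian blob shifted one pixel to the right between frames
x = np.arange(32)
f1 = np.exp(-0.5 * ((x - 15) / 3.0) ** 2)[None, :] * np.ones((32, 1))
f2 = np.exp(-0.5 * ((x - 16) / 3.0) ** 2)[None, :] * np.ones((32, 1))
u, v = lucas_kanade(f1, f2)
print(f"u ~ {u:.2f}, v ~ {v:.2f}")  # u close to 1 (rightward), v close to 0
```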
Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model. The advent of 3D imaging not requiring motion or scanning, and of related processing algorithms, is enabling rapid advances in this field. Grid-based 3D sensing can be used to acquire 3D images from multiple angles. Algorithms are now available to stitch multiple 3D images together into point clouds and 3D models.
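One elementary step toward such 3D points is stereo triangulation: for a rectified camera pair, depth follows from disparity as Z = f·B/d. A sketch with made-up camera numbers (the focal length and baseline below are hypothetical, not from any real rig):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo triangulation for a rectified pair:
    Z = f * B / d, with focal length f in pixels, baseline B in
    metres, and disparity d in pixels."""
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline
print(depth_from_disparity(disparity_px=20, focal_px=700, baseline_m=0.12))  # 4.2 (metres)
```

Repeating this per matched pixel pair yields the set of 3D points mentioned above; surface fitting then produces a full model.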
The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest approaches to noise removal are various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of how the local image structures look, to distinguish them from noise. By first analysing the image data in terms of the local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.
An example in this field is inpainting.
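The median filter mentioned above can be sketched in a few lines. This is a naive, unoptimized implementation for illustration; production code would use a vectorized or library routine.

```python
import numpy as np

def median_filter(image, k=3):
    """Remove impulse ("salt-and-pepper") noise by replacing each
    pixel with the median of its k x k neighbourhood (edge-padded)."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.median(padded[r:r+k, c:c+k])
    return out

img = np.full((9, 9), 100, dtype=np.uint8)
img[4, 4] = 255                      # a single noisy pixel
clean = median_filter(img)
print(clean[4, 4])  # 100 -- the outlier is gone
```

Because the median ignores extreme values in the window, isolated outliers vanish while edges are preserved better than with plain averaging.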
So basically, computer vision is the subfield of computer science whose objective is to interpret the world the way we humans do, and perhaps even better than we do.
Human vision performs multiple visual tasks quite effortlessly and effectively. How is visual information processed and understood in biological systems? What is the nature of the computation involved in visual tasks? And how might we build machines that can see? Partial answers to these questions have been offered over several decades by researchers in the fields of biology, neuroscience, and computer science. Let's say someone across the room throws a ball at you and you catch it.
Let's explore a vision task:
This appears to be a simple task, but in reality it is not. Let us analyze it step by step. First, light rays reflected from the ball pass through both eyes and strike their respective retinas.
The retinas do some preliminary processing before sending the visual responses through the optic nerves to the brain,
where the visual cortex does the heavy lifting of thorough analysis. The brain taps into its knowledge base, classifies the object and its dimensions, predicts its path, and decides to act by sending signals to move the hand and catch the ball.
This takes place in a tiny fraction of a second, with almost no conscious effort, and almost never fails, depending on how much prior catching practice you've had. Recreating human vision isn't just one hard problem; it's a set of them, each of which relies on the others. More formally, computer vision is concerned with the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images. It involves developing a theoretical and algorithmic basis to achieve automatic visual understanding.
From the biological science point of view, computer vision aims to come up with computational models for the human visual system.
From the engineering point of view, computer vision aims to build autonomous systems that perform some of the tasks the human visual system can perform, and even surpass it in many cases. Many vision tasks involve the extraction of 3D and temporal information from time-varying 2D data, in other words, videos. Of course, the two goals are intimately related.
The capability of human vision:
Look at the picture below: can you identify the picture and its content within 3-4 seconds?
Although the image is blurred, you can still identify the picture and its content. That is how strong the human brain is: we can recognize an image even when it is blurred, or when it was taken 20 years ago under different illumination, because of the strong context our mind builds through its complex analysis.
Now see the image below:
Can you comment on the lengths? By looking at the black figures you may have developed a sense of which is small and which is big, but that sense is misleading: our visual system judges the size of an object by looking at the objects around it.
Now Why Computer Vision?
We are living in a 3D world, continuously interacting with it, and a lot of activity is happening around us. If we could capture all this, we could build some interesting solutions to our problems, and CV is definitely going to help us with that.
That said, if you are motivated enough, we will complete the entire journey of CV in this blog within just two months. Keep an eye out, embrace your capability, and build real-life-applicable solutions straight from your laptop.