Overview

The goal of computer vision is to enable computers to interpret and understand the visual world on their own. In this context, understanding begins with representing a visual image (the equivalent of the retina's input) as an array of pixels whose numerical values encode the shades of red, green and blue. When fed digital images or videos from a camera, a trained system can identify and distinguish between objects, which gives it the means to practically ‘see’ its surroundings and then react as trained.

Basic computer vision tasks include image segmentation, object detection, facial recognition, edge detection, pattern detection, image classification and feature matching.

Computer vision is one of the areas of machine learning whose core concepts are already integrated into major products that we use every day, from facial recognition systems and self-driving cars to health care. It also plays an important role in augmented and mixed reality, the technology that enables computing devices such as smartphones, tablets and smart glasses to overlay and embed virtual objects on real-world imagery.

Summer Projects

Indian Sign Language Detection

Sign language is a means of communicating and conveying many things in an easy way, and deaf and speech-impaired people use it to a significant extent. We built the “Indian Sign Language Detection” project by extracting the region of interest and using the features of the human hand.

The Function

Given a gesture made in real time in front of the camera, or an image of a gesture fed to the code, the aim is to detect which number the person wants to convey. Here are a few approaches by which we implemented this task.

Methods Used

Area ratios and hull defects: Once we have the contour (boundary) of the hand, we use its convexity defects, which are, in simple words, the cavities between the contour and its convex hull. These defects let us classify half of the signs for the numbers 1 to 9. The remaining signs are distinguished by the ratio between the area of the contour and the area of its convex hull.
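The area-ratio test can be sketched in plain Python as below. This is only an illustration, not the project's code: the helper names `polygon_area`, `convex_hull` and `area_ratio` are hypothetical, and the contour is assumed to be a list of (x, y) vertices in order. With OpenCV, `cv2.contourArea`, `cv2.convexHull` and `cv2.convexityDefects` cover the same ground.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area_ratio(contour):
    """Contour area divided by convex hull area (1.0 means no cavities)."""
    hull_area = polygon_area(convex_hull(contour))
    return polygon_area(contour) / hull_area if hull_area else 0.0
```

A closed fist fills most of its hull, so its ratio stays close to 1, while spread fingers leave cavities that pull the ratio down. The cut-off values that separate individual signs are not given in the text and would have to be tuned on real contours.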

Angle between the fingertips: As mentioned above, the convex hull encloses the points of the contour. In the case of a hand, the vertices of the hull are the fingertips. The coordinates of these vertices are used to calculate the angles between adjacent fingertips. The number of angles found between the tips is one less than the number of fingers. Thresholds on the angles keep undesired values out of the count. Finally, with their number and signs, it is possible to identify the gestures.
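A minimal sketch of the angle count follows. The text does not say which point the angles are measured at, so the palm centre is assumed here as the vertex. The threshold values and the function names `angle_at` and `count_finger_gaps` are likewise illustrative assumptions.

```python
import math


def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex` between the rays towards p1 and p2."""
    a1 = math.atan2(p1[1] - vertex[1], p1[0] - vertex[0])
    a2 = math.atan2(p2[1] - vertex[1], p2[0] - vertex[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


def count_finger_gaps(tips, palm_centre, min_deg=10.0, max_deg=90.0):
    """Count the angles between consecutive fingertips that pass the thresholds."""
    angles = [angle_at(palm_centre, a, b) for a, b in zip(tips, tips[1:])]
    return sum(1 for ang in angles if min_deg <= ang <= max_deg)
```

Since the number of angles is one less than the number of fingers, a count of `k` accepted angles corresponds to `k + 1` raised fingers under these assumptions.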

Applications

It can be used to teach this language to deaf and speech-impaired people.
It can identify various gestures for hands-free use, which makes an interface more user-friendly.

GitHub

Videos

Scratch Algos

Canny Edge Detection

Canny edge detection identifies the edges in a given image and has wide application in image processing tasks. We implement the algorithm in the five steps shown in the flowchart. Throughout the process, we classify candidate edge pixels using the gradient of the grayscale image, and then keep only those candidates that best fit an edge, so as to avoid undesired results.

OpenCV provides ready-made functions for some of the algorithms used in our projects. We tried to implement them from scratch using plain Python and NumPy functions, with the secondary aim of avoiding loops as much as we could, to deepen our own understanding of the subject.
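As an illustration of the loop-free style, the gradient and pixel-classification stages could be written with NumPy slicing as below. This sketch covers only those two stages, uses the Sobel operator as an assumed gradient kernel, and is not the repository's code. The function names and threshold arguments are hypothetical.

```python
import numpy as np


def sobel_gradients(gray):
    """Return gradient magnitude and direction (radians) of a 2-D grayscale array."""
    g = np.pad(gray.astype(float), 1, mode="edge")
    # Sobel responses built from shifted views of the padded image, no pixel loops.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.hypot(gx, gy), np.arctan2(gy, gx)


def double_threshold(magnitude, low, high):
    """Classify pixels: 2 = strong edge, 1 = weak candidate, 0 = rejected."""
    labels = np.zeros(magnitude.shape, dtype=np.uint8)
    labels[magnitude >= low] = 1
    labels[magnitude >= high] = 2
    return labels
```

Each shifted slice stands in for one neighbour of every pixel at once, which is what lets the whole image be processed without an explicit double loop.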

GitHub

Output

Connected Component Analysis (Made from Scratch)

CCA (Connected Component Analysis), or CCL (Connected Component Labelling), deals with the detection of blobs, or connected components, in a binary image. The goal is to give all the pixels of a connected component (or blob) the same label; because each blob then has a unique number, we can infer the total number of individual blobs. CCA deals with two types of connectivity: 4-neighbour connectivity and 8-neighbour connectivity, as shown below.

The algorithm proceeds in two passes through each pixel in the image.

First pass:

For each non-zero pixel, we check its neighbours.
If none of the neighbours is labelled, the current pixel is assigned a new label.
If any one of the neighbours is labelled, the current pixel is assigned the same label as that neighbour.
The neighbouring pixels may have different labels. In that case, the current pixel is assigned the lowest label among them.
An equivalence list is maintained for neighbouring pixels that have different labels. It is resolved in the second pass.

Second pass:

Check each labelled pixel against the equivalence list.
If the current pixel's label has an equivalent label, replace it with the equivalent label.
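The two passes can be sketched as follows for 4-neighbour connectivity. This is an illustrative version rather than the repository's code: it stores the equivalence list as a union-find array, which is an assumed data structure, and the name `label_components` is hypothetical.

```python
import numpy as np


def label_components(binary):
    """Two-pass labelling of non-zero pixels with 4-neighbour connectivity."""
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = [0]  # parent[i] is the union-find parent of label i; 0 is background

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            up = int(labels[r - 1, c]) if r > 0 else 0
            left = int(labels[r, c - 1]) if c > 0 else 0
            neighbours = [n for n in (up, left) if n]
            if not neighbours:
                parent.append(len(parent))  # create a new label
                labels[r, c] = len(parent) - 1
            else:
                smallest = min(neighbours)
                labels[r, c] = smallest
                for n in neighbours:
                    ra, rb = find(smallest), find(n)
                    parent[max(ra, rb)] = min(ra, rb)  # merge equivalent labels

    # Second pass: replace each label by its representative, renumbered 1..k.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                root = find(int(labels[r, c]))
                labels[r, c] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)
```

Only the pixels above and to the left are inspected in the first pass, because a raster scan has not yet labelled the other two neighbours. For 8-neighbour connectivity the two upper diagonal pixels would be added to the neighbour list.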

GitHub

Output

Affine Transformation

Logic

We get the coordinates of each pixel of the image.
We multiply them by the matrix of the required transformation.
Then, we observe the resulting transformation.

Step-by-Step Program Run-through

The program asks which type of transformation you want; give the required response.
If you choose rotation, it asks for the angle of rotation; if you choose translation, it asks for the distances in the x and y directions; if you choose custom, you fill in all 6 required matrix elements.
The result is shown.

Math
The matrix multiplication performed is shown.

Notes

The coordinates can be non-integral after the multiplication, so we can either convert them to integers or use interpolation techniques.
Interpolation should be preferred, but since we had no knowledge of interpolation at this stage, we simply converted them to integers.
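A minimal NumPy sketch of this forward mapping is given below, assuming a 2x3 matrix (the six elements mentioned above) applied to homogeneous coordinates (x, y, 1). Rounding to the nearest integer is an assumption, since the text only says the values were converted to integers. The function names are hypothetical.

```python
import numpy as np


def affine_transform(image, matrix):
    """Forward-map every pixel of a 2-D image through a 2x3 affine matrix."""
    rows, cols = image.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    xs, ys = xs.ravel(), ys.ravel()
    # Homogeneous coordinates: one column (x, y, 1) per pixel.
    coords = np.stack([xs, ys, np.ones(rows * cols)])
    new_xy = np.asarray(matrix, dtype=float) @ coords
    # Non-integral results are rounded to the nearest pixel (no interpolation).
    new_x = np.rint(new_xy[0]).astype(int)
    new_y = np.rint(new_xy[1]).astype(int)
    # Keep only the pixels that land inside the output canvas.
    inside = (new_x >= 0) & (new_x < cols) & (new_y >= 0) & (new_y < rows)
    out = np.zeros_like(image)
    out[new_y[inside], new_x[inside]] = image[ys[inside], xs[inside]]
    return out


def rotation_matrix(degrees):
    """2x3 matrix for a rotation about the origin (top-left corner)."""
    t = np.deg2rad(degrees)
    return [[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0]]


def translation_matrix(dx, dy):
    """2x3 matrix for a shift of dx pixels in x and dy pixels in y."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy]]
```

Because several source pixels can round to the same target and some targets receive none, forward mapping with rounding can leave small holes in rotated images, which is one reason interpolation is preferred.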

Code

Output

Hough Line Transform

As the name suggests, the Hough line algorithm is used to detect lines in an image. It is sometimes avoided when frames are captured continuously, because it is a slow process.

It relies on a simple property of lines: a given line passes through infinitely many points, and, similarly, infinitely many lines can be drawn through a given point.
So we identify the edge points and, for each one, represent the lines that can be drawn through it as a curve in the “Hough space”. The point where these curves intersect gives the parameters of the line on which the edge points in the image lie.

Note: While working in the Hough space, we use the normal form of a line instead of the slope-intercept form, because the slope is unbounded for vertical lines.
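The voting step can be sketched with NumPy as below, using the normal form rho = x cos(theta) + y sin(theta). The one-degree angular resolution, the one-pixel rho resolution and the name `hough_lines` are assumptions made for illustration.

```python
import numpy as np


def hough_lines(edge_points, shape, n_theta=180):
    """Vote in (rho, theta) space for a list of (x, y) edge points."""
    rows, cols = shape
    diag = int(np.ceil(np.hypot(rows, cols)))  # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    accumulator = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for x, y in edge_points:
        # Normal form: one rho per theta for this point.
        rhos = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        # Offset by diag so that negative rho values map to valid rows.
        accumulator[rhos + diag, np.arange(n_theta)] += 1
    return accumulator, thetas, diag
```

Cells of the accumulator with many votes correspond to lines supported by many edge points. A vertical line such as x = 3 simply shows up at theta = 0 and rho = 3, a case the slope-intercept form cannot represent.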

GitHub

Output

Virtual Drawing Pad

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos, and it can help us create fascinating things. The virtual drawing pad assigns an object (whose colour doesn’t match the background, as we use HSV segmentation) as our stylus, and then uses it to draw on the screen merely by moving it in front of the camera.

Contour detection of the region of interest, followed by a minimum-area rectangle or a circle that encloses the stylus and gives the coordinates of its centre.
Canny edge detection, in addition to contour detection, can be used to execute the task.
To avoid hard-coded values, an image of the stylus is fed in and the HSV values for stylus detection are extracted from it. The rest of the process is the same as in the other methods.

Another approach we applied was to have the user place the stylus in a box on the screen, where the program detects it and uses its HSV values for the task.

The other method is to provide trackbars for the upper and lower HSV values, so that the user initially builds the mask manually to get the desired results.
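Whichever way the lower and upper HSV bounds are obtained, the masking and centre-tracking step might look like the sketch below. It assumes each frame has already been converted to HSV (for example with OpenCV's `cv2.cvtColor` and `cv2.COLOR_BGR2HSV`) and takes the centre as the centroid of the mask rather than the centre of an enclosing rectangle or circle. The function names are hypothetical.

```python
import numpy as np


def stylus_centre(hsv_frame, lower, upper):
    """Return the (x, y) centroid of the pixels inside the HSV range, or None."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    # A pixel belongs to the mask when all three channels are within range.
    mask = np.all((hsv_frame >= lower) & (hsv_frame <= upper), axis=-1)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # stylus not visible in this frame
    return int(round(xs.mean())), int(round(ys.mean()))


def track(frames, lower, upper):
    """Collect one centre per frame; consecutive centres form the drawn stroke."""
    centres = (stylus_centre(f, lower, upper) for f in frames)
    return [c for c in centres if c is not None]
```

Joining consecutive centres with line segments (for instance with `cv2.line`) would then reproduce the stroke on screen. OpenCV's `cv2.inRange` performs the same range test as the mask computation here.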

The basic idea is that thresholding the image with an HSV range yields the mask of our stylus, and the aim is to track its centre. After getting the centre coordinates in each frame, we need to join these points. The HSV colour space is preferred over RGB because it separates hue from saturation and brightness, which makes a colour range less sensitive to lighting conditions. Here are a few of the methods which can be used.