OpenCV

    Check your OpenCV installation version

    import cv2
    print(cv2.__version__)
    

    OpenCV - Open Source Computer Vision is a library of programming functions mainly aimed at real-time computer vision.

    You can process images as well as run models from deep learning frameworks such as TensorFlow, Torch/PyTorch and Caffe in OpenCV.


    Images

    Reading & displaying an image in OpenCV

    import cv2
    
    img = cv2.imread("YOUR_IMAGE_LOCATION.jpg")
    # Display the image
    cv2.imshow("WINDOW_NAME", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Read and Display an image

    Functions used here: cv2.imread() loads the image, cv2.imshow() displays it in a named window, cv2.waitKey(0) waits for a key press, and cv2.destroyAllWindows() closes the windows.

    Write or save an image

    Save the image

    cv2.imwrite('image_name.png', img)
    
    

    Use cv2.imwrite() to save an image. The first argument is the file name, and the second argument is the image you want to save.

    You can specify the file format through the file extension (.png or .jpg).


    Videos

    Reading a video

    import numpy as np
    import cv2
    
    cap = cv2.VideoCapture(0)
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        if not ret:
            break
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Display the resulting frame
        cv2.imshow('frame', gray)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    # When everything is done, release the capture
    cap.release()
    cv2.destroyAllWindows()
    

    To capture a video, you need to create a VideoCapture object. Its argument can be either the device index or the name of a video file.


    Drawing Functions

    Drawing geometric shapes with OpenCV

    import numpy as np
    import cv2
    
    # Create a black image
    img = np.zeros((512,512,3), np.uint8)
    
    # Draw a diagonal blue line with thickness of 5 px
    
    img = cv2.line(img,(0,0),(511,511),(255,0,0),5)
    
    # Draw a rectangle
    img = cv2.rectangle(img,(384,0),(510,128),(0,255,0),3)
    
    # Draw a circle
    img = cv2.circle(img,(447,63), 63, (0,0,255), -1)
    
    # Draw an ellipse
    img = cv2.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
    
    # Draw a polygon
    pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
    pts = pts.reshape((-1,1,2))
    img = cv2.polylines(img,[pts],True,(0,255,255))
    
    # Write some text
    font = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv2.LINE_AA)
    
    cv2.imshow('image',img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    OpenCV allows us to perform all kinds of operations like drawing shapes, writing text, positioning them, etc.

    Let us see how it is done:


    Image Operations

    Accessing Pixel Values

    import cv2
    import numpy as np
    
    img = cv2.imread('YOUR_IMAGE.jpg')
    px = img[100,100]
    print(px)
    # [157 166 200]
    # accessing only the blue pixel
    blue = img[100,100,0]
    print(blue)
    # 157
    
    # You can modify the pixel values the same way
    img[100,100] = [255,255,255]
    print(img[100,100])
    # [255 255 255]
    

    Accessing Pixel Values

    Almost all image-related operations here rely on Numpy rather than OpenCV, so a good knowledge of Numpy is required to write better-optimized code with OpenCV.

    Loading an image is simple, just use the cv2.imread() function.

    You can access a pixel value by its row and column coordinates. For a BGR image, it returns an array of Blue, Green and Red values. For a grayscale image, its corresponding intensity is returned.

    You can modify the pixel value by assigning it new (B, G, R) values.
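    For reading a single pixel, Numpy's scalar accessor item() is faster than plain indexing. A minimal sketch (the array values here are made up; no image file is needed):

```python
import numpy as np

# A 3-channel dummy image standing in for a loaded photo
img = np.full((200, 200, 3), 157, np.uint8)

# img.item() returns a Python scalar and is faster for single-pixel reads
blue = img.item(100, 100, 0)
print(blue)  # 157

# Writing a single channel value via plain indexing
img[100, 100, 0] = 255
print(img.item(100, 100, 0))  # 255
```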

    Image Properties

    # Print image shape
    print(img.shape)
    # (342, 548, 3) would be different for your image
    
    # Print image size
    print(img.size)
    # 562248
    
    # Print image datatype
    print(img.dtype)
    # uint8
    

    Access Image Properties

    Image properties include the number of rows, columns and channels, the type of image data, the number of pixels, etc.

    The shape of the image is accessed via img.shape. It returns a tuple of the number of rows, columns and channels (if the image is color).

    If the image is grayscale, the tuple returned contains only the number of rows and columns. So it is a good way to check whether a loaded image is grayscale or color.
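    The shape check above can be sketched as a tiny helper (the dummy arrays stand in for real images):

```python
import numpy as np

# Dummy arrays standing in for a color and a grayscale image
color = np.zeros((342, 548, 3), np.uint8)
gray = np.zeros((342, 548), np.uint8)

def is_grayscale(img):
    # A grayscale image has no channel axis (or a single channel)
    return img.ndim == 2 or (img.ndim == 3 and img.shape[2] == 1)

print(is_grayscale(color))  # False
print(is_grayscale(gray))   # True
```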

    The total number of pixels is accessed by img.size.

    Image datatype is obtained by img.dtype.

    img.dtype is very important while debugging because a large number of errors in OpenCV-Python code are caused by invalid datatypes.

    Image ROI

    # Read a region in an image
    roi = img[280:340, 330:390]
    
    # Place this region somewhere else
    img[273:333, 100:160] = roi
    

    Splitting and Merging Image Channels

    # Method 1
    b,g,r = cv2.split(img)
    img = cv2.merge((b,g,r))
    
    # Or, Method 2: Numpy indexing (here, the blue channel)
    new_img = img[:, :, 0]
    

    Region of Interest

    Sometimes, you will have to play with a certain region of images. For eye detection in images, first, perform face detection over the image until the face is found, then search within the face region for eyes. This approach improves accuracy (because eyes are always on faces :D ) and performance (because we search for a small area).

    ROI is again obtained using Numpy indexing. Here we are selecting the ball and copying it to another region in the image.

    Splitting and Merging Image Channels

    The BGR channels of an image can be split into their individual planes when needed. Then, the individual channels can be merged back together to form a BGR image again.


    Arithmetic Operations on Images

    Adding Images

    # Make sure these images are the same size.
    # If they are not, read the size of the smaller image, select an ROI
    # of that size from the bigger image, and use that ROI below.
    
    img1 = cv2.imread('YOUR_IMG1.png')
    img2 = cv2.imread('YOUR_IMG2.png')
    
    added_img = cv2.addWeighted(img1, 0.7, img2, 0.3, 0)
    
    cv2.imshow("Added Images", added_img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    


    Bitwise Operations

    # Load two images
    img1 = cv2.imread('messi5.jpg')
    img2 = cv2.imread('opencv_logo.png')
    # I want to put logo on top-left corner, So I create a ROI
    rows,cols,channels = img2.shape
    roi = img1[0:rows, 0:cols ]
    # Now create a mask of logo and create its inverse mask also
    img2gray = cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
    ret, mask = cv2.threshold(img2gray, 10, 255, cv2.THRESH_BINARY)
    mask_inv = cv2.bitwise_not(mask)
    # Now black-out the area of logo in ROI
    img1_bg = cv2.bitwise_and(roi,roi,mask = mask_inv)
    # Take only region of logo from logo image.
    img2_fg = cv2.bitwise_and(img2,img2,mask = mask)
    # Put logo in ROI and modify the main image
    dst = cv2.add(img1_bg,img2_fg)
    img1[0:rows, 0:cols ] = dst
    cv2.imshow('res',img1)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Image Addition

    You can add two images with the OpenCV function cv2.add(), or simply with the Numpy operation res = img1 + img2. Both images should be of the same depth and type, or the second image can just be a scalar value.

    There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a saturated operation while Numpy addition is a modulo operation.

    In OpenCV, with x = 250 and y = 10 as uint8 values, cv2.add(x, y) gives 255, while Numpy behaves differently: x + y = 260 % 256 = 4.

    Image Blending

    We use cv2.addWeighted() to blend two images. In the code we have written, img1 is added with weight 0.7 and img2 with weight 0.3. The last 0 means we are not adding any scalar (bias) value to these images.

    Bitwise Operations

    This includes bitwise AND, OR, NOT and XOR operations. They will be highly useful while extracting any part of the image (as we will see in coming chapters), defining and working with non-rectangular ROI etc. Below we will see an example of how to change a particular region of an image.


    We want to put the OpenCV logo above an image. If we add two images, it will change color. If we blend them, we get a transparent effect. But we want it to be opaque. If it were a rectangular region, we could use an ROI as we did in the last section. But the OpenCV logo is not a rectangular shape.

    In the result, the mask isolates the logo and the final image shows the logo placed opaquely. For more understanding, display the intermediate images in the above code, especially img1_bg and img2_fg.


    Playing with Colors


    Original image (a) and its channels with color: hue (b), saturation (c) and value or brightness (d). On the second row, each channel in grayscale (single channel image), respectively.

    There are more than 150 color-space conversion methods available in OpenCV, but we will look into only the two most widely used ones: BGR ↔ Gray and BGR ↔ HSV.

    For color conversion, we use the function cv2.cvtColor(input_image, flag), where the flag determines the type of conversion.

    A few useful flags: cv2.COLOR_BGR2GRAY and cv2.COLOR_GRAY2BGR for BGR ↔ Gray, and cv2.COLOR_BGR2HSV and cv2.COLOR_HSV2BGR for BGR ↔ HSV.

    Tracking Object using Colors

    import cv2
    import numpy as np
    cap = cv2.VideoCapture(0)
    while(1):
        # Take each frame
        _, frame = cap.read()
        # Convert BGR to HSV
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # define range of blue color in HSV
        lower_blue = np.array([110,50,50])
        upper_blue = np.array([130,255,255])
        # Threshold the HSV image to get only blue colors
        mask = cv2.inRange(hsv, lower_blue, upper_blue)
        # Bitwise-AND mask and original image
        res = cv2.bitwise_and(frame,frame, mask= mask)
        cv2.imshow('frame',frame)
        cv2.imshow('mask',mask)
        cv2.imshow('res',res)
        k = cv2.waitKey(5) & 0xFF
        if k == 27:
            break
    cv2.destroyAllWindows()
    

    Object Tracking

    Now that we know how to convert a BGR image to HSV, we can use this to extract a colored object. In HSV, it is easier to represent a color than in the BGR color-space. In our application, we will try to extract a blue colored object. The code above is commented in detail.


    Scaling

    import cv2
    import numpy as np
    img = cv2.imread('messi5.jpg')
    res = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
    #OR
    height, width = img.shape[:2]
    res = cv2.resize(img,(2*width, 2*height), interpolation = cv2.INTER_CUBIC)
    

    Translation

    import cv2
    import numpy as np
    img = cv2.imread('messi5.jpg',0)
    rows,cols = img.shape
    M = np.float32([[1,0,100],[0,1,50]])
    dst = cv2.warpAffine(img,M,(cols,rows))
    cv2.imshow('img',dst)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Perspective Transformation

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    
    img = cv2.imread('sudoku.png')
    rows,cols,ch = img.shape
    pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
    pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
    M = cv2.getPerspectiveTransform(pts1,pts2)
    dst = cv2.warpPerspective(img,M,(300,300))
    plt.subplot(121),plt.imshow(img),plt.title('Input')
    plt.subplot(122),plt.imshow(dst),plt.title('Output')
    plt.show()
    

    Scaling

    Scaling is just resizing of the image. OpenCV comes with a function cv2.resize() for this purpose. The size of the image can be specified manually, or you can specify the scaling factor. Different interpolation methods are used. Preferable interpolation methods are cv2.INTER_AREA for shrinking and cv2.INTER_CUBIC (slow) & cv2.INTER_LINEAR for zooming. By default, interpolation method used is cv2.INTER_LINEAR for all resizing purposes.

    Translation

    Translation is the shifting of object's location. If you know the shift in $(x,y)$ direction, let it be $(t_x,t_y)$, you can create the transformation matrix $M$ as follows:

    $$ M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix} $$

    For warping and rotation, check the documentation.

    Perspective Transformation

    For perspective transformation, you need a 3x3 transformation matrix. Straight lines will remain straight even after the transformation. To find this transformation matrix, you need 4 points on the input image and corresponding points on the output image. Among these 4 points, 3 of them should not be collinear. Then transformation matrix can be found by the function cv2.getPerspectiveTransform. Then apply cv2.warpPerspective with this 3x3 transformation matrix.



    Image Thresholding

    Adaptive Thresholding

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    img = cv2.imread('sudoku.png',0)
    img = cv2.medianBlur(img,5)
    ret,th1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)
    th2 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,\
                cv2.THRESH_BINARY,11,2)
    th3 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,\
                cv2.THRESH_BINARY,11,2)
    titles = ['Original Image', 'Global Thresholding (v = 127)',
                'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
    images = [img, th1, th2, th3]
    for i in range(4):
        plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
        plt.title(titles[i])
        plt.xticks([]),plt.yticks([])
    plt.show()
    

    Adaptive Thresholding

    There are other simpler methods, but we are skipping them.

    In Adaptive Thresholding, the algorithm calculates the threshold for a small region of the image. So we get different thresholds for different regions of the same image and it gives us better results for images with varying illumination.

    It has three 'special' input parameters and only one output argument.

    Adaptive Method - decides how the threshold value is calculated: cv2.ADAPTIVE_THRESH_MEAN_C uses the mean of the neighbourhood area, while cv2.ADAPTIVE_THRESH_GAUSSIAN_C uses a Gaussian-weighted mean.

    Block Size - the size of the neighbourhood area used to calculate the threshold.

    C - a constant subtracted from the calculated mean or weighted mean.



    Image Blurring

    Averaging


    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    img = cv2.imread('opencv-logo-white.png')
    blur = cv2.blur(img,(5,5))
    plt.subplot(121),plt.imshow(img),plt.title('Original')
    plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(blur),plt.title('Blurred')
    plt.xticks([]), plt.yticks([])
    plt.show()
    

    Gaussian Blur

    blur = cv2.GaussianBlur(img,(5,5),0)
    

    Bilateral Filter

    blur = cv2.bilateralFilter(img,9,75,75)
    

    Averaging

    This is done by convolving the image with a normalized box filter. It simply takes the average of all the pixels under kernel area and replaces the central element. This is done by the function cv2.blur() or cv2.boxFilter(). Check the docs for more details about the kernel. We should specify the width and height of kernel. A 3x3 normalized box filter would look like below:

    $$ K = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$

    Gaussian Blur

    Here, instead of a box filter, a Gaussian kernel is used. It is done with the function cv2.GaussianBlur(). We should specify the width and height of the kernel, which should be positive and odd. We should also specify the standard deviations in the X and Y directions, sigmaX and sigmaY respectively. If only sigmaX is specified, sigmaY is taken to be the same as sigmaX. If both are given as zeros, they are calculated from the kernel size. Gaussian blurring is highly effective in removing Gaussian noise from an image.

    Bilateral Filtering

    cv2.bilateralFilter() is highly effective in noise removal while keeping edges sharp.

    Remember, this operation is slower compared to other filters, so do not use it if you are looking for real-time performance.

    In future, we will cover faster methods as well.


    Canny Edge Detector

    Canny Edge Detection

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    img = cv2.imread('messi5.jpg',0)
    edges = cv2.Canny(img,100,200)
    plt.subplot(121),plt.imshow(img,cmap = 'gray')
    plt.title('Original Image'), plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(edges,cmap = 'gray')
    plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
    plt.show()
    


    Canny Edge Detection

    It is a popular edge detection algorithm, made up of multiple stages:

    1. Noise Reduction - since edge detection is susceptible to noise, the image is first smoothed with a 5x5 Gaussian filter.
    2. Finding the Intensity Gradient of the image.
    3. Non-maximum Suppression - a very important step, also used in DNN object detection algorithms.
    4. Hysteresis Thresholding.

    If you want to learn more, check out the documentation.

    OpenCV puts all of the above in a single function, cv2.Canny(). The first argument is our input image. The second and third arguments are our minVal and maxVal respectively. The fourth argument is aperture_size, the size of the Sobel kernel used to find image gradients (3 by default).


    Background Subtraction

    MOG

    import numpy as np
    import cv2
    cap = cv2.VideoCapture('vtest.avi')
    fgbg = cv2.bgsegm.createBackgroundSubtractorMOG()  # in OpenCV 3+, MOG lives in the opencv-contrib bgsegm module
    while(1):
        ret, frame = cap.read()
        fgmask = fgbg.apply(frame)
        cv2.imshow('frame',fgmask)
        k = cv2.waitKey(30) & 0xff
        if k == 27:
            break
    cap.release()
    cv2.destroyAllWindows()
    

    Background subtraction is a major preprocessing step in many vision-based applications. For example, consider a visitor counter where a static camera counts the visitors entering or leaving a room, or a traffic camera extracting information about vehicles. In all these cases, you first need to extract the person or vehicles alone. Technically, you need to extract the moving foreground from the static background.

    Original Image


    Resultant Image



    Last Note

    MOG2 - Background Subtraction

    import numpy as np
    import cv2
    cap = cv2.VideoCapture('vtest.avi')
    fgbg = cv2.createBackgroundSubtractorMOG2()
    while(1):
        ret, frame = cap.read()
        fgmask = fgbg.apply(frame)
        cv2.imshow('frame',fgmask)
        k = cv2.waitKey(30) & 0xff
        if k == 27:
            break
    cap.release()
    cv2.destroyAllWindows()
    

    The algorithms available in OpenCV are numerous, and we need to make sure we know the best of them, especially the ones relevant to us.

    We have tried to cover the basic ones here.

    As we advance in our course for MLBLR, we will add more sections here.

    We also suggest you tell us what else you'd like to learn right now.

    You can find some awesome tutorials here:

    The second link has awesome resources covering many other things along with OpenCV. Do check it out!