Implementation of hand gesture detection using OpenCV

Computer vision is rapidly becoming ubiquitous, as it opens up another medium for interacting with computers. The foundational concepts of computer vision have led to numerous real-life applications that have revolutionized various industries. Some of the remarkable applications of computer vision include:

  • Object detection and tracking: Computer vision helps identify and locate objects in images or videos, which is fundamental in fields like autonomous vehicles, surveillance, and robotics.

  • Medical diagnosis: Computer vision aids in analyzing medical images such as X-rays, MRIs, and CT scans, assisting healthcare professionals in accurate and timely diagnosis.

  • Gesture recognition: Computer vision can interpret human gestures and movements, finding applications in sign language translation, virtual reality, and human-computer interaction.

  • Security and surveillance: Computer vision systems are employed for video surveillance, detecting suspicious activities, and enhancing public safety.

  • Text recognition (OCR): Optical character recognition (OCR) technology converts printed or handwritten text in images into editable and searchable formats, and is widely used in document scanning and digitization.

These applications, along with continuous advancements in computer vision technology, have transformed various industries and continue to open up new possibilities for the future.

What is computer vision?

Computer vision is an interdisciplinary field enabling machines to interpret and understand visual information provided to them. It involves extracting meaningful insights and features from images and videos using techniques like image recognition, object detection, and image segmentation. Through advanced algorithms and artificial intelligence, it helps machines perceive and interpret the visual world, mimicking human vision to solve real-world challenges.

In this Answer, we will go over how to track hand movement using specific libraries, much like some augmented reality (AR) software recognizes hand movement to create a more immersive experience. Let's look at what we are going to do.

Program explanation

In this program, we are going to track our hands by detecting landmarks, which can be further utilized for gesture detection and other applications. The way the landmarks are placed is illustrated below.

Illustration of how hand landmarks work.
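MediaPipe's hand model returns 21 landmarks per hand, indexed 0–20 as in the illustration above. As a quick reference, the sketch below writes out the documented index-to-name mapping as a plain Python dict so it runs even without mediapipe installed:

```python
# The 21 hand landmark indices used by MediaPipe's hand model,
# written out as a plain dict so this runs without mediapipe installed.
HAND_LANDMARK_NAMES = {
    0: "WRIST",
    1: "THUMB_CMC", 2: "THUMB_MCP", 3: "THUMB_IP", 4: "THUMB_TIP",
    5: "INDEX_FINGER_MCP", 6: "INDEX_FINGER_PIP",
    7: "INDEX_FINGER_DIP", 8: "INDEX_FINGER_TIP",
    9: "MIDDLE_FINGER_MCP", 10: "MIDDLE_FINGER_PIP",
    11: "MIDDLE_FINGER_DIP", 12: "MIDDLE_FINGER_TIP",
    13: "RING_FINGER_MCP", 14: "RING_FINGER_PIP",
    15: "RING_FINGER_DIP", 16: "RING_FINGER_TIP",
    17: "PINKY_MCP", 18: "PINKY_PIP", 19: "PINKY_DIP", 20: "PINKY_TIP",
}

# Each detected hand yields exactly 21 landmarks, in this order.
print(len(HAND_LANDMARK_NAMES))  # 21
print(HAND_LANDMARK_NAMES[8])    # INDEX_FINGER_TIP
```

The same names are also available programmatically as the `mp.solutions.hands.HandLandmark` enum once mediapipe is installed.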

Libraries used

In this program, we are going to use multiple libraries, which are:

  • cv2: Used for implementing the functionality of OpenCV.

pip install opencv-python
  • numpy: Used for numerical computations and array manipulation.

pip install numpy
  • mediapipe: Used for detecting and tracking hand landmarks.

pip install mediapipe

Complete code implementation

import cv2
import numpy as np
import mediapipe as mp
# initialize mediapipe
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=2, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
def predictGesture(frame):
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get hand landmark prediction
    result = hands.process(framergb)
    # Post-process the result
    if result.multi_hand_landmarks:
        landmarks = []
        for handslms in result.multi_hand_landmarks:
            for lm in handslms.landmark:
                lmx = int(lm.x * y)  # lm.x is relative to the frame width (y)
                lmy = int(lm.y * x)  # lm.y is relative to the frame height (x)
                landmarks.append([lmx, lmy])
                # Drawing landmarks on frames
            mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)
# Read the video file
cap = cv2.VideoCapture("https://player.vimeo.com/external/422931879.sd.mp4?s=bad0c26c1a08c07393146e5ee033d22d27920d4a&profile_id=164&oauth2_token_id=57447761")
while True:
    while cap.isOpened():
        # Read each frame from the video
        ret, frame = cap.read()
        if not ret:
            break

        x, y, c = frame.shape
        # Flip the frame horizontally for a mirror view
        frame = cv2.flip(frame, 1)
        # Show the final output
        predictGesture(frame)
        cv2.imshow("Output", frame)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            # Release the video capture object, close the window, and exit
            cap.release()
            cv2.destroyAllWindows()
            raise SystemExit

    # Replay: rewind the video to the first frame
    cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
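The landmark scaling inside predictGesture can be checked in isolation. MediaPipe reports each landmark's x and y as fractions of the frame width and height, so converting to pixels is a simple multiplication. The helper name and frame size below are illustrative, not part of the code above:

```python
def to_pixel_coords(norm_x, norm_y, frame_width, frame_height):
    """Convert MediaPipe's normalized [0, 1] landmark coordinates
    to integer pixel coordinates for a given frame size."""
    return int(norm_x * frame_width), int(norm_y * frame_height)

# Example: a landmark at 50% of the width and 25% of the height
# of a 640x480 frame
print(to_pixel_coords(0.5, 0.25, 640, 480))  # (320, 120)
```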

Code explanation

  • Line 1 – 3: In this section, we import the required libraries listed above, which are used for image and video processing, numerical computations, and hand landmark detection, respectively.

  • Line 5 – 7: This code section loads the hands solution and the drawing utilities from the mediapipe library and creates the objects used below.

    • mpHands is used for hand landmark detection

    • mpDraw is used for drawing hand landmarks on frames

    • hands is an object of the Hands class, which will be used to perform hand landmark detection. It is initialized with parameters max_num_hands=2 (maximum number of hands to detect) and min_detection_confidence=0.7 (minimum confidence threshold for hand detection).

  • Line 8: This line defines the function predictGesture that takes a single frame as input and processes it for hand landmark detection and drawing.

  • Line 9: This line of code converts the input frame from BGR color format to RGB color format using cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).

  • Line 11: This line of code processes the RGB frame using the Mediapipe hand landmark detection module, hands.process(), to obtain the hand landmarks.

  • Line 13 – 21: This code section checks if hand landmarks were detected in the frame.

    • landmarks: Initialize an empty list to store the (x, y) coordinates of each hand landmark; the detected landmarks are then iterated over using a for loop.

    • lmx: Calculate the x-coordinate of the landmark by multiplying its relative x-coordinate with the width of the frame.

    • lmy: Calculate the y-coordinate of the landmark by multiplying its relative y-coordinate with the height of the frame.

    • landmarks.append([lmx, lmy]): Add the (x, y) coordinates of the landmark to the landmarks list.

    • mpDraw.draw_landmarks(): Draw the detected hand landmarks and connections on the frame using the Mediapipe drawing utilities.

  • Line 23: This line creates a video capture object to read the video using cv2.VideoCapture() from a specified path or URL.

  • Line 24 – 25: These lines start two loops: the outer while True loop replays the video indefinitely (rewinding it with cap.set(cv2.CAP_PROP_POS_FRAMES, 0) each time it ends), while the inner while cap.isOpened() loop processes each frame until the video ends or an interruption occurs.

    • ret, frame = cap.read(): Read the next frame from the video.

    • if not ret:: Check if the frame was successfully read. If not (end of video), exit the loop.

  • Line 31: Get the height (x), width (y), and the number of channels (c) of the frame using frame.shape. Note that OpenCV stores frames as height × width × channels arrays, which is why the x variable holds the height.

  • Line 36: This line displays the processed frame with hand landmarks and connections in a window named "Output" using cv2.imshow.
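The landmarks list collected in predictGesture can feed a simple gesture heuristic. As a hedged sketch (not part of the original code): in image coordinates, y grows downward, so a finger can be considered raised when its fingertip landmark sits above its PIP joint. The indices follow MediaPipe's hand landmark numbering (e.g., 8 is the index fingertip and 6 its PIP joint):

```python
# Fingertip and PIP-joint landmark indices for the four fingers,
# following MediaPipe's hand landmark numbering (the thumb moves
# sideways rather than up, so it is omitted from this sketch).
FINGER_TIPS = [8, 12, 16, 20]
FINGER_PIPS = [6, 10, 14, 18]

def count_raised_fingers(landmarks):
    """Count fingers whose tip is above (smaller y than) its PIP joint.

    `landmarks` is a list of 21 [x, y] pixel coordinates, in the same
    order as the `landmarks` list built inside predictGesture().
    """
    raised = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:
            raised += 1
    return raised

# Tiny synthetic example: 21 dummy points, then raise the index finger
points = [[0, 100] for _ in range(21)]
points[8] = [0, 40]  # index fingertip above its PIP joint (y = 100)
print(count_raised_fingers(points))  # 1
```

Real gesture classifiers are usually less brittle than this threshold rule, but the same landmark list is the starting point for both.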

Copyright ©2025 Educative, Inc. All rights reserved