Hand tracking is an exciting computer vision application that involves detecting and tracking human hand gestures and movements in real time.
In this Answer, we’ll explore how to perform hand tracking using OpenCV and the MediaPipe library. We’ll walk through the entire process, from setting up the environment to creating a Python script that tracks hands in a video.
Hand tracking has numerous applications, from virtual reality and gesture-based interfaces to sign language recognition and more. OpenCV is a popular computer vision library that provides tools for image and video processing, while MediaPipe is a powerful library developed by Google that offers various pre-trained models for tasks like face detection, hand tracking, and pose estimation.
Before we begin, make sure to have the following installed:
Python (3.6 or later)
OpenCV (cv2)
MediaPipe (mediapipe)
We can install the necessary libraries using pip:
pip install opencv-python mediapipe
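To confirm that both packages installed correctly, we can run a quick check (this assumes the version attributes are exposed, which is the case in recent releases of both libraries):

# Sanity check: print the installed versions of both libraries
import cv2
import mediapipe as mp

print('OpenCV version:', cv2.__version__)
print('MediaPipe version:', mp.__version__)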
The MediaPipe Hands module provides a pre-trained model for hand tracking, which can detect and track 21 landmarks (key points) of the human hand in images or video frames. Each landmark corresponds to a specific point on the hand, such as the fingertips, knuckles, and wrist.
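MediaPipe names each of these 21 landmarks in its HandLandmark enum. As a quick illustration, this snippet prints every landmark's index and name:

import mediapipe as mp

# List the 21 named hand landmarks (WRIST, THUMB_TIP, INDEX_FINGER_TIP, ...)
for landmark in mp.solutions.hands.HandLandmark:
    print(landmark.value, landmark.name)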
Let’s start by importing the required libraries and initializing the MediaPipe Hands module:
import cv2
import mediapipe as mp

# Initialize MediaPipe Hands module
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# Specify the path to your video file
video_path = 'video.mov'

# Initialize video capture
vidcap = cv2.VideoCapture(video_path)

# Set the desired window width and height
winwidth = 350
winheight = 600
We’ll load the video file and process each frame for hand tracking:
while vidcap.isOpened():
    ret, frame = vidcap.read()
    if not ret:
        break

    # Convert the BGR image to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
Now, let’s perform hand detection and landmark tracking using the MediaPipe Hands module. The Hands object is created once as a context manager; in the complete script, it wraps the frame loop so the model isn’t re-initialized for every frame:
# Initialize the hand tracker (in the full script, this wraps the frame loop)
with mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
    # Process the frame for hand tracking
    process_frames = hands.process(rgb_frame)

    # Draw landmarks on the frame
    if process_frames.multi_hand_landmarks:
        for lm in process_frames.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, lm, mp_hands.HAND_CONNECTIONS)
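If we also want landmark positions in pixel coordinates, for example to track the index fingertip, the normalized values can be scaled by the frame size. Here is a minimal sketch, not part of the original script, that would sit inside the loop after hands.process():

# Hypothetical addition: convert the index fingertip's normalized
# coordinates to pixel coordinates and highlight it on the frame
if process_frames.multi_hand_landmarks:
    for lm in process_frames.multi_hand_landmarks:
        tip = lm.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
        h, w, _ = frame.shape
        px, py = int(tip.x * w), int(tip.y * h)
        cv2.circle(frame, (px, py), 8, (0, 255, 0), -1)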
Finally, we’ll resize the frame and display the results:
# Resize the frame to the desired window size
resized_frame = cv2.resize(frame, (winwidth, winheight))

# Display the resized frame
cv2.imshow('Hand Tracking', resized_frame)

# Exit loop by pressing 'q'
if cv2.waitKey(1) & 0xFF == ord('q'):
    break
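One optional refinement, not part of the script below: cv2.waitKey(1) plays the video as fast as frames can be processed. To pace playback closer to the video's native frame rate, the delay can be derived from the file's FPS metadata, assuming the file reports it:

# Optional: derive the per-frame delay (in ms) from the video's reported frame rate
fps = vidcap.get(cv2.CAP_PROP_FPS)
delay = int(1000 / fps) if fps > 0 else 1

# Exit loop by pressing 'q'
if cv2.waitKey(delay) & 0xFF == ord('q'):
    break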
Here's the complete hand tracking code, implementing the steps above:
import cv2
import mediapipe as mp

# Initialize MediaPipe Hands module
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# Specify the path to your video file
video_path = 'video.mov'

# Initialize video capture
vidcap = cv2.VideoCapture(video_path)

# Set the desired window width and height
winwidth = 350
winheight = 600

# Initialize hand tracking
with mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
    while vidcap.isOpened():
        ret, frame = vidcap.read()
        if not ret:
            break

        # Convert the BGR image to RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Process the frame for hand tracking
        process_frames = hands.process(rgb_frame)

        # Draw landmarks on the frame
        if process_frames.multi_hand_landmarks:
            for lm in process_frames.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, lm, mp_hands.HAND_CONNECTIONS)

        # Resize the frame to the desired window size
        resized_frame = cv2.resize(frame, (winwidth, winheight))

        # Display the resized frame
        cv2.imshow('Hand Tracking', resized_frame)

        # Exit loop by pressing 'q'
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

# Release the video capture and close windows
vidcap.release()
cv2.destroyAllWindows()
Here's an explanation for the above code:
Lines 1–2: In these lines, we import the necessary libraries for the hand tracking project.
Lines 4–6: Here, we initialize the MediaPipe Hands module and its drawing utilities. This module will be used for hand tracking, and the drawing utilities will help visualize the detected landmarks on the frames.
Lines 8–16: In these lines, we specify the path to the input video file and initialize a video capture object using OpenCV’s VideoCapture class. Additionally, we set the desired dimensions for the display window.
Lines 18–19: This section initializes the hand tracking process using the mp_hands.Hands context manager, specifying the minimum confidence levels for both detection and tracking. The main tracking process is enclosed within a while loop that reads frames from the video capture object.
Lines 20–34: This block of code handles the processing of each frame for hand tracking. It involves converting the frame’s color space, processing the frame using the hand tracking module, and drawing the detected landmarks on the frame if any hands are detected.
Lines 36–44: These lines handle the final steps of the process. The frame is resized to the desired window dimensions, displayed using cv2.imshow(), and the loop continues until the 'q' key is pressed.
Lines 46–48: The final lines of the code release the video capture object and close any open windows, ensuring a clean exit from the program.
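The same pipeline also works with a live webcam feed. Here is a minimal sketch of that variant, assuming camera index 0 is available; max_num_hands is an optional parameter of mp_hands.Hands that caps how many hands are tracked per frame:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# Webcam variant: same loop, with a mirrored preview and a single tracked hand
vidcap = cv2.VideoCapture(0)

with mp_hands.Hands(max_num_hands=1,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5) as hands:
    while vidcap.isOpened():
        ret, frame = vidcap.read()
        if not ret:
            break
        frame = cv2.flip(frame, 1)  # mirror for a natural selfie view
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = hands.process(rgb_frame)
        if results.multi_hand_landmarks:
            for lm in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, lm, mp_hands.HAND_CONNECTIONS)
        cv2.imshow('Hand Tracking (webcam)', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

vidcap.release()
cv2.destroyAllWindows()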
In this Answer, we explored how to perform hand tracking using OpenCV and the MediaPipe library. We learned how to set up the environment, load and process a video, detect and track hand landmarks, and display the results in a resized window. Hand tracking has a wide range of applications, and with the help of OpenCV and MediaPipe, we can easily integrate this functionality into our projects.