Computer vision is steadily taking over the world because it opens another medium through which we can interact with computers. Its foundational concepts have led to numerous real-life applications that have revolutionized various industries. Some of the remarkable applications of computer vision include:
Object detection and tracking: Computer vision helps identify and locate objects in images or videos, which is fundamental in fields like autonomous vehicles, surveillance, and robotics.
Medical diagnosis: Computer vision aids in analyzing medical images such as X-rays, MRIs, and CT scans, assisting healthcare professionals in accurate and timely diagnosis.
Gesture recognition: Computer vision can interpret human gestures and movements, finding applications in sign language translation, virtual reality, and human-computer interaction.
Security and surveillance: Computer vision systems are employed for video surveillance, detecting suspicious activities, and enhancing public safety.
Text recognition (OCR): Optical character recognition (OCR) technology converts printed or handwritten text in images into editable and searchable formats, and is widely used in document scanning and digitization.
These applications, along with continuous advancements in computer vision technology, have transformed various industries and continue to open up new possibilities for the future.
Computer vision is an interdisciplinary field enabling machines to interpret and understand visual information provided to them. It involves extracting meaningful insights and features from images and videos using techniques like image recognition, object detection, and image segmentation. Through advanced algorithms and artificial intelligence, it helps machines perceive and interpret the visual world, mimicking human vision to solve real-world challenges.
In this Answer, we will go over how to track hand movement using specific libraries, much like how hand movements are recognized in some augmented reality (AR) software to create a more immersive experience. Let's look at what we are going to do.
In this program, we are going to track our hands by detecting landmarks, which can be further utilized in detecting gestures and other applications. The way the landmarks are placed is visually demonstrated below.
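As a reference, the 21 landmarks MediaPipe tracks per hand can also be listed by index. The names below follow MediaPipe's `HandLandmark` enum: landmark 0 is the wrist, and each finger contributes four points from its base joint to its tip.

```python
# The 21 hand landmarks MediaPipe detects, in index order (0-20).
# Names follow MediaPipe's HandLandmark enum.
LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Print each landmark index alongside its name
for index, name in enumerate(LANDMARK_NAMES):
    print(index, name)
```

Fingertip indices (4, 8, 12, 16, and 20) are the ones most commonly used when building gesture logic on top of these landmarks.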
In this program, we are going to use multiple libraries, which are:
`cv2`: Used for implementing the functionality of OpenCV, such as reading video frames and displaying output.

```
pip install opencv-python
```

`numpy`: Used for numerical computations and array manipulation.

```
pip install numpy
```

`mediapipe`: Used for detecting and tracking hand landmarks.

```
pip install mediapipe
```
```python
import cv2
import numpy as np
import mediapipe as mp
# Initialize mediapipe
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=2, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
def predictGesture(frame):
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get hand landmark prediction
    result = hands.process(framergb)
    # Post-process the result
    if result.multi_hand_landmarks:
        landmarks = []
        for handslms in result.multi_hand_landmarks:
            for lm in handslms.landmark:
                lmx = int(lm.x * x)  # scale normalized x by the frame width
                lmy = int(lm.y * y)  # scale normalized y by the frame height
                landmarks.append([lmx, lmy])
            # Drawing landmarks on frames
            mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)
# Read the video file
cap = cv2.VideoCapture("https://player.vimeo.com/external/422931879.sd.mp4?s=bad0c26c1a08c07393146e5ee033d22d27920d4a&profile_id=164&oauth2_token_id=57447761")

while cap.isOpened():
    # Read each frame from the video
    ret, frame = cap.read()
    if not ret:
        break

    y, x, c = frame.shape  # frame.shape is (height, width, channels)
    # Flip the frame horizontally for a mirror view
    frame = cv2.flip(frame, 1)
    # Show the final output
    predictGesture(frame)
    cv2.imshow("Output", frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the video capture object and close the window
cap.release()
cv2.destroyAllWindows()
```
Line 1 – 3: In this section, we import the required libraries stated above that are used for image and video processing, numerical calculations, and hand landmark detection, respectively.
Line 5 – 7: This code section sets up `mediapipe`'s hand-tracking solution and drawing utilities.

- `mpHands` refers to the `mediapipe` hands module, used for hand landmark detection.
- `mpDraw` refers to the drawing utilities, used for drawing hand landmarks on frames.
- `hands` is an object of the `Hands` class, which performs the actual hand landmark detection. It is initialized with the parameters `max_num_hands=2` (maximum number of hands to detect) and `min_detection_confidence=0.7` (minimum confidence threshold for hand detection).
Line 8: This line defines the function `predictGesture`, which takes a single frame as input and processes it for hand landmark detection and drawing.
Line 9: This line converts the input frame from the BGR color format to the RGB color format using `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`, since `mediapipe` expects RGB input while OpenCV reads frames in BGR order.
Line 11: This line processes the RGB frame using the Mediapipe hand landmark detection module, `hands.process()`, to obtain the hand landmarks.
Line 13 – 21: This code section checks if hand landmarks were detected in the frame and, if so, processes them.

- `landmarks`: Initialize an empty list to store the (x, y) coordinates of each hand landmark, which are then filled in by iterating over the detected landmarks with a `for` loop.
- `lmx`: Calculate the x-coordinate of the landmark by multiplying its relative x-coordinate by the width of the frame.
- `lmy`: Calculate the y-coordinate of the landmark by multiplying its relative y-coordinate by the height of the frame.
- `landmarks.append([lmx, lmy])`: Add the (x, y) coordinates of the landmark to the `landmarks` list.
- `mpDraw.draw_landmarks()`: Draw the detected hand landmarks and their connections on the frame using the Mediapipe drawing utilities.
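The coordinates MediaPipe returns are normalized to the range [0, 1], which is why they are multiplied by the frame size before use. A tiny self-contained sketch of that scaling, where the landmark values and frame size are made up for illustration:

```python
# A hypothetical normalized landmark, as MediaPipe would return it
lm_x, lm_y = 0.25, 0.5          # relative position within the frame
frame_w, frame_h = 640, 480     # assumed frame size in pixels

# Scale the relative coordinates to pixel coordinates
lmx = int(lm_x * frame_w)
lmy = int(lm_y * frame_h)
print(lmx, lmy)  # 160 240
```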
Line 23: This line creates a video capture object with `cv2.VideoCapture()` to read the video from a specified path or URL.
Line 25: This line enters a loop to process each video frame until the video ends or an interruption occurs.
- `ret, frame = cap.read()`: Read the next frame from the video.
- `if not ret:`: Check whether the frame was successfully read. If not (end of video), exit the loop.
Line 31: Get the width (`x`), height (`y`), and number of channels (`c`) of the frame from `frame.shape`. Note that `frame.shape` returns (height, width, channels), so the height comes first.
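Because NumPy arrays are indexed row-first, the height precedes the width in `.shape`. A quick check on a dummy frame, where the 480x640 size is made up for illustration:

```python
import numpy as np

# A dummy black "frame": 480 rows (height), 640 columns (width), 3 channels
frame = np.zeros((480, 640, 3), dtype=np.uint8)

y, x, c = frame.shape
print(y, x, c)  # 480 640 3
```

Unpacking the height into `y` and the width into `x` keeps the pixel-coordinate math in `predictGesture` consistent with image conventions.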
Line 36: This line displays the processed frame, with the hand landmarks and connections drawn on it, in a window named "Output" using `cv2.imshow()`.