Sign language translator using OpenCV

Sign language is a crucial communication tool for individuals with hearing impairments. In this Answer, we’ll explore how to create a real-time hand gesture recognition application, known as the sign language converter, using Python and the Tkinter library for graphical user interface (GUI) design. This application bridges the gap between sign language and text, allowing communication between those who use sign language and those who may not understand it.

Technologies used

Before diving into the code, let’s understand the key technologies and libraries that we will use.

  1. Tkinter: A GUI library in Python that provides a set of tools for creating interactive graphical user interfaces.

  2. OpenCV: An open-source computer vision library that allows us to work with images and videos and perform various image-processing tasks.

  3. Mediapipe: A library developed by Google that provides ready-to-use solutions for various tasks, including hand tracking and pose estimation.

  4. Pyttsx3: A text-to-speech conversion library that enables the application to provide audio feedback.

Setting up the GUI

We begin by importing the necessary libraries and creating the main application window with appropriate dimensions, background color, and title.

from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3
win = Tk()
width = win.winfo_screenwidth()
height = win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')

Defining global variables

Several global variables are defined to store various elements of the application, such as images, hand-tracking results, GUI components, and more.

global img, finalImage, finger_tips, thumb_tip, cap, image, rgb, hand, results, _, w, h, status, mpDraw, mpHands, hands, label1, btn, btn2

Initializing hand detection

The wine function initializes the hand detection setup, configuring webcam access using OpenCV’s VideoCapture and setting up the Hands object from the Mediapipe library.

def wine():
    global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
    finger_tips = [8, 12, 16, 20]   # Landmark indices of the index, middle, ring, and pinky fingertips
    thumb_tip = 4                   # Landmark index of the thumb tip
    w = 500
    h = 400
    label1 = Label(win, width=w, height=h, bg="#FFFFF7")
    label1.place(x=40, y=200)
    mpHands = mp.solutions.hands          # Mediapipe hands solution module
    hands = mpHands.Hands()               # The Hands object used to process frames
    mpDraw = mp.solutions.drawing_utils   # Utility for drawing landmarks on frames
    cap = cv2.VideoCapture(0)             # Open the default webcam

Gesture recognition and interpretation

The live function processes webcam frames, detects hand landmarks using Mediapipe, and interprets hand gestures based on finger positions and orientations. Gestures such as “STOP”, “OKAY”, and “VICTORY” are recognized from the relative positions of the landmarks; a short sketch of the comparison pattern follows the code below.

def live():
    global v
    global upCount
    global cshow, img
    cshow = 0
    upCount = StringVar()
    _, img = cap.read()
    img = cv2.resize(img, (w, h))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            lm_list = []
            for id, lm in enumerate(hand.landmark):
                lm_list.append(lm)
            finger_fold_status = []
            for tip in finger_tips:
                x, y = int(lm_list[tip].x * w), int(lm_list[tip].y * h)
                if lm_list[tip].x < lm_list[tip - 2].x:
                    finger_fold_status.append(True)
                else:
                    finger_fold_status.append(False)
            print(finger_fold_status)
            x, y = int(lm_list[8].x * w), int(lm_list[8].y * h)
            print(x, y)
            # stop
            if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'STOP ! Dont move.'
                upCount.set('STOP ! Dont move.')
                print('STOP ! Dont move.')
            # okay
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Perfect , You did a great job.'
                print('Perfect , You did a great job.')
                upCount.set('Perfect , You did a great job.')
            # spidey
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Good to see you.'
                print(' Good to see you. ')
                upCount.set('Good to see you.')
            # Point
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('You Come here.')
                print("You Come here.")
                cshow = 'You Come here.'
            # Victory
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('Yes , we won.')
                print("Yes , we won.")
                cshow = 'Yes , we won.'
            # Left
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
                    lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
                upCount.set('Move Left')
                print(" MOVE LEFT")
                cshow = 'Move Left'
            # Right
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
                    lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
                upCount.set('Move Right')
                print("Move RIGHT")
                cshow = 'Move Right'
            if all(finger_fold_status):
                # like
                if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    print("I like it")
                    upCount.set('I Like it')
                    cshow = 'I Like it'
                # Dislike
                elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    upCount.set('I dont like it.')
                    print(" I dont like it.")
                    cshow = 'I dont like it.'
            mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
        cv2.putText(rgb, f'{cshow}', (10, 50),
                    cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)
    image = Image.fromarray(rgb)
    finalImage = ImageTk.PhotoImage(image)
    label1.configure(image=finalImage)
    label1.image = finalImage
    win.after(1, live)
    crr = Label(win, text='Current Status :', font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=15, fg='#232224', relief=GROOVE)
    status = Label(win, textvariable=upCount, font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=50, fg='#232224', relief=GROOVE)
    status.place(x=400, y=700)
    crr.place(x=120, y=700)
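Each branch above follows the same pattern: Mediapipe numbers the 21 hand landmarks from 0 (wrist) to 20 (pinky tip), with fingertips at indices 4, 8, 12, 16, and 20, and a finger counts as extended when its tip landmark sits above (has a smaller y value than) the joint two indices below it. The following is a rough, standalone illustration of that comparison pattern; the helper finger_is_up and the test image filename are our own additions, not part of the project code.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def finger_is_up(lm_list, tip):
    # The PIP joint sits two indices below the tip in the landmark list;
    # a smaller y means higher in the frame, so the finger is extended.
    return lm_list[tip].y < lm_list[tip - 2].y

# Assumed test image; any frame containing a hand works.
frame = cv2.imread("hand.jpg")
with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm_list = list(results.multi_hand_landmarks[0].landmark)
        up = [finger_is_up(lm_list, tip) for tip in (8, 12, 16, 20)]
        print("Index, middle, ring, pinky extended:", up)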

Integrating voice feedback

The voice function uses the Pyttsx3 library to provide audio feedback. When called, it converts the recognized gesture message into speech and plays it using the system’s default audio output.

def voice():
    engine = pyttsx3.init()
    engine.say(upCount.get())
    engine.runAndWait()
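If the default voice is too fast or too quiet, pyttsx3 also exposes rate and volume properties that can be set before speaking. The values below are illustrative, not taken from the original project.

import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 150)    # Speaking rate in words per minute (illustrative value)
engine.setProperty('volume', 0.9)  # Volume between 0.0 and 1.0
engine.say('Perfect , You did a great job.')
engine.runAndWait()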

Video playback and gesture recognition

The video function allows us to load external video files. It opens a file dialog using Tkinter’s filedialog module, captures frames from the selected video file, and calls the live function to perform gesture recognition on those frames.

def video():
    global cap, ex, label1
    filename = filedialog.askopenfilename(initialdir="/", title="Select file",
                                          filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
    cap = cv2.VideoCapture(filename)
    live()
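One practical caveat: when the loaded video reaches its end (or a webcam grab fails), cap.read() returns False with an empty frame, and the resize call inside live would then raise an error. A minimal guard, assuming the same cap, w, and h globals used above, might look like the helper below; it is our addition, not part of the original project.

def read_frame():
    # Returns a resized BGR frame, or None when no frame is available
    # (for example, once the selected video file has been fully played).
    ret, frame = cap.read()
    if not ret or frame is None:
        return None
    return cv2.resize(frame, (w, h))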

Adding flexibility with widgets

Buttons are added to the GUI using Tkinter. They let us switch to the live webcam feed, load a video file, enable audio feedback, change the selected video, and exit the application.

Button(win, text='Live', ... , command=live).place(x=width-250, y=400)
Button(win, text='Video', ... , command=video).place(x=width-250, y=450)
Button(win, text='Sound', ... , command=voice).place(x=width-250, y=500)
Button(win, text='Change Vid', ... , command=video).place(x=width-250, y=550)
Button(win, text='Exit', ... , command=win.destroy).place(x=width-250, y=600)

Creating text label

A label is created to display the current gesture status on the GUI. The textvariable attribute dynamically updates the label’s text with the recognized gesture message, providing real-time visual feedback.

Label(win, textvariable=upCount, ... ).place(x=400, y=700)
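For readers new to Tkinter, a textvariable-bound label re-renders automatically whenever the bound StringVar changes, which is exactly how live pushes each recognized gesture to the screen. Here is a minimal, standalone sketch of that mechanism, not taken from the project code.

from tkinter import Tk, Label, StringVar

root = Tk()
message = StringVar(value='Waiting for a gesture...')
Label(root, textvariable=message).pack()

# Any later call to set() updates the label without touching the widget again.
message.set('Yes , we won.')
root.mainloop()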

Implementation

Let's now see how the project works by running it.

from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3


win = Tk()
width=win.winfo_screenwidth()
height=win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')

global img,finalImage,finger_tips,thumb_tip,cap, image, rgb, hand, results, _, w, \
       h,status,mpDraw,mpHands,hands,label1,btn,btn2

cap=None

Label(win,text='Sign Language Converter',font=('Helvetica',18,'italic'),bd=5,bg='#199ef3',fg='white',relief=SOLID,width=200 )\
     .pack(pady=15,padx=300)

def wine():
    global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
    finger_tips = [8, 12, 16, 20]
    thumb_tip = 4
    w = 500
    h = 400

    if cap:
        cap.release()  # Release the previous video capture

    label1 = Label(win, width=w, height=h, bg="#FFFFF7")
    label1.place(x=40, y=200)
    mpHands = mp.solutions.hands
    hands = mpHands.Hands()
    mpDraw = mp.solutions.drawing_utils
    cap = cv2.VideoCapture(0)


###########################################Detection##########################################
def live():
    global v
    global upCount
    global cshow,img
    cshow=0
    upCount = StringVar()
    _, img = cap.read()

    img = cv2.resize(img, (w, h))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            lm_list = []
            for id, lm in enumerate(hand.landmark):
                lm_list.append(lm)
            finger_fold_status = []

            for tip in finger_tips:
                x, y = int(lm_list[tip].x * w), int(lm_list[tip].y * h)
                if lm_list[tip].x < lm_list[tip - 2].x:
                    finger_fold_status.append(True)
                else:
                    finger_fold_status.append(False)

            print(finger_fold_status)
            x, y = int(lm_list[8].x * w), int(lm_list[8].y * h)
            print(x, y)
            # stop
            if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'STOP ! Dont move.'
                upCount.set('STOP ! Dont move.')
                print('STOP ! Dont move.')
            # okay
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Perfect , You did  a great job.'
                print('Perfect , You did  a great job.')
                upCount.set('Perfect , You did  a great job.')

            # spidey
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Good to see you.'
                print(' Good to see you. ')
                upCount.set('Good to see you.')

            # Point
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('You Come here.')
                print("You Come here.")
                cshow = 'You Come here.'

            # Victory
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('Yes , we won.')
                print("Yes , we won.")
                cshow = 'Yes , we won.'

            # Left
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
                    lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
                upCount.set('Move Left')
                print(" MOVE LEFT")
                cshow = 'Move Left'
            # Right
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
                    lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
                upCount.set('Move Right')
                print("Move RIGHT")
                cshow = 'Move Right'
            if all(finger_fold_status):
                # like
                if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    print("I like it")
                    upCount.set('I Like it')
                    cshow = 'I Like it'
                # Dislike
                elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    upCount.set('I dont like it.')
                    print(" I dont like it.")
                    cshow = 'I dont like it.'

            mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
        cv2.putText(rgb, f'{cshow}', (10, 50),
                cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)

    image = Image.fromarray(rgb)
    finalImage = ImageTk.PhotoImage(image)
    label1.configure(image=finalImage)
    label1.image = finalImage
    win.after(1, live)
    crr=Label(win,text='Current Status :',font=('Helvetica',18,'bold'),bd=5,bg='gray',width=15,fg='#232224',relief=GROOVE )
    status = Label(win,textvariable=upCount,font=('Helvetica',18,'bold'),bd=5,bg='gray',width=50,fg='#232224',relief=GROOVE )

    status.place(x=400,y=700)
    crr.place(x=120,y=700)

def voice():

    engine = pyttsx3.init()
    engine.say((upCount.get()))
    engine.runAndWait()

def video():
    global cap, ex, label1

    if cap:
        cap.release()  # Release the previous video capture

    filename = filedialog.askopenfilename(initialdir="/", title="Select file",
                                          filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
    cap = cv2.VideoCapture(filename)
    w = 500
    h = 400
    label1 = Label(win, width=w, height=h, relief=GROOVE)
    label1.place(x=40, y=200)
    live()

wine()

Button(win, text='Live',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold'),command=live)\
      .place(x=width-250,y=400)
Button(win, text='Video',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold'),command=video)\
      .place(x=width-250,y=450)
Button(win,text='Sound',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold'),command=voice)\
      .place(x=width-250,y=500)
Button(win,text='Change Vid',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold'),command=video)\
      .place(x=width-250,y=550)
Button(win,text='Exit',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold'),command=win.destroy)\
      .place(x=width-250,y=600)

win.mainloop()

Due to Docker environment restrictions, the live functionality may not work here; however, the application should work correctly when run locally.

Output

This is how the live button works.

For the video case, the application reads frames from the selected video file and passes them to the live function for gesture recognition.

Conclusion

This project shows how technology can bring people together, no matter how they communicate. By using Python, Tkinter, OpenCV, Mediapipe, and Pyttsx3, we've built an application that shows how software can foster understanding between people who communicate in different ways.

However, it's important to realize that sign language includes a wide variety of gestures, and manually writing landmark rules for each one quickly becomes impractical. This is where machine learning (ML) comes into play: an ML model can learn to recognize different signs on its own.
