Sign language is a crucial communication tool for individuals with hearing impairments. In this Answer, we’ll explore how to create a real-time hand gesture recognition application, known as the sign language converter, using Python and the Tkinter library for graphical user interface (GUI) design. This application bridges the gap between sign language and text, allowing communication between those who use sign language and those who may not understand it.
Before diving into the code, let’s understand the key technologies and libraries that we will use.
Tkinter: A GUI library in Python that provides a set of tools for creating interactive graphical user interfaces.
OpenCV: An open-source computer vision library that allows us to work with images and videos and perform various image-processing tasks.
Mediapipe: A library developed by Google that provides ready-to-use solutions for various tasks, including hand tracking and pose estimation.
Pyttsx3: A text-to-speech conversion library that enables the application to provide audio feedback.
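Tkinter ships with the standard Python installation, while the remaining packages can usually be installed from PyPI, for example with pip install opencv-python mediapipe pyttsx3 Pillow (these are the commonly used package names; adjust for your environment).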
We begin by importing the necessary libraries and creating the main application window with appropriate dimensions, background color, and title.
from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3

win = Tk()
width = win.winfo_screenwidth()
height = win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')
Several global variables are defined to store various elements of the application, such as images, hand-tracking results, GUI components, and more.
global img, finalImage, finger_tips, thumb_tip, cap, image, rgb, hand, results, _, w, h, status, mpDraw, mpHands, hands, label1, btn, btn2
The wine() function initializes the hand-detection setup, configuring webcam access using OpenCV's VideoCapture and creating the Hands object from the Mediapipe library.
def wine():
    global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
    finger_tips = [8, 12, 16, 20]   # Landmark indices of the four fingertips
    thumb_tip = 4                   # Landmark index of the thumb tip
    w = 500
    h = 400
    label1 = Label(win, width=w, height=h, bg="#FFFFF7")
    label1.place(x=40, y=200)
    mpHands = mp.solutions.hands           # Mediapipe hands solution
    hands = mpHands.Hands()                # The Hands object used for detection
    mpDraw = mp.solutions.drawing_utils
    cap = cv2.VideoCapture(0)              # Open the default webcam
The live() function processes webcam frames, detects hand landmarks using Mediapipe, and interprets hand gestures based on finger positions and orientations. Gestures such as "STOP", "OKAY", and "VICTORY" are recognized from the relative positions of the detected landmarks.
def live():
    global v
    global upCount
    global cshow, img
    cshow = 0
    upCount = StringVar()
    _, img = cap.read()
    img = cv2.resize(img, (w, h))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            lm_list = []
            for id, lm in enumerate(hand.landmark):
                lm_list.append(lm)
            finger_fold_status = []
            for tip in finger_tips:
                x, y = int(lm_list[tip].x * w), int(lm_list[tip].y * h)
                if lm_list[tip].x < lm_list[tip - 2].x:
                    finger_fold_status.append(True)
                else:
                    finger_fold_status.append(False)
            print(finger_fold_status)
            x, y = int(lm_list[8].x * w), int(lm_list[8].y * h)
            print(x, y)
            # Stop
            if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'STOP ! Dont move.'
                upCount.set('STOP ! Dont move.')
                print('STOP ! Dont move.')
            # Okay
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Perfect , You did a great job.'
                print('Perfect , You did a great job.')
                upCount.set('Perfect , You did a great job.')
            # Spidey
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Good to see you.'
                print(' Good to see you. ')
                upCount.set('Good to see you.')
            # Point
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('You Come here.')
                print("You Come here.")
                cshow = 'You Come here.'
            # Victory
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('Yes , we won.')
                print("Yes , we won.")
                cshow = 'Yes , we won.'
            # Left
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
                    lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
                upCount.set('Move Left')
                print(" MOVE LEFT")
                cshow = 'Move Left'
            # Right
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
                    lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
                upCount.set('Move Right')
                print("Move RIGHT")
                cshow = 'Move Right'
            if all(finger_fold_status):
                # Like
                if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    print("I like it")
                    upCount.set('I Like it')
                    cshow = 'I Like it'
                # Dislike
                elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    upCount.set('I dont like it.')
                    print(" I dont like it.")
                    cshow = 'I dont like it.'
            mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
            cv2.putText(rgb, f'{cshow}', (10, 50),
                        cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)
    image = Image.fromarray(rgb)
    finalImage = ImageTk.PhotoImage(image)
    label1.configure(image=finalImage)
    label1.image = finalImage
    win.after(1, live)
    crr = Label(win, text='Current Status :', font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=15, fg='#232224', relief=GROOVE)
    status = Label(win, textvariable=upCount, font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=50, fg='#232224', relief=GROOVE)
    status.place(x=400, y=700)
    crr.place(x=120, y=700)
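To make the landmark comparisons above easier to follow, here is a simplified, standalone sketch of the same idea. The helpers finger_extended and classify_gesture are illustrative and not part of the project code; they ignore the extra thumb and palm-orientation checks used in the real conditions. A finger counts as extended when its tip landmark sits above its PIP joint, i.e., has a smaller y value in image coordinates.

# Illustrative helpers, not part of the original project code.
# Mediapipe hand landmark indices: 8 = index tip, 6 = index PIP, 12 = middle tip,
# 10 = middle PIP, 16 = ring tip, 14 = ring PIP, 20 = pinky tip, 18 = pinky PIP.
FINGER_TIPS = [8, 12, 16, 20]

def finger_extended(lm_list, tip):
    # In image coordinates y grows downward, so an extended (raised) finger
    # has its tip above the joint two landmarks before it.
    return lm_list[tip].y < lm_list[tip - 2].y

def classify_gesture(lm_list):
    extended = [finger_extended(lm_list, tip) for tip in FINGER_TIPS]
    if all(extended):
        return 'STOP ! Dont move.'                 # All four fingers raised
    if extended == [True, True, False, False]:
        return 'Yes , we won.'                     # Index + middle raised (victory)
    if extended == [True, False, False, False]:
        return 'You Come here.'                    # Only the index finger raised
    return None

Inside live(), the same kind of comparison is simply written out inline for each gesture rather than factored into helpers.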
The voice() function uses the Pyttsx3 library to provide audio feedback. When called, it converts the recognized gesture message into speech and plays it through the system's default audio output.
def voice():
    engine = pyttsx3.init()
    engine.say(upCount.get())
    engine.runAndWait()
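Pyttsx3 also exposes speech properties such as the speaking rate and volume through setProperty. A slightly slower rate can make the spoken status easier to follow; the values below are illustrative, not required by the project:

def voice():
    engine = pyttsx3.init()
    engine.setProperty('rate', 150)    # Words per minute; pyttsx3's default is around 200
    engine.setProperty('volume', 1.0)  # Volume from 0.0 to 1.0
    engine.say(upCount.get())
    engine.runAndWait()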
The video() function allows us to load external video files. It opens a file dialog using Tkinter's filedialog module, captures frames from the selected video file, and calls the live() function to perform gesture recognition on those frames.
def video():
    global cap, ex, label1
    filename = filedialog.askopenfilename(initialdir="/", title="Select file",
                                          filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
    cap = cv2.VideoCapture(filename)
    live()
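One practical caveat not handled above: when reading from a file, cap.read() returns a success flag that becomes False once the video ends, and resizing an empty frame will then fail. A small guard at the top of live() avoids this; the snippet below is a sketch of a possible modification, not the original code:

def live():
    # Sketch of a read guard (not in the original code).
    success, img = cap.read()
    if not success:           # The video has ended or the frame could not be read
        cap.release()
        return                # Stop scheduling further frames
    img = cv2.resize(img, (w, h))
    # ... the rest of the processing continues as shown earlier ...
    win.after(1, live)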
Buttons are added to the GUI using Tkinter. These buttons provide various functionalities like switching between live video and loaded video, enabling audio feedback, changing the webcam source, and exiting the application.
Button(win, text='Live', ... , command=live).place(x=width-250, y=350)
Button(win, text='Video', ... , command=video).place(x=width-250, y=400)
Button(win, text='Sound', ... , command=voice).place(x=width-250, y=450)
Button(win, text='Change Vid', ... , command=lbl).place(x=width-250, y=500)
Button(win, text='Change Cam', ... , command=lbl2).place(x=width-250, y=550)
Button(win, text='Exit', ... , command=win.destroy).place(x=width-250, y=600)
A label is created to display the current gesture status on the GUI. Its textvariable attribute dynamically updates the label's text with the recognized gesture message, providing real-time visual feedback.
Label(win, textvariable=upCount, ... ).place(x=400, y=700)
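The key mechanism here is Tkinter's StringVar: any widget whose textvariable option points at it is redrawn automatically whenever set() is called. Here is a minimal standalone illustration, separate from the project code:

# Minimal illustration of the textvariable mechanism (not project code).
from tkinter import Tk, Label, StringVar

root = Tk()
message = StringVar(value='Waiting for a gesture...')
Label(root, textvariable=message).pack()

# Updating the variable immediately updates the label on screen.
message.set('STOP ! Dont move.')
root.mainloop()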
Now, let's see how the project works by running it.
from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3

win = Tk()
width = win.winfo_screenwidth()
height = win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')

global img, finalImage, finger_tips, thumb_tip, cap, image, rgb, hand, results, _, w, \
    h, status, mpDraw, mpHands, hands, label1, btn, btn2
cap = None

Label(win, text='Sign Language Converter', font=('Helvetica', 18, 'italic'), bd=5, bg='#199ef3',
      fg='white', relief=SOLID, width=200).pack(pady=15, padx=300)


def wine():
    global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
    finger_tips = [8, 12, 16, 20]
    thumb_tip = 4
    w = 500
    h = 400
    if cap:
        cap.release()  # Release the previous video capture
    label1 = Label(win, width=w, height=h, bg="#FFFFF7")
    label1.place(x=40, y=200)
    mpHands = mp.solutions.hands
    hands = mpHands.Hands()
    mpDraw = mp.solutions.drawing_utils
    cap = cv2.VideoCapture(0)


########################################### Detection ##########################################
def live():
    global v
    global upCount
    global cshow, img
    cshow = 0
    upCount = StringVar()
    _, img = cap.read()
    img = cv2.resize(img, (w, h))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            lm_list = []
            for id, lm in enumerate(hand.landmark):
                lm_list.append(lm)
            finger_fold_status = []
            for tip in finger_tips:
                x, y = int(lm_list[tip].x * w), int(lm_list[tip].y * h)
                if lm_list[tip].x < lm_list[tip - 2].x:
                    finger_fold_status.append(True)
                else:
                    finger_fold_status.append(False)
            print(finger_fold_status)
            x, y = int(lm_list[8].x * w), int(lm_list[8].y * h)
            print(x, y)
            # Stop
            if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'STOP ! Dont move.'
                upCount.set('STOP ! Dont move.')
                print('STOP ! Dont move.')
            # Okay
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Perfect , You did a great job.'
                print('Perfect , You did a great job.')
                upCount.set('Perfect , You did a great job.')
            # Spidey
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
                    lm_list[5].x:
                cshow = 'Good to see you.'
                print(' Good to see you. ')
                upCount.set('Good to see you.')
            # Point
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('You Come here.')
                print("You Come here.")
                cshow = 'You Come here.'
            # Victory
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                upCount.set('Yes , we won.')
                print("Yes , we won.")
                cshow = 'Yes , we won.'
            # Left
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
                    lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
                upCount.set('Move Left')
                print(" MOVE LEFT")
                cshow = 'Move Left'
            # Right
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
                    lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
                upCount.set('Move Right')
                print("Move RIGHT")
                cshow = 'Move Right'
            if all(finger_fold_status):
                # Like
                if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    print("I like it")
                    upCount.set('I Like it')
                    cshow = 'I Like it'
                # Dislike
                elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].y:
                    upCount.set('I dont like it.')
                    print(" I dont like it.")
                    cshow = 'I dont like it.'
            mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
            cv2.putText(rgb, f'{cshow}', (10, 50),
                        cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)
    image = Image.fromarray(rgb)
    finalImage = ImageTk.PhotoImage(image)
    label1.configure(image=finalImage)
    label1.image = finalImage
    win.after(1, live)
    crr = Label(win, text='Current Status :', font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=15, fg='#232224', relief=GROOVE)
    status = Label(win, textvariable=upCount, font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=50, fg='#232224', relief=GROOVE)
    status.place(x=400, y=700)
    crr.place(x=120, y=700)


def voice():
    engine = pyttsx3.init()
    engine.say(upCount.get())
    engine.runAndWait()


def video():
    global cap, ex, label1
    if cap:
        cap.release()  # Release the previous video capture
    filename = filedialog.askopenfilename(initialdir="/", title="Select file",
                                          filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
    cap = cv2.VideoCapture(filename)
    w = 500
    h = 400
    label1 = Label(win, width=w, height=h, relief=GROOVE)
    label1.place(x=40, y=200)
    live()


wine()

Button(win, text='Live', padx=95, bg='#199ef3', fg='white', relief=RAISED, width=1, bd=5,
       font=('Helvetica', 12, 'bold'), command=live).place(x=width - 250, y=400)
Button(win, text='Video', padx=95, bg='#199ef3', fg='white', relief=RAISED, width=1, bd=5,
       font=('Helvetica', 12, 'bold'), command=video).place(x=width - 250, y=450)
Button(win, text='Sound', padx=95, bg='#199ef3', fg='white', relief=RAISED, width=1, bd=5,
       font=('Helvetica', 12, 'bold'), command=voice).place(x=width - 250, y=500)
Button(win, text='Change Vid', padx=95, bg='#199ef3', fg='white', relief=RAISED, width=1, bd=5,
       font=('Helvetica', 12, 'bold'), command=video).place(x=width - 250, y=550)
Button(win, text='Exit', padx=95, bg='#199ef3', fg='white', relief=RAISED, width=1, bd=5,
       font=('Helvetica', 12, 'bold'), command=win.destroy).place(x=width - 250, y=600)

win.mainloop()
Due to the Docker environment restrictions, the live functionality may not work here; however, the application should work correctly when run locally.
This is how the live button works.
For the video case, the application extracts frames from the selected video file and uses the live function to analyze those frames and recognize gestures.
This project shows how technology can bring people together, regardless of how they communicate. Using Python, Tkinter, OpenCV, Mediapipe, and Pyttsx3, we've built an application that demonstrates how software can foster understanding between different modes of communication.
However, it's important to recognize that sign language includes a wide variety of gestures, and hand-crafting landmark rules for every one of them quickly becomes impractical. This is where machine learning (ML) comes into play: an ML model can learn to recognize different signs from labeled examples instead of manually written conditions.
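As a rough illustration of that idea (not part of this project), the 21 hand landmarks produced by Mediapipe can be flattened into a feature vector and fed to an off-the-shelf classifier such as scikit-learn's KNeighborsClassifier. The training data below is a random placeholder; in practice, you would record and label landmark samples for each sign yourself.

# Illustrative sketch only: the training data is a placeholder, not real sign data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def landmarks_to_features(hand_landmarks):
    # Flatten the 21 (x, y, z) Mediapipe landmarks into one 63-value vector.
    return np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark]).flatten()

# Placeholder training set: random vectors standing in for recorded samples.
rng = np.random.default_rng(0)
X_train = rng.random((20, 63))
y_train = ['hello'] * 10 + ['thanks'] * 10

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Inside the frame loop, a detected hand could then be classified like this:
# predicted_sign = model.predict([landmarks_to_features(hand)])[0]

With enough labeled samples per sign, this kind of model can replace the hand-written rules above and scale to a much larger vocabulary.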