Language detection in deep learning

When we talk about deep learning, we mean training machines to draw conclusions and perform complex tasks across different media such as images, video, and text. In this Answer, we'll look at how to perform language detection, a common example of a text-related task in deep learning.

Language detection

Language detection takes a piece of text as input and identifies the language that dominates it, choosing from the set of languages the model was trained on.

Detecting languages based on text

MediaPipe and deep learning

MediaPipe is an open-source framework that offers a collection of pre-trained deep-learning models, including one for language detection. Its main advantage is that such a model can be easily integrated into our own text-oriented applications.

MediaPipe's logo

Language detection model

In our application, we will use MediaPipe's detector.tflite model. This model is pre-trained on a specific set of languages and can effectively detect them in the text submitted.

Note: You can download this model here.

List of supported languages

The pre-trained model supports about 110 languages and returns "unknown" for anything else. The supported language codes are mapped to their full language names in the language_names dictionary.

language_names = {
"unknown": "Unknown",
"af": "Afrikaans",
"am": "Amharic",
"ar": "Arabic",
"ar-Latn": "Arabic (Latin script)",
"az": "Azerbaijani",
"be": "Belarusian",
"bg": "Bulgarian",
"bg-Latn": "Bulgarian (Latin script)",
"bn": "Bengali",
"bs": "Bosnian",
"ca": "Catalan",
"ceb": "Cebuano",
"co": "Corsican",
"cs": "Czech",
"cy": "Welsh",
"da": "Danish",
"de": "German",
"el": "Greek",
"el-Latn": "Greek (Latin script)",
"en": "English",
"eo": "Esperanto",
"es": "Spanish",
"et": "Estonian",
"eu": "Basque",
"fa": "Persian",
"fi": "Finnish",
"fil": "Filipino",
"fr": "French",
"fy": "Frisian",
"ga": "Irish",
"gd": "Scottish Gaelic",
"gl": "Galician",
"gu": "Gujarati",
"ha": "Hausa",
"haw": "Hawaiian",
"hi": "Hindi",
"hi-Latn": "Hindi (Latin script)",
"hmn": "Hmong",
"hr": "Croatian",
"ht": "Haitian Creole",
"hu": "Hungarian",
"hy": "Armenian",
"id": "Indonesian",
"ig": "Igbo",
"is": "Icelandic",
"it": "Italian",
"iw": "Hebrew",
"ja": "Japanese",
"ja-Latn": "Japanese (Latin script)",
"jv": "Javanese",
"ka": "Georgian",
"kk": "Kazakh",
"km": "Khmer",
"kn": "Kannada",
"ko": "Korean",
"ku": "Kurdish",
"ky": "Kyrgyz",
"la": "Latin",
"lb": "Luxembourgish",
"lo": "Lao",
"lt": "Lithuanian",
"lv": "Latvian",
"mg": "Malagasy",
"mi": "Maori",
"mk": "Macedonian",
"ml": "Malayalam",
"mn": "Mongolian",
"mr": "Marathi",
"ms": "Malay",
"mt": "Maltese",
"my": "Burmese",
"ne": "Nepali",
"nl": "Dutch",
"no": "Norwegian",
"ny": "Chichewa",
"pa": "Punjabi",
"pl": "Polish",
"ps": "Pashto",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"ru-Latn": "Russian (Latin script)",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"sm": "Samoan",
"sn": "Shona",
"so": "Somali",
"sq": "Albanian",
"sr": "Serbian",
"st": "Southern Sotho",
"su": "Sundanese",
"sv": "Swedish",
"sw": "Swahili",
"ta": "Tamil",
"te": "Telugu",
"tg": "Tajik",
"th": "Thai",
"tr": "Turkish",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"xh": "Xhosa",
"yi": "Yiddish",
"yo": "Yoruba",
"zh": "Chinese",
"zh-Latn": "Chinese (Latin script)",
"zu": "Zulu",
}
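
A lookup against this mapping can be sketched as follows. The snippet uses a three-entry excerpt of language_names so that it runs on its own, and the helper name full_language_name is hypothetical. Falling back to the raw code for unmapped entries mirrors the language_names.get(top_language, top_language) call used in on_button_click.

```python
# A small excerpt of the mapping above, so the snippet is self-contained.
language_names = {
    "unknown": "Unknown",
    "en": "English",
    "ja": "Japanese",
}


def full_language_name(code):
    """Return the full name for a language code, or the code itself if unmapped."""
    # full_language_name is a hypothetical helper, not part of MediaPipe.
    return language_names.get(code, code)


print(full_language_name("ja"))       # Japanese
print(full_language_name("unknown"))  # Unknown
print(full_language_name("xx"))       # xx (not in the mapping, so the code is returned)
```

Returning the code itself keeps the display meaningful even if the model reports a code that the mapping does not cover.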

Code walkthrough

We will create a GUI application that takes text as input from the user and plots the main language and its probability as the result. Let's get started!

Imports

import sys
import matplotlib.pyplot as plt
from PyQt6.QtWidgets import QApplication, QLabel, QLineEdit, QPushButton, QVBoxLayout, QWidget
from PyQt6.QtGui import QFont
from mediapipe.tasks import python
from mediapipe.tasks.python import text
language_names = {
...
}
  • The first step is to import the modules our code needs.

    • sys provides the command-line arguments passed to QApplication and the exit call that ends the program when the window closes

    • matplotlib is used for plotting the language results

    • PyQt6 is used for the GUI through which the user interacts with the code

    • mediapipe is used for loading the pre-trained language detection model

  • We also define language_names, which maps language codes to full language names. For instance, "en" maps to "English".

detect_language function

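def detect_language(input_text):
    # Configure and load the pre-trained model from the local file
    base_options = python.BaseOptions(model_asset_path="detector.tflite")
    options = text.LanguageDetectorOptions(base_options=base_options)
    detector = text.LanguageDetector.create_from_options(options)
    # Run detection on the input text
    detection_result = detector.detect(input_text)
    # Guard against an empty result so that indexing [0] cannot fail
    if not detection_result.detections:
        return "unknown", "0.00"
    top_language = detection_result.detections[0].language_code
    top_probability = f'{detection_result.detections[0].probability:.2f}'
    return top_language, top_probability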
  • We define a function named detect_language, which takes a single parameter called input_text. This is the text the user enters and whose language we want to detect.

  • Next, we create an instance of the model. python.BaseOptions holds the base configuration and is given the path of the model file, "detector.tflite". These base options are passed to text.LanguageDetectorOptions, which builds the detector's options. The model is then created by passing those options to text.LanguageDetector.create_from_options and is saved in detector.

  • We call detector.detect with input_text as its argument and store the result in detection_result.

  • The first entry in detection_result.detections is treated as the top language. Its language_code and probability (formatted to two decimal places) are saved in top_language and top_probability, respectively, and both are returned.
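
The function above keeps only the first detection. If several candidates are of interest, the extraction step could be generalized along these lines. This is a minimal sketch: the Detection dataclass is a stand-in that assumes each entry exposes the language_code and probability attributes used above, top_detections is a hypothetical helper, and the sample values are made up rather than real model output.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Stand-in for one entry of detection_result.detections (assumed shape)."""
    language_code: str
    probability: float


def top_detections(detections, k=3):
    """Return up to k (language_code, probability) pairs, most probable first."""
    ranked = sorted(detections, key=lambda d: d.probability, reverse=True)
    return [(d.language_code, round(d.probability, 2)) for d in ranked[:k]]


# Made-up values for illustration only; not real model output.
sample = [Detection("fr", 0.21), Detection("de", 0.74), Detection("nl", 0.05)]
print(top_detections(sample, k=2))  # [('de', 0.74), ('fr', 0.21)]
print(top_detections([]))           # []
```

Sorting explicitly avoids relying on the order in which the entries arrive, and an empty input simply yields an empty list.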

on_button_click function

def on_button_click():
    # Read the user's text and run the detector on it
    input_text = input_text_entry.text()
    top_language, top_probability = detect_language(input_text)
    top_probability = float(top_probability)
    # Map the language code to its full name, falling back to the code itself
    top_language_full = language_names.get(top_language, top_language)
    # Draw a single horizontal bar whose length is the probability
    plt.figure(figsize=(10, 4))
    plt.barh([0], [top_probability], color='maroon', alpha=0.7)
    plt.yticks([0], [f"{top_language_full} ({top_probability})",], fontsize=16, fontweight='bold', color='white')
    plt.xlabel('Probability', fontsize=18, fontweight='bold', color='black')
    plt.title('Detected Language', fontsize=20, fontweight='bold', color='black')
    # Display customizations
    plt.gca().invert_yaxis()
    plt.gca().set_facecolor('white')
    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.tick_params(axis='both', colors='black')
    # Print the probability value next to the end of the bar
    plt.text(top_probability + 0.01, 0, f"{top_probability:.2f}", va='center', fontsize=16, fontweight='bold', color='black')
    plt.subplots_adjust(left=0.3, right=0.95, top=0.8, bottom=0.2)
    plt.tight_layout()
    # Show the figure without blocking the GUI
    plt.show(block=False)
  • The second function, on_button_click, displays the prediction in a user-friendly manner. It reads the text from the input box, stores the top_language and top_probability returned by detect_language, converts the probability back to a float, and uses Matplotlib to draw a horizontal bar showing the probability of the main language along with its label. The full language name is looked up in language_names. The rest of the code handles display customization and can be changed freely.

Note: barh refers to horizontal bar charts. You can learn more about them here.
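
The same barh approach extends to more than one bar. The following is a rough sketch, not the application's code: prepare_bars and plot_candidates are hypothetical helpers, the candidate list is made up for illustration, and matplotlib is imported inside the plotting function so that the label preparation can run without it.

```python
def prepare_bars(candidates, names):
    """Split (code, probability) pairs into y-axis labels and bar lengths."""
    labels = [f"{names.get(code, code)} ({prob:.2f})" for code, prob in candidates]
    values = [prob for _, prob in candidates]
    return labels, values


def plot_candidates(candidates, names):
    """Draw one horizontal bar per candidate language."""
    import matplotlib.pyplot as plt  # imported lazily so prepare_bars works without it

    labels, values = prepare_bars(candidates, names)
    positions = list(range(len(values)))
    plt.figure(figsize=(10, 4))
    plt.barh(positions, values, color="maroon", alpha=0.7)
    plt.yticks(positions, labels)
    plt.xlim(0, 1)            # probabilities lie between 0 and 1
    plt.xlabel("Probability")
    plt.title("Detected Languages")
    plt.gca().invert_yaxis()  # keep the first candidate at the top
    plt.tight_layout()
    plt.show()


names = {"de": "German", "fr": "French"}
candidates = [("de", 0.74), ("fr", 0.21)]  # made-up values for illustration
print(prepare_bars(candidates, names))
# (['German (0.74)', 'French (0.21)'], [0.74, 0.21])
```

Calling plot_candidates(candidates, names) would open the figure. Keeping the label formatting separate from the drawing makes the former easy to check on its own.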

main function

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = QWidget()
    window.setWindowTitle("Language Detection")
    window.setStyleSheet("background-color: white;")
    layout = QVBoxLayout()
    # Prompt label
    input_label = QLabel("Enter the text:")
    input_label.setFont(QFont('Arial', 18))
    input_label.setStyleSheet("color: black;")
    layout.addWidget(input_label)
    # Text box for the user's input
    input_text_entry = QLineEdit()
    input_text_entry.setFont(QFont('Arial', 16))
    input_text_entry.setStyleSheet("color: black; background-color: white; border: 1px solid black; padding: 5px;")
    layout.addWidget(input_text_entry)
    # Button that triggers detection and plotting
    detect_button = QPushButton("Detect Language")
    detect_button.setFont(QFont('Arial', 16))
    detect_button.setStyleSheet("color: white; background-color: maroon; padding: 8px;")
    detect_button.clicked.connect(on_button_click)
    layout.addWidget(detect_button)
    window.setLayout(layout)
    window.show()
    sys.exit(app.exec())
  • Finally, we put the code together in the main block. We use PyQt6, a cross-platform GUI toolkit for Python, to create the GUI for our application and to get the text input from the user. The user types the text into input_text_entry. Once detect_button is clicked, our on_button_click function is called, which in turn calls detect_language and plots the result in a Matplotlib figure window.
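
As written, the click handler passes whatever is in the text box straight to detect_language, including an empty string. One possible guard, assuming that blank input should simply be skipped, is a small helper like the hypothetical normalize_input below.

```python
def normalize_input(raw_text):
    """Strip surrounding whitespace; return None when nothing is left to analyze."""
    # normalize_input is a hypothetical helper, not part of PyQt6 or MediaPipe.
    cleaned = raw_text.strip()
    return cleaned if cleaned else None


print(normalize_input("  Bonjour le monde  "))  # Bonjour le monde
print(normalize_input("   "))                   # None
```

Inside on_button_click, the handler could call normalize_input(input_text_entry.text()) and return early when the result is None, so that no plot is drawn for blank input.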

Executable code

Congratulations! Our language detection code is now complete. You can run it as is or modify it to see it in action.

import sys
from PyQt6.QtWidgets import QApplication, QWidget, QLabel, QLineEdit, QPushButton, QVBoxLayout
from PyQt6.QtGui import QColor, QPalette

class MyWindow(QWidget):
    def __init__(self):
        super().__init__()

        self.setWindowTitle("PyQt6 Example Code")
        self.setGeometry(100, 100, 400, 200)

        self.init_ui()

    def init_ui(self):
        layout = QVBoxLayout()

        label1 = QLabel("Field 1:")
        self.input1 = QLineEdit()

        label2 = QLabel("Field 2:")
        self.input2 = QLineEdit()

        submit_button = QPushButton("Submit")

        palette = QPalette()
        label_color = QColor(0, 102, 204)  # Blue color
        button_color = QColor(255, 153, 0)  # Orange color
        palette.setColor(QPalette.ColorRole.WindowText, label_color)
        palette.setColor(QPalette.ColorRole.ButtonText, button_color)
        palette.setColor(QPalette.ColorRole.Button, QColor(240, 240, 240))  # Light gray

        label1.setPalette(palette)
        label2.setPalette(palette)
        submit_button.setPalette(palette)

        layout.addWidget(label1)
        layout.addWidget(self.input1)
        layout.addWidget(label2)
        layout.addWidget(self.input2)
        layout.addWidget(submit_button)

        self.setLayout(layout)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MyWindow()
    window.show()
    sys.exit(app.exec())

Language detection demonstration

Japanese language

When we gave the application Japanese text and clicked "Detect Language", the model correctly predicted Japanese, and the bar plot showed a probability of 1.0, i.e., 100%.

Input text
Language detection

French language

When we gave the application French text and clicked "Detect Language", the model correctly predicted French, and the bar plot showed a probability of 1.0, i.e., 100%.

Input text
Language detection

Mixed languages

When we gave the application mixed text containing both French and German, the model returned German with a probability of 0.74, i.e., 74%, since two of the three keywords were German.
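
A result like this one could be flagged as less certain before it is displayed. The sketch below assumes an arbitrary cutoff of 0.8, which is an illustrative choice and not something the model or MediaPipe prescribes. The helper describe_result is hypothetical.

```python
def describe_result(language, probability, threshold=0.8):
    """Label a detection as confident or uncertain based on an assumed cutoff."""
    percent = round(probability * 100)
    if probability >= threshold:
        return f"{language} ({percent}%)"
    return f"{language} ({percent}%, possibly mixed or ambiguous text)"


print(describe_result("Japanese", 1.0))  # Japanese (100%)
print(describe_result("German", 0.74))   # German (74%, possibly mixed or ambiguous text)
```

The right threshold depends on the application, so it is exposed as a parameter rather than fixed.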

Input text
Language detection

Use cases of language detection

Language detection is a task that can be used on a stand-alone basis and incorporated into many more complex designs, including but not limited to the following.


Note: Here's the complete list of related projects in MediaPipe or deep learning.

  1. Real time 3D face mesh

  2. Gesture recognizer

  3. Language detection

  4. Pose detection

  5. Emotion detection

  6. Real time emotion detection

How well do you know language detection?

Q: How do we understand what language code the model is referring to?

A) Using the top_language variable

B) Using the mapping from language_names

C) The model gives the full name when accessed by language_code


Copyright ©2025 Educative, Inc. All rights reserved