Deep learning refers to training machines to draw conclusions and perform complex tasks on different media such as images, videos, and text. In this Answer, we'll look at how to perform language detection, a primary example of a text-related task in deep learning.
Language detection is a technique that takes a text or a passage as input and, based on the languages the model has been taught or trained on, detects which language is mainly used in that text.
MediaPipe is an open-source framework that offers a collection of pre-trained deep-learning models, including one for language detection. The main advantage is that such a model can be easily integrated into our custom text-oriented applications.
In our application, we will use MediaPipe's detector.tflite model. This model is pre-trained on a specific set of languages and can effectively detect them in the text submitted.
Note: You can download this model here.
The pre-trained model supports about 110 languages and returns "unknown" for any other language. The supported language codes have been mapped to the complete language names and saved in language_names.
language_names = {"unknown": "Unknown","af": "Afrikaans","am": "Amharic","ar": "Arabic","ar-Latn": "Arabic (Latin script)","az": "Azerbaijani","be": "Belarusian","bg": "Bulgarian","bg-Latn": "Bulgarian (Latin script)","bn": "Bengali","bs": "Bosnian","ca": "Catalan","ceb": "Cebuano","co": "Corsican","cs": "Czech","cy": "Welsh","da": "Danish","de": "German","el": "Greek","el-Latn": "Greek (Latin script)","en": "English","eo": "Esperanto","es": "Spanish","et": "Estonian","eu": "Basque","fa": "Persian","fi": "Finnish","fil": "Filipino","fr": "French","fy": "Frisian","ga": "Irish","gd": "Scottish Gaelic","gl": "Galician","gu": "Gujarati","ha": "Hausa","haw": "Hawaiian","hi": "Hindi","hi-Latn": "Hindi (Latin script)","hmn": "Hmong","hr": "Croatian","ht": "Haitian Creole","hu": "Hungarian","hy": "Armenian","id": "Indonesian","ig": "Igbo","is": "Icelandic","it": "Italian","iw": "Hebrew","ja": "Japanese","ja-Latn": "Japanese (Latin script)","jv": "Javanese","ka": "Georgian","kk": "Kazakh","km": "Khmer","kn": "Kannada","ko": "Korean","ku": "Kurdish","ky": "Kyrgyz","la": "Latin","lb": "Luxembourgish","lo": "Lao","lt": "Lithuanian","lv": "Latvian","mg": "Malagasy","mi": "Maori","mk": "Macedonian","ml": "Malayalam","mn": "Mongolian","mr": "Marathi","ms": "Malay","mt": "Maltese","my": "Burmese","ne": "Nepali","nl": "Dutch","no": "Norwegian","ny": "Chichewa","pa": "Punjabi","pl": "Polish","ps": "Pashto","pt": "Portuguese","ro": "Romanian","ru": "Russian","ru-Latn": "Russian (Latin script)","sd": "Sindhi","si": "Sinhala","sk": "Slovak","sl": "Slovenian","sm": "Samoan","sn": "Shona","so": "Somali","sq": "Albanian","sr": "Serbian","st": "Southern Sotho","su": "Sundanese","sv": "Swedish","sw": "Swahili","ta": "Tamil","te": "Telugu","tg": "Tajik","th": "Thai","tr": "Turkish","uk": "Ukrainian","ur": "Urdu","uz": "Uzbek","vi": "Vietnamese","xh": "Xhosa","yi": "Yiddish","yo": "Yoruba","zh": "Chinese","zh-Latn": "Chinese (Latin script)","zu": "Zulu",}
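As a minimal sketch of how this mapping can be used, the snippet below looks up a few codes in a small subset of language_names and falls back to the raw code when no entry exists. The helper name full_language_name and the code "xx" are hypothetical and used only for illustration.

```python
# A small subset of the language_names mapping, for illustration only.
language_names = {
    "unknown": "Unknown",
    "en": "English",
    "fr": "French",
    "ja-Latn": "Japanese (Latin script)",
}


def full_language_name(code: str) -> str:
    """Return the full name for a language code.

    Hypothetical helper: if the code is not in the mapping,
    the code itself is returned unchanged.
    """
    return language_names.get(code, code)


print(full_language_name("fr"))       # French
print(full_language_name("ja-Latn"))  # Japanese (Latin script)
print(full_language_name("xx"))       # xx ("xx" is a made-up, unmapped code)
```

Using dict.get with a fallback means an unexpected code is still displayed instead of raising a KeyError.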
We will create a GUI application that takes text as input from the user and plots the main language and its probability as the result. Let's get started!
import sys
import matplotlib.pyplot as plt
from PyQt6.QtWidgets import QApplication, QLabel, QLineEdit, QPushButton, QVBoxLayout, QWidget
from PyQt6.QtGui import QFont
from mediapipe.tasks import python
from mediapipe.tasks.python import text

language_names = {...}
The first step is to import the necessary modules for our code.
sys is used to pass command-line arguments to the application and to exit it cleanly
matplotlib is used for plotting the language results
PyQt6 is used for the GUI interaction between the user and the code
mediapipe is used for obtaining the pre-trained language detection model
We also define the language_names dictionary, in which the language codes are mapped to the complete language names. For instance, "en" maps to "English".
detect_language function

def detect_language(input_text):
    base_options = python.BaseOptions(model_asset_path="detector.tflite")
    options = text.LanguageDetectorOptions(base_options=base_options)
    detector = text.LanguageDetector.create_from_options(options)
    detection_result = detector.detect(input_text)
    top_language = detection_result.detections[0].language_code
    top_probability = f'{detection_result.detections[0].probability:.2f}'
    return top_language, top_probability
We define a function named detect_language, which takes a single parameter called input_text. This is the text that the user supplies and for which we want to detect the language.
Next, we create an instance of the model. base_options holds the configuration needed for the model and is given the path of the model file "detector.tflite". These options are passed to text.LanguageDetectorOptions, which creates the options object. The model is then created by passing options to text.LanguageDetector.create_from_options and is saved in detector.
We use our detector model to detect the main language of input_text by passing the text as a parameter to detector.detect and storing the result in detection_result.
The first entry in detection_result.detections is taken as the top language. Its language_code and probability (formatted to two decimal places) are saved in top_language and top_probability, respectively. These two variables are returned.
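The function above indexes detections[0], which assumes the result contains at least one detection. The sketch below shows one defensive way to pick the top entry. Detection and DetectionResult are hypothetical stand-ins that mimic only the two attributes the function reads, so the example runs without MediaPipe or the model file. Whether the real detector can return an empty list is not stated here, so the fallback is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """Hypothetical stand-in exposing the two attributes read above."""
    language_code: str
    probability: float


@dataclass
class DetectionResult:
    """Hypothetical stand-in for the object returned by detector.detect()."""
    detections: list = field(default_factory=list)


def top_detection(result, fallback="unknown"):
    """Return (language_code, probability) for the most probable detection.

    Returns (fallback, 0.0) when there are no detections at all
    (an assumed edge case, not documented behavior).
    """
    if not result.detections:
        return fallback, 0.0
    # Pick by probability instead of relying on the list order.
    best = max(result.detections, key=lambda d: d.probability)
    return best.language_code, round(best.probability, 2)


# Illustrative values only, not real model output.
sample = DetectionResult([Detection("fr", 0.2), Detection("de", 0.74)])
print(top_detection(sample))
print(top_detection(DetectionResult()))
```

Returning a float rather than a formatted string also avoids converting the probability back with float() later.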
on_button_click function

def on_button_click():
    input_text = input_text_entry.text()
    top_language, top_probability = detect_language(input_text)
    top_probability = float(top_probability)
    top_language_full = language_names.get(top_language, top_language)
    plt.figure(figsize=(10, 4))
    plt.barh([0], [top_probability], color='maroon', alpha=0.7)
    plt.yticks([0], [f"{top_language_full} ({top_probability})",], fontsize=16, fontweight='bold', color='white')
    plt.xlabel('Probability', fontsize=18, fontweight='bold', color='black')
    plt.title('Detected Language', fontsize=20, fontweight='bold', color='black')
    plt.gca().invert_yaxis()
    plt.gca().set_facecolor('white')
    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.tick_params(axis='both', colors='black')
    plt.text(top_probability + 0.01, 0, f"{top_probability:.2f}", va='center', fontsize=16, fontweight='bold', color='black')
    plt.subplots_adjust(left=0.3, right=0.95, top=0.8, bottom=0.2)
    plt.tight_layout()
    plt.show(block=False)
Our second function is on_button_click. Its core job is to display the prediction in a user-friendly manner. It reads the text from input_text_entry, stores the top_language and top_probability returned by our detect_language function, and uses Matplotlib to generate a horizontal bar plot showing the probability of the main language along with its label. The complete language name is obtained from the mapping given by language_names. The rest of the code focuses on display customizations and can be changed freely.
Note: barh refers to horizontal bar charts. You can learn more about them here.
main function

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = QWidget()
    window.setWindowTitle("Language Detection")
    window.setStyleSheet("background-color: white;")
    layout = QVBoxLayout()
    input_label = QLabel("Enter the text:")
    input_label.setFont(QFont('Arial', 18))
    input_label.setStyleSheet("color: black;")
    layout.addWidget(input_label)
    input_text_entry = QLineEdit()
    input_text_entry.setFont(QFont('Arial', 16))
    input_text_entry.setStyleSheet("color: black; background-color: white; border: 1px solid black; padding: 5px;")
    layout.addWidget(input_text_entry)
    detect_button = QPushButton("Detect Language")
    detect_button.setFont(QFont('Arial', 16))
    detect_button.setStyleSheet("color: white; background-color: maroon; padding: 8px;")
    detect_button.clicked.connect(on_button_click)
    layout.addWidget(detect_button)
    window.setLayout(layout)
    window.show()
    sys.exit(app.exec())
Finally, we put the code together in the main block. We use the Python library PyQt6 to build the window, and the user's text is entered in input_text_entry. Once the detect_button is clicked, our on_button_click function is called, which in turn calls the detect_language function and plots the result in a Matplotlib figure.
Congratulations! Our language detection code is now complete. You can run it as is or make changes to see it in action.
import sys
from PyQt6.QtWidgets import QApplication, QWidget, QLabel, QLineEdit, QPushButton, QVBoxLayout
from PyQt6.QtGui import QColor, QPalette

class MyWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("PyQt6 Example Code")
        self.setGeometry(100, 100, 400, 200)
        self.init_ui()

    def init_ui(self):
        layout = QVBoxLayout()
        label1 = QLabel("Field 1:")
        self.input1 = QLineEdit()
        label2 = QLabel("Field 2:")
        self.input2 = QLineEdit()
        submit_button = QPushButton("Submit")
        palette = QPalette()
        label_color = QColor(0, 102, 204)  # Blue color
        button_color = QColor(255, 153, 0)  # Orange color
        palette.setColor(QPalette.ColorRole.WindowText, label_color)
        palette.setColor(QPalette.ColorRole.ButtonText, button_color)
        palette.setColor(QPalette.ColorRole.Button, QColor(240, 240, 240))  # Light gray
        label1.setPalette(palette)
        label2.setPalette(palette)
        submit_button.setPalette(palette)
        layout.addWidget(label1)
        layout.addWidget(self.input1)
        layout.addWidget(label2)
        layout.addWidget(self.input2)
        layout.addWidget(submit_button)
        self.setLayout(layout)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MyWindow()
    window.show()
    sys.exit(app.exec())
When given a Japanese text, the model correctly predicted Japanese after we clicked "Detect Language", with a probability of 1.0 (100%) shown on the bar plot.
When given a French text, the model correctly predicted French after we clicked "Detect Language", with a probability of 1.0 (100%) shown on the bar plot.
When given a mixed text containing both French and German, the model returned German with a probability of 0.74 (74%), since two out of three keywords were from the German language.
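For mixed text like this, it can help to look at every candidate language instead of only the top one. The sketch below ranks a list of (language code, probability) pairs and formats them as percentages. The helper summarize_detections is hypothetical, the input is hand-written instead of coming from the model, and the French score is a made-up value for illustration; only the German 0.74 comes from the run described above.

```python
def summarize_detections(detections, names):
    """Return 'Name: NN%' strings sorted from most to least probable.

    detections: iterable of (language_code, probability) pairs.
    names: mapping from language code to full language name.
    """
    ranked = sorted(detections, key=lambda pair: pair[1], reverse=True)
    return [f"{names.get(code, code)}: {prob:.0%}" for code, prob in ranked]


# Subset of language_names used in this example.
language_names = {"de": "German", "fr": "French"}

# Hand-written input: 0.74 matches the run above, 0.20 is hypothetical.
candidates = [("fr", 0.20), ("de", 0.74)]

for line in summarize_detections(candidates, language_names):
    print(line)
```

In the application, the pairs could be built from the language_code and probability of each entry in detection_result.detections, assuming the detector reports more than one candidate for the text.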
Language detection can be used on a stand-alone basis or incorporated into many more complex designs.
How well do you know language detection?
How do we understand what language code the model is referring to?
Using the top_language variable
Using the mapping from language_names
The model gives the full name when accessed by language_code