Python is a general-purpose language with applications in many different fields. In this article, we will use Python to build a language detector.
A language detector is a tool that automatically identifies the language of a given text. This is useful in several situations: for example, when you want to categorize or filter the articles on your blog by language, or clean data in a data science project.
Python has many packages for language detection; this article covers four of them: langdetect, langid, googletrans, and language_detector.
Throughout this article, we will be using these sentences to detect the languages:
I love programming, Python is my favorite language.
أحب البرمجة ، بايثون هي لغتي المفضلة.
我喜欢编程,Python 是我最喜欢的语言。
Me encanta programar, Python es mi lenguaje favorito.
Eu amo programar, Python é minha linguagem favorita.
These are the same sentence translated into different languages. To access all the language codes we will use in this article, visit this page.
Before anything else, we need to install the required packages, since they are not part of Python's standard library. We will first create a virtual environment and then install the packages in it:
$ python -m venv project
Activate it with this command on Windows:
$ .\project\Scripts\activate
or Linux/macOS:
$ source project/bin/activate
Now that the virtual environment is up and running, let us install all the packages that we are going to use:
$ pip install langdetect langid googletrans==3.1.0a0 language-detector
In this section, we will build the language detector command line tool using one package at a time. With the virtual environment activated, create two files and name them language_detector_cli_1.py and sentences.txt.
You can call the files whatever you want, but make sure the names are meaningful. The sentences.txt file will hold the sentences whose language we want to detect, so open it and paste these lines:
I love programming, Python is my favorite language.
أحب البرمجة ، بايثون هي لغتي المفضلة.
我喜欢编程,Python 是我最喜欢的语言。
Me encanta programar, Python es mi lenguaje favorito.
Eu amo programar, Python é minha linguagem favorita.
You can add as many sentences as you want to this file.
Our first implementation of the language detector command line tool will use the langdetect package. As mentioned in the documentation, it supports 55 languages and is a Python port of Google's language-detection library.
Open the .py file we have just created and paste the following code:
# import the detect function from langdetect
from langdetect import detect
# open the text file in read mode, as UTF-8 so the non-English sentences decode correctly
with open('sentences.txt', 'r', encoding='utf-8') as sentences_file:
    # create a list of sentences using the readlines() method
    sentences = sentences_file.readlines()
Here we import the detect() function from the langdetect package; we will use it to detect the language of words or sentences. Then we open the sentences.txt file in read mode and read all the sentences from it into a list.
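Lines returned by readlines() keep their trailing newline, and a blank line in the file still counts as a sentence. As a minimal sketch, assuming you would rather work with clean, non-empty strings, a small helper could do the reading. The name load_sentences is hypothetical and not part of any of the packages used here:

```python
from pathlib import Path


def load_sentences(path="sentences.txt"):
    """Return the non-empty lines of a UTF-8 text file, without newlines."""
    # load_sentences is a hypothetical helper name used for illustration
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the line endings; the filter skips blank lines
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With this helper, sentences = load_sentences() could stand in for the open() and readlines() calls, and the later strip('\n') would no longer be needed.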
Let us now create the function for detecting languages; we will call it detect_language(). Paste this code:
# a function for detecting the language of one sentence
def detect_language(sentences, n):
    """Detect and print the language of the sentence at index n."""
    try:
        # strip() removes the trailing newline left by readlines()
        new_sentence = sentences[n].strip()
        # only run the detection when the line is not blank
        if new_sentence:
            print(f'The language for the sentence "{new_sentence}" is {detect(new_sentence)}')
    # an index outside the list raises IndexError
    except IndexError:
        print('Sentence does not exist')
Let us break down the code inside the detect_language() function so that we are on the same page. The function takes two arguments: sentences, a list of str, and n, an int index into that list. Inside the function, we have a try/except block for handling a missing sentence.
Inside the try statement, we remove the newline characters from the sentence at index n and, if the line is not blank, detect its language and print the result. The except clause catches the IndexError that is raised when there is no sentence at index n. Finally, just below the function, paste this code:
# print the number of sentences in sentences.txt
print(f'You have {len(sentences)} sentences')
# prompt the user to enter an integer
number_of_sentence = int(input('Which sentence do you want to detect?(Provide an integer please):'))
# call the detect_language function with the list of sentences
detect_language(sentences, number_of_sentence)
We have a print() function, an input() function for getting data from the user, and a function call. Let us now test the program; we will detect the language of the first sentence in the sentences.txt file, whose number is 0.
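The int(input(...)) call stops the program with a ValueError if the user types something that is not a whole number, and a negative number such as -1 would silently pick a sentence from the end of the list. As a sketch of one way to guard against both, assuming a hypothetical helper named parse_index:

```python
def parse_index(raw, count):
    """Return raw as a valid list index for count items, or None if it is not one."""
    # parse_index is a hypothetical helper name used for illustration
    try:
        index = int(raw.strip())
    except ValueError:
        # the input was not a whole number, e.g. "two" or "1.5"
        return None
    # accept only 0 .. count - 1, so negative numbers do not wrap around
    return index if 0 <= index < count else None
```

The script could call parse_index(input(...), len(sentences)) and ask again whenever the result is None.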
Now let's run it:
$ python language_detector_cli_1.py
The output will be as follows after providing 0 as input:
You have 5 sentences
Which sentence do you want to detect?(Provide an integer please):0
The language for the sentence "I love programming, Python is my favorite language." is en
If you check the language codes, en is for English.
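If you prefer to print a readable name instead of a code, a small lookup table is enough for the sentences in this article. This is only a sketch: LANGUAGE_NAMES and language_name are hypothetical names, and the table covers just the codes that show up in this article's outputs, not the full list of language codes.

```python
# Hypothetical lookup table: only the codes that appear in this article
LANGUAGE_NAMES = {
    "en": "English",
    "ar": "Arabic",
    "zh": "Chinese",
    "es": "Spanish",
    "pt": "Portuguese",
    "gl": "Galician",
}


def language_name(code):
    """Return a readable name for a code, or the code itself if it is unknown."""
    # some detectors return regional variants such as "zh-cn"; compare the base code
    base = code.lower().split("-")[0]
    return LANGUAGE_NAMES.get(base, code)
```

For example, language_name(detect(new_sentence)) could replace detect(new_sentence) inside the print() call.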
Note: As mentioned in the documentation, the langdetect package uses a non-deterministic algorithm, which means that you might get different results every time you try to detect a short or ambiguous text.
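The langdetect documentation suggests fixing the random seed through DetectorFactory.seed when you need the same answer on every run. Here is a minimal sketch of that idea; the wrapper name detect_reproducibly is hypothetical, and the import is guarded only so that the snippet still loads where langdetect is not installed:

```python
try:
    from langdetect import DetectorFactory, detect
except ImportError:
    # langdetect is not installed; the helper below then returns None
    DetectorFactory = None


def detect_reproducibly(text, seed=0):
    """Detect the language of text with a fixed random seed."""
    # detect_reproducibly is a hypothetical wrapper name used for illustration
    if DetectorFactory is None:
        return None
    # fixing the seed makes repeated detections of the same text agree
    DetectorFactory.seed = seed
    return detect(text)
```

In a script, setting DetectorFactory.seed = 0 once, right after the import, has the same effect for every later detect() call.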
The second package that we can use for language detection is langid. Open a new Python file, name it language_detector_cli_2.py, and make it look like this:
import langid
# open the text file in read mode, as UTF-8 so the non-English sentences decode correctly
with open('sentences.txt', 'r', encoding='utf-8') as sentences_file:
    # create a list of sentences using the readlines() method
    sentences = sentences_file.readlines()
# loop through all the sentences in the sentences.txt file
for sentence in sentences:
    # classify() returns a (language code, score) tuple
    lang = langid.classify(sentence)
    # format the sentence by removing the newline character
    formatted_sentence = sentence.strip('\n')
    print(f'The sentence "{formatted_sentence}" is in {lang[0]}')
In the above code snippet, we import langid, then open the sentences.txt file and read the five sentences.
After that, we loop through the sentences and detect the language of each one with the langid.classify() function, which takes a sentence as an argument and returns a tuple whose first element is the language code. Finally, we print the formatted result.
Let's run it:
$ python language_detector_cli_2.py
The output we get is this:
The sentence "I love programming, Python is my favorite language." is in en
The sentence "أحب البرمجة ، بايثون هي لغتي المفضلة." is in ar
The sentence "我喜欢编程,Python 是我最喜欢的语言。" is in zh
The sentence "Me encanta programar, Python es mi lenguaje favorito." is in es
The sentence "Eu amo programar, Python é minha linguagem favorita." is in gl
All the predictions are correct except the last one: langid returned gl (Galician), but the sentence is Portuguese, so the result should be pt.
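When you know in advance which languages can appear in your data, langid lets you restrict the candidates with langid.set_languages(), which may help with closely related languages such as Portuguese and Galician. The sketch below assumes the five languages of our sentences.txt file; CANDIDATES and classify_among are hypothetical names, and the import is guarded only so that the snippet still loads where langid is not installed:

```python
try:
    import langid
except ImportError:
    # langid is not installed; the helper below then returns None
    langid = None

# hypothetical candidate list: the five languages used in sentences.txt
CANDIDATES = ["en", "ar", "zh", "es", "pt"]


def classify_among(text, candidates=CANDIDATES):
    """Classify text, considering only the given language codes."""
    if langid is None:
        return None
    # restrict the model to the candidate codes before classifying
    langid.set_languages(candidates)
    code, _score = langid.classify(text)
    return code
```

Keep in mind that set_languages() changes langid's module-level model, so the restriction stays in effect for later classify() calls as well.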
Third on the list is the googletrans package, which can be used for both translation and language detection. We have used it in the text translation tutorial if you want to check it out.
According to the documentation, it is free and unlimited. Now let us use it for detecting languages; open up a new Python file, name it language_detector_cli_3.py, and add the following:
# import the Translator class from googletrans
from googletrans import Translator
# create a translator object
translator = Translator()
We are importing the Translator class from googletrans and then creating an instance of it. Since we will be getting sentences from the sentences.txt file, we need to open it and read the sentences:
# open the text file in read mode, as UTF-8 so the non-English sentences decode correctly
with open('sentences.txt', 'r', encoding='utf-8') as sentences_file:
    # create a list of sentences using the readlines() method
    sentences = sentences_file.readlines()
And for detecting the languages, let us use this function:
# a function for detecting the language of one sentence
def detect_language(sentences, n):
    """Detect and print the language of the sentence at index n."""
    try:
        # strip() removes the trailing newline left by readlines()
        new_sentence = sentences[n].strip()
        # translator.detect() returns the detection result
        # .lang extracts the language code from it
        detected_sentence_lang = translator.detect(new_sentence).lang
        print(f'The language for the sentence "{new_sentence}" is {detected_sentence_lang}')
    # a missing sentence and a failed network request both end up here
    except Exception:
        print('Make sure the sentence exists or you have internet connection')
The above function is similar to the one we used for the langdetect package; the difference is the detection line inside the try block.
We use translator.detect() to detect the language and read the language code from its lang attribute.
And finally, paste these lines of code after the function:
print(f'You have {len(sentences)} sentences')
# prompt the user to enter an integer
number_of_sentence = int(input('Which sentence do you want to detect?(Provide an integer please):'))
# call the detect_language function with the list of sentences
detect_language(sentences, number_of_sentence)
To run the program, use this command:
$ python language_detector_cli_3.py
You will be prompted to choose the sentence to detect, and this is the output you get after providing valid input:
You have 5 sentences
Which sentence do you want to detect?(Provide an integer please):1
The language for the sentence "أحب البرمجة ، بايثون هي لغتي المفضلة." is ar
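Since the langdetect and googletrans versions of the script differ only in the detection call, one function can serve both if the detector is passed in as an argument. This is only a sketch of that idea; report_language and detect_fn are hypothetical names:

```python
def report_language(sentences, n, detect_fn):
    """Build the result line for sentence n using any detector function."""
    try:
        sentence = sentences[n].strip()
    except IndexError:
        return "Sentence does not exist"
    # detect_fn is any callable that maps a string to a language code
    return f'The language for the sentence "{sentence}" is {detect_fn(sentence)}'
```

With langdetect you could call report_language(sentences, 0, detect), and with googletrans report_language(sentences, 0, lambda s: translator.detect(s).lang), then print the returned string.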
Our final package for language detection is language_detector. Open a new Python file, name it language_detector_cli_4.py, and import the package:
from language_detector import detect_language
Now we will create a function for handling language detection and name it detectLanguage().
Note that we have imported detect_language from language_detector, and our own function's name must not conflict with it; that is why we have named the function detectLanguage(). The function takes text as an argument:
def detectLanguage(text):
    # detect the language using the imported detect_language function
    language = detect_language(text)
    print(f'"{text}" is written in {language}')
In the above function, the text passed to detectLanguage() is passed on to detect_language(), and the result is printed.
Just after the detectLanguage() function, paste this code:
# an infinite while loop
while True:
    # prompt the user to choose an option
    option = input('Enter 1 to detect language or 0 to exit:')
    if option == '1':
        # prompt the user to enter the text
        data = input('Enter your sentence or word here:')
        # call the detectLanguage function
        detectLanguage(data)
    # if the option is 0, break the loop
    elif option == '0':
        print('Quitting........\nByee!!!')
        break
    # if the option is neither 1 nor 0, it is invalid
    else:
        print('Wrong input, try again!!!')
In the above code snippet, we have an infinite while loop in which the user is prompted to choose between two options, 1 and 0. If the option is 1, the user is prompted to enter the text to be detected; if the option is 0, the loop is broken; and if the option is neither 1 nor 0, the user is notified about the wrong input.
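A loop that calls input() and print() directly is hard to test without typing at the keyboard. As a sketch of one alternative, the same menu can receive its detector and its input and output functions as arguments. The names run_menu, detect_fn, read, and write are hypothetical, and the messages are shortened:

```python
def run_menu(detect_fn, read=input, write=print):
    """Run the 1/0 menu until the user exits; return how many texts were detected."""
    detected = 0
    while True:
        option = read('Enter 1 to detect language or 0 to exit:').strip()
        if option == '1':
            text = read('Enter your sentence or word here:')
            write(f'"{text}" is written in {detect_fn(text)}')
            detected += 1
        elif option == '0':
            write('Quitting')
            break
        else:
            write('Wrong input, try again')
    return detected
```

Calling run_menu(detect_language) with the function imported from language_detector gives the interactive behavior described above, while a test can pass its own read and write functions.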
To test this program, run:
$ python language_detector_cli_4.py
The output:
Enter 1 to detect language or 0 to exit:1
Enter your sentence or word here:J'adore programmer, Python est mon langage préféré
"J'adore programmer, Python est mon langage préféré" is written in French
Enter 1 to detect language or 0 to exit:0
Quitting........
Byee!!!
This article has shown you how to make a language detector using Python. We have not exhausted the whole list of packages that can be used for detecting languages, but we hope that you now know how to detect languages using Python.
Some of the packages we used were not precise on every sentence, but the Python ecosystem offers other packages for the job that may be more precise. I invite you to experiment with the libraries and see which one fits you best.
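One simple way to compare the libraries is to score each detector on sentences whose language you already know. The helper below is only a sketch: accuracy is a hypothetical name, and it assumes the detector returns language codes, so a package that returns names such as French would need a small mapping first.

```python
def accuracy(detect_fn, labelled):
    """Return the fraction of (text, expected_code) pairs that detect_fn gets right."""
    if not labelled:
        return 0.0
    correct = 0
    for text, expected in labelled:
        try:
            # compare only the base code, so "zh-cn" counts as "zh"
            predicted = str(detect_fn(text)).lower().split("-")[0]
        except Exception:
            # a detector that fails on a text simply scores a miss
            predicted = None
        correct += predicted == expected
    return correct / len(labelled)
```

For our five sentences, the labelled list would pair each sentence with en, ar, zh, es, or pt, and detect_fn could be detect from langdetect or lambda s: langid.classify(s)[0].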
You can get all the scripts here.
Learn also: How to Translate Text in Python.
Happy coding ♥