Andrés Giordano (https://www.freeimages.com/photographer/andrezor-30438)

Translation in Python, the sustainable way: using the deep-translator library

“A flexible free and unlimited python tool to translate between different languages in a simple way using multiple translators.”

Luis Garcia Fuentes
5 min read · Oct 12, 2021


Context to my NLP problem:

I currently work quite a bit with text data, and therefore on natural language processing (NLP) projects. While there is a sea 🌊 of methods within the NLP field available to crunch data, the language(s) used in your business may be a limiting factor to your work. Take my case: I work in Europe, where almost every market has its own language.

Unfortunately for me, while open-source models and resources exist for almost all European languages, the best pre-trained models are trained mainly on English text.

Additionally, if I decided to build a custom solution for each region, I would be forced to train a separate model for each language. That is not only time-consuming, it also reduces the volume of training data that can be fed to each ML algorithm, which in turn lowers the likelihood that any of the models will reach a decent level of learning.

A model per language will result in duplicate work and a less powerful model (less training data per model)

For this reason, I decided to learn how to translate all my text data into English. In the beginning, I felt quite positive about this task, as there seemed to be plenty of tutorials and Medium articles on how to go about it. However, I soon discovered that the most referenced library, googletrans, kept being brought down by Google (the library uses their free API without Google’s consent). Every time I found a fix, it seemed Google had already caught it.

The solutions in these videos do not run any longer (at least for me)

Finally, I stumbled upon the solution that worked for me.

Provided Solution by Nidhal Baccouri:

Nidhal Baccouri, a software engineer based in Germany, developed a Python library that packages nine free translation modules. Some require that you create an API key, but enough of them do not, so you can get started without one.

Read the documentation for details; it is a must if you intend to use the library: https://pythonrepo.com/repo/nidhaloff-deep-translator-python-miscellaneous

His solution works wonders! With a few lines of code, you can translate sentences or a list of them. Some modules even autodetect the language of origin. Note that not all modules support as many languages as we might wish, but you can also write code to print the languages supported by each module.

easy translation, using Google API module, but you can try all of them
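A minimal sketch of such a call, assuming the package is installed with `pip install deep-translator`. `GoogleTranslator`, `translate` and `get_supported_languages` are names from the library; the helper functions and the sample sentence are my own illustration, not the author's code. The helpers accept any object with the same methods, so they can be tried without hitting Google.

```python
def build_google_translator(source="auto", target="en"):
    """Create a deep-translator GoogleTranslator (needs `pip install deep-translator`)."""
    # Imported lazily so the helpers below can be exercised without the package.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source, target=target)


def to_english(text, translator):
    """Translate one string with any object exposing .translate(text)."""
    return translator.translate(text)


def supported_language_names(translator):
    """Return the sorted language names a translator reports as supported."""
    return sorted(translator.get_supported_languages())


# Usage (performs real network requests):
# translator = build_google_translator(source="auto", target="en")
# print(to_english("Jeg liker å lese bøker.", translator))
# print(supported_language_names(translator)[:5])
```

With `source="auto"` the Google module tries to detect the language of origin, which is the autodetection mentioned above.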

My contribution (if you can call it that):

If you read through the documentation, you will notice that you are given three main options to translate your data.

  • Pass a list containing a set of sentences and translate the set
  • Pass a file and translate the whole file
  • Pass a variable containing a sentence, and translate one sentence at a time.
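The three options above map onto three translator methods: `translate_batch`, `translate_file` and `translate`. The sketch below calls each one once. The wrapper `run_three_options` and the file name `comments.txt` are hypothetical and only there to keep the example compact.

```python
def run_three_options(translator, sentences, file_path):
    """Call the three entry points once and collect the results."""
    return {
        # One sentence held in a variable.
        "single": translator.translate(sentences[0]),
        # A list of sentences, translated as a set.
        "batch": translator.translate_batch(sentences),
        # A whole text file, given by its path.
        "file": translator.translate_file(file_path),
    }


# Usage (performs real network requests, needs `pip install deep-translator`):
# from deep_translator import GoogleTranslator
# translator = GoogleTranslator(source="auto", target="en")
# results = run_three_options(translator, ["God morgen.", "Takk for hjelpen."], "comments.txt")
```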

However, if you are working with thousands of comments at a time, you need a more structured way to keep track of your data. In my case, I want to load a data frame in memory and create a new column containing the translated content of one of my existing columns.

In order to do this, I create a “translate” function and pass this function inside a lambda function that is applied to my data frame.

If you are not familiar with lambda functions, I will point you to Tanu N Prabhu's article on the subject. A summary table was created by him and you can see it below.

Breakdown of functions from Tanu N Prabhu
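For the purposes of this article, the one idea you need is that a lambda is an unnamed, single-expression function, and that it can simply forward its argument to a named function. The snippet below is my own small illustration, not Prabhu's material; `shout` and the sample list are made up.

```python
def shout(text):
    """A regular named function."""
    return text.upper() + "!"


# The same behaviour written as a lambda (an anonymous, single-expression function).
shout_lambda = lambda text: text.upper() + "!"

comments = ["hei", "takk"]

# A lambda can also forward each element to a named function, which is
# the pattern this article applies to a data frame column.
shouted = list(map(lambda comment: shout(comment), comments))
print(shouted)  # ['HEI!', 'TAKK!']
```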

In essence, we are writing a function that translates using one of the modules discussed above, and then, through the lambda function, applying it to each row of the data frame column.

In plain(ish) English, we are iterating through the data frame column “Column_NO”, which contains comments in Norwegian, translating one comment at a time to English, and storing the results in “Column_ENG” in the same data frame.

code to adjust translation across data frame
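A sketch of what this can look like, assuming pandas and deep-translator are installed. The column names come from the text; the function names, the `pause_seconds` parameter and the blank-cell check are my reconstruction of the approach described here, not a copy of the original code.

```python
import time

import pandas as pd


def translate(text, translator, pause_seconds=0.001):
    """Translate one cell; return "" for missing or blank cells."""
    if not isinstance(text, str) or not text.strip():
        return ""
    time.sleep(pause_seconds)  # short pause before each request
    return translator.translate(text)


def add_english_column(df, translator, source_col="Column_NO", target_col="Column_ENG"):
    """Return a copy of df with target_col holding the translation of source_col."""
    result = df.copy()
    result[target_col] = result[source_col].apply(lambda text: translate(text, translator))
    return result


# Usage (performs real network requests):
# from deep_translator import GoogleTranslator
# df = pd.DataFrame({"Column_NO": ["Jeg liker kaffe.", None, ""]})
# df = add_english_column(df, GoogleTranslator(source="norwegian", target="english"))
```

Passing the translator in as an argument means a single translator object is reused for every row, and the function can be checked with a stand-in object before sending anything to Google.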

Note that the custom function also returns an empty string if there is no comment in the row. This is needed because the translation call raises an error when a row has no text content.

Additionally, we write it so that it lets 1 millisecond pass each time it runs in order not to saturate Google's servers, which would result in us being kicked out of their services 🙄 (don't worry here: as long as you are not sending more than 5 requests per second, or running more than 200k translations in a day, your data processing will not cause you any trouble).

Translating my 65K comments took 6 hours. I know, it is not light speed, but it is a reliable method, free of charge, that won't get you in trouble (don't quote me on that though).
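If you would rather enforce the requests-per-second ceiling explicitly than rely on a fixed pause, a small throttle is one option. This is a different technique from the fixed pause described above, sketched under the assumption that five requests per second (the figure quoted in this article, not an official limit I can vouch for) is the ceiling. The `Throttle` class is hypothetical and not part of deep-translator.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to stay under the configured rate."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Usage: call throttle.wait() right before each translation request.
# throttle = Throttle(max_per_second=5)
# throttle.wait()
# translated = translator.translate(text)
```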
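As a quick sanity check on those numbers, the arithmetic below converts the run into a per-comment time and a request rate. It uses only the two figures from this article (65K comments, 6 hours).

```python
comments = 65_000
hours = 6

seconds_total = hours * 3600  # 21,600 seconds
seconds_per_comment = seconds_total / comments
comments_per_second = comments / seconds_total

print(f"{seconds_per_comment:.2f} s per comment")        # 0.33 s per comment
print(f"{comments_per_second:.1f} comments per second")  # 3.0 comments per second
```

That is roughly three requests per second, which sits under the five-per-second figure mentioned earlier.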
