What is neural network-based translation?

What is neural network-based translation? Why does it provide better translations?

Machine translation has been widely available in apps and websites since the mid-2000s.

For many years (since the 1960s), computer scientists tried to build machine translation systems based on grammatical rules and structures of individual languages. In most cases the results were, let’s say, less than great.

The breakthrough came when a new concept, machine learning, was applied to machine translation. Using vast amounts of data pre-translated by professional translators, powerful algorithms would learn how to translate words given some limited context and these pre-existing translations.

All of the machine translation products (websites or apps) available until late 2016 were based on algorithms using statistical methods to try to guess the best possible translation for a given word. This technology is called statistical machine translation.

However, one of the limitations of statistical machine translation is that it translates each word using only the context of a few words before and after it. For short sentences, this works pretty well. For longer ones, the translation quality can vary from very good to, in some cases, borderline nonsensical. It is almost always possible to tell that the text has been machine-generated.
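To make the "few words before and after" limitation concrete, here is a minimal Python sketch. It is not how any particular statistical system is implemented; the function name `local_context` and the window size of 2 are assumptions chosen purely for illustration.

```python
def local_context(words, index, window=2):
    """Return the words within `window` positions of words[index].

    Hypothetical helper: it only illustrates the idea of a limited
    context window, not a real statistical translation system.
    """
    lo = max(0, index - window)
    hi = min(len(words), index + window + 1)
    # Neighbours on the left and on the right, excluding the word itself.
    return words[lo:index] + words[index + 1:hi]


sentence = "the dog just gave birth to six puppies".split()
dog_index = sentence.index("dog")

# What a window-limited model "sees" when translating "dog".
nearby = local_context(sentence, dog_index, window=2)

# What a full-sentence model can draw on.
everything = sentence[:dog_index] + sentence[dog_index + 1:]

print(nearby)
print(everything)
```

With a window of 2, the words "birth" and "puppies" fall outside the context available for "dog", so a window-limited model has no way to use them when choosing a translation.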

In the late 2000s, a new machine learning technology called deep learning, or deep neural networks, became a viable option for many hard-to-crack computer science problems. This technology tries to mimic, at least partially, how the human brain works. It became practical thanks to advances both on the research side (how to build, train and run these large neural networks) and on the compute side, with the arrival of the extremely large-scale compute power of the cloud.

Specifically, neural network-based machine translation recently became possible and, although still in its early stages, it already provides better translations for many languages than statistical machine translation systems that have been refined for more than 10 years.

At a high level, neural network translation works in two stages:

  • A first stage models the word that needs to be translated based on the context of this word (and its possible translations) within the full sentence, whether the sentence is 5 words or 20 words long.

  • A second stage then translates this word model (not the word itself but the model the neural network has built of it), within the context of the sentence, into the other language.

Figure: High-level illustration of the two stages of the neural network translation process
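The two stages can be sketched in a few lines of Python. This is a toy under loud assumptions, not a neural network and not the method used by any real product: the two-number "vectors" are hand-picked for the example (real systems learn much larger ones from data), and the names `encode`, `decode` and `translate_word` are hypothetical.

```python
# Hand-picked toy vectors: (dog-ness, female-cue). Assumed for illustration;
# a real system would learn its representations from training data.
SOURCE_VECTORS = {
    "dog": (1.0, 0.0),
    "birth": (0.0, 1.0),
    "puppies": (0.0, 1.0),
}
TARGET_VECTORS = {
    "le chien": (1.0, -1.0),
    "la chienne": (0.5, 1.0),
}


def vector_for(word):
    # Words without a hand-picked vector contribute nothing.
    return SOURCE_VECTORS.get(word, (0.0, 0.0))


def encode(words, index):
    """Stage 1: build a model of words[index] from the full sentence."""
    model = list(vector_for(words[index]))
    for position, word in enumerate(words):
        if position != index:
            vec = vector_for(word)
            model[0] += vec[0]
            model[1] += vec[1]
    return tuple(model)


def decode(model):
    """Stage 2: pick the target phrase that best matches the word model."""
    def score(phrase):
        vec = TARGET_VECTORS[phrase]
        return vec[0] * model[0] + vec[1] * model[1]
    return max(TARGET_VECTORS, key=score)


def translate_word(sentence, word):
    words = sentence.lower().split()
    return decode(encode(words, words.index(word)))


print(translate_word("the dog barks", "dog"))
print(translate_word("the dog just gave birth to six puppies", "dog"))
```

The point of the sketch is the separation of concerns: stage one produces a representation of the word that depends on the whole sentence, and stage two translates that representation rather than the bare word, which is why the same word "dog" can come out differently in the two sentences.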

One way to think about neural network-based translation is to imagine a person who is fluent in another language and sees a word, say “dog”. The word creates the image of a dog in his or her mind, and this image is then associated with, for instance, “le chien” in French. The neural network would intrinsically know that the word “chien” is masculine in French (“le”, not “la”). But if the sentence were “the dog just gave birth to six puppies”, it would picture the same dog with puppies nursing and would then automatically use “la chienne” (the female form of “le chien”) when translating the sentence.

This approach provides better results because it:

  • Takes into account the full sentence, not only a few consecutive words

  • Can handle the infinite variations of language through brain-like pattern recognition

  • Learns the subtleties of each language, such as gender and formality

Because of this approach, sentences generated by neural network-based machine translation are usually not only more accurate than those produced by statistical machine translation but also more fluent and natural, as if a human had translated them and not a machine.

To learn more about the technologies behind this neural network architecture, please go to