A history of machine translation from the Cold War to deep learning

by Ilya Pestov

I open Google Translate twice as often as Facebook, and the instant translation of price tags no longer feels like cyberpunk to me. That’s what we call reality. It’s hard to imagine that this is the result of a century-long fight to build the algorithms of machine translation, and that there was no visible success for half of that period.

The precise developments I’ll discuss in this article set the basis of all modern language processing systems — from search engines to voice-controlled microwaves. I’m talking about the evolution and structure of online translation today.

In the beginning

The story begins in 1933. Soviet scientist Peter Troyanskii presented “the machine for the selection and printing of words when translating from one language to another” to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera.

The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter’s keys encoded one of the features. The tape and the camera’s film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered “useless”. Troyanskii died of stenocardia (angina) after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7, 1954, at IBM headquarters in New York, the Georgetown–IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history.

“A girl who didn’t understand a word of the language of the Soviets punched out the Russian messages on IBM cards. The ‘brain’ dashed off its English translations on an automatic printer at the breakneck speed of two and a half lines per second,” reported the IBM press release.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, this sort of arms race was launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.

The race for machine translation

The vain struggles to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. They instead recommended focusing on dictionary development, which eliminated US researchers from the race for almost a decade.

Even so, it was precisely these scientists’ attempts, research, and development that created the basis of modern Natural Language Processing. All of today’s search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. The scientists peered over the interpreters’ work, trying to compel the tremendously sluggish computers to repeat those actions. These systems consisted of:

  • Bilingual dictionary (RU -> EN)
  • A set of linguistic rules for each language (for example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That’s it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMPT and Systran are the most famous examples of RBMT systems. Just take a look at Aliexpress to feel the soft breath of this golden age.

But even they had some nuances and subspecies.

Direct Machine Translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word.

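To make the mechanics concrete, here is a toy sketch of the approach in Python. Everything in it (the two-entry dictionary, the naive_translate function, the single syntax hack) is invented for illustration, not taken from any real RBMT system.

```python
# Toy direct machine translation: word-by-word dictionary lookup
# plus one hard-coded syntax "harmonization" rule.
RU_EN = {
    "я": "I",
    "иду": "go",
    "в": "to",
    "кино": "cinema",
}

def naive_translate(sentence: str) -> str:
    # Unknown words pass through untranslated -- a classic RBMT hack.
    words = [RU_EN.get(w, w) for w in sentence.lower().split()]
    text = " ".join(words)
    # Crude syntax correction: "I go" -> "I am going".
    return text.replace("I go", "I am going")

print(naive_translate("Я иду в кино"))  # -> "I am going to cinema"
```

Note the missing article in the output: nobody wrote a rule for it.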
The output returns some kind of translation. Usually, it’s quite crappy. It seems that the linguists wasted their time for nothing.

Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based Machine Translation

In contrast to direct translation, we prepare first by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite a decent conversion of the word order in translation. In theory.

In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. But on the other, it became more complicated because of the increased number of word constructions in comparison with single words.

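As a rough illustration of “manipulating whole constructions,” here is a toy reordering rule over a POS-tagged sentence. Real transfer systems worked on full parse trees; the tagged-tuple format and the single noun/adjective rule below are invented for the example.

```python
# Toy transfer rule: in the source language adjectives follow nouns
# ("maison blanche"); in the target language they precede them.
# We manipulate (word, part-of-speech) pairs, not raw words.
def reorder_noun_adj(tagged):
    out = list(tagged)
    for i in range(len(out) - 1):
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

sentence = [("maison", "NOUN"), ("blanche", "ADJ")]
print(reorder_noun_adj(sentence))
# -> [('blanche', 'ADJ'), ('maison', 'NOUN')]
```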
Interlingual Machine Translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world’s languages (the interlingua). It’s the same interlingua Descartes dreamed of: a meta-language which follows universal rules and turns translation into a simple “back and forth” task. Next, the interlingua would convert to any target language, and here was the singularity!

Because of the conversion, interlingua is often confused with transfer-based systems. The difference is that the linguistic rules are specific to every single language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three. We can’t do this in transfer-based systems.

It looks perfect, but in real life it’s not. It was extremely hard to create such a universal interlingua; a lot of scientists worked on it their whole lives. They did not succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text Theory alone costs a fortune!

The idea of an intermediate language will be back. Let’s wait awhile.

As you can see, all RBMT systems are dumb and terrifying, and that’s the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT, the ones most often mentioned are its morphological accuracy (it doesn’t confuse the words), reproducibility of results (all translators get the same result), and the ability to tune it to a subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be some exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations when people just say it differently. Any attempt to take into account all the nuances would waste millions of man hours.

And don’t forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: I saw a man on a hill with a telescope?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine?

Forty years of the Cold War didn’t help in finding any distinct solution. RBMT was dead.

Example-based Machine Translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English-Japanese translation is extremely complicated. The language structure is completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation.

Let’s imagine that we have to translate a simple sentence — “I’m going to the cinema.” And let’s say we’ve already translated another similar sentence — “I’m going to the theater” — and we can find the word “cinema” in the dictionary.

All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation.

I build phrases in unfamiliar languages exactly the same way!

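A minimal sketch of that idea in Python, using the article’s own theater/cinema example. The stored example base, the French target, and the two-word dictionary are all invented; a real EBMT system had thousands of examples and fuzzier matching.

```python
# Toy example-based MT: find a stored translation that differs from
# the input by exactly one word, and patch in that word's translation.
EXAMPLES = [
    ("I'm going to the theater", "Je vais au théâtre"),
]
DICTIONARY = {"theater": "théâtre", "cinema": "cinéma"}

def ebmt_translate(sentence):
    new_words = sentence.split()
    for src, tgt in EXAMPLES:
        old_words = src.split()
        if len(old_words) != len(new_words):
            continue
        diffs = [(a, b) for a, b in zip(old_words, new_words) if a != b]
        if len(diffs) == 1:
            old, new = diffs[0]
            # Swap the translation of the single differing word.
            return tgt.replace(DICTIONARY[old], DICTIONARY[new])
    return None  # no example was close enough

print(ebmt_translate("I'm going to the cinema"))  # -> "Je vais au cinéma"
```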
EBMT showed the light of day to scientists from all over the world: it turns out, you can just feed the machine with existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards it. The revolutionary invention of statistical translation would happen in just five years.

Statistical Machine Translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was first shown which knew nothing about rules and linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns.

The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word “Das Haus” was translated as “house” vs “building” vs “construction”, and so on.

If most of the time the source word was translated as “house”, the machine used this. Note that we did not set any rules nor use any dictionaries — all conclusions were done by machine, guided by stats and the logic that “if people translate that way, so will I.” And so statistical translation was born.

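The counting step is easy to picture in code. A hedged sketch with invented data:

```python
from collections import Counter

# Invented word pairs harvested from a parallel corpus.
observed = [
    ("Das Haus", "house"), ("Das Haus", "house"), ("Das Haus", "building"),
    ("Das Haus", "house"), ("Das Haus", "construction"),
]

counts = Counter(observed)
total = sum(counts.values())
for (src, tgt), n in counts.most_common():
    print(f"p({tgt} | {src}) = {n / total:.2f}")
# p(house | Das Haus) = 0.60
# p(building | Das Haus) = 0.20
# p(construction | Das Haus) = 0.20
```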
The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more texts we used, the better translation we got.

There was still one question left: how would the machine correlate the word “Das Haus,” and the word “building” — and how would we know these were the right translations?

The answer was that we wouldn’t know. At the start, the machine assumed that the word “Das Haus” equally correlated with any word from the translated sentence. Next, when “Das Haus” appeared in other sentences, the number of correlations with the “house” would increase. That’s the “word alignment algorithm,” a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? Well, we decided to take the abstracts of the European Parliament and the United Nations Security Council meetings — they were available in the languages of all member countries and were now available for download at UN Corpora and Europarl Corpora.

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM’s first statistical translation model was called Model one. Quite elegant, right? Guess what they called the second one?

Model 1: “the bag of words”

Model one used a classical approach — to split into words and count stats. The word order wasn’t taken into account. The only trick was translating one word into multiple words. For example, “Der Staubsauger” could turn into “Vacuum Cleaner,” but that didn’t mean it would turn out vice versa.

Here’re some simple implementations in Python: shawa/IBM-Model-1.

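For flavor, here is a compact sketch of Model 1’s expectation-maximization loop. The two-sentence corpus is invented, and real implementations add a NULL token and smoothing, but the core update is this:

```python
from collections import defaultdict

# An invented two-sentence parallel corpus.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]

# t(e | f): probability that source word f translates to target word e.
# Start uniform: the machine assumes every pairing is equally likely.
t = defaultdict(lambda: 0.25)

for _ in range(20):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for e in es:
            # E-step: spread each target word's "credit" over all
            # source words, proportionally to the current t(e | f).
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    # M-step: renormalize the fractional counts into probabilities.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

print(f"{t[('house', 'haus')]:.2f}")     # approaches 1.00
```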
Model 2: considering the word order in sentences

The lack of knowledge about languages’ word order became a problem for Model 1, and it’s very important in some cases.

Model 2 dealt with that: it memorized the usual place the word takes at the output sentence and shuffled the words for the more natural sound at the intermediate step. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or using “do” when negating in English. “Ich will keine Persimonen” → “I do not want Persimmons.” To deal with it, two more steps were added to Model 3.

  • The NULL token insertion, if the machine considers the necessity of a new word
  • Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered the word alignment, but knew nothing about reordering. For example, adjectives would often switch places with the noun, and no matter how well the order was memorized, it wouldn’t make the output better. Therefore, Model 4 took into account the so-called “relative order” — the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for the learning and fixed the issue with conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single-true way, according to the machine. Such systems are not used anymore, as they’ve been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method is based on all the word-based translation principles: statistics, reordering, and lexical hacks. For learning, though, it split the text not only into words but also into phrases. These were n-grams, to be precise: contiguous sequences of n words in a row.

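Extracting those n-grams is a one-liner; a quick sketch:

```python
def ngrams(words, n):
    """Contiguous runs of n words: the 'phrases' phrase-based SMT learns."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the quick brown fox".split()
print(ngrams(sentence, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(ngrams(sentence, 3))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```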
Thus, the machine learned to translate steady combinations of words, which noticeably improved accuracy.

The trick was, the phrases were not always simple syntax constructions, and the quality of the translation dropped significantly if anyone who was aware of linguistics and sentence structure interfered. Frederick Jelinek, the pioneer of computational linguistics, joked about it once: “Every time I fire a linguist, the performance of the speech recognizer goes up.”

Besides improving accuracy, the phrase-based translation provided more options in choosing the bilingual texts for learning. For the word-based translation, the exact match of the sources was critical, which excluded any literary or free translation. The phrase-based translation had no problem learning from them. To improve the translation, researchers even started to parse the news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based right up until 2016. Each of you can probably recall the moments when Google either translated the sentence flawlessly or resulted in complete nonsense, right? The nonsense came from phrase-based features.

The good old rule-based approach consistently provided a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns “three hundred” into “300” without any hesitation. That’s called a statistical anomaly.

Phrase-based translation has become so popular, that when you hear “statistical machine translation” that is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state-of-the-art. Back then, no one even thought that Google was already stoking its fires, getting ready to change our whole image of machine translation.

Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered “the future of translation,” but the idea did not take off.

The proponents of syntax-based translation believed it was possible to merge it with the rule-based method. It’s necessary to do quite a precise syntax analysis of the sentence — to determine the subject, the predicate, and other parts of the sentence, and then to build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word alignment issue once and for all.

The problem is, the syntactic parsing works terribly, despite the fact that we consider it solved a while ago (as we have the ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than to parse the subject and the predicate. And every single time I gave up and used another method.

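Those ready-made libraries are easy to try. A quick example with spaCy (my choice of library; it assumes the en_core_web_sm model is installed), run on the ambiguous telescope sentence from earlier:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("I saw a man on a hill with a telescope")

for token in doc:
    # Each word, its dependency label, and the word it attaches to.
    print(token.text, token.dep_, "->", token.head.text)

# Whether "with a telescope" attaches to "saw", "man", or "hill" is
# exactly the ambiguity the parser has to guess, and it often guesses wrong.
```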
Let me know in the comments if you succeed using it at least once.

Neural Machine Translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The Internet didn’t notice it at all, except Google — they took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring the style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist’s style? There was no magic. The neural network was taught to recognize the artist’s paintings. Next, the last layers containing the network’s decision were removed. The resulting stylized picture was just the intermediate image that network got. That’s the network’s fantasy, and we consider it beautiful.

If we can transfer the style to a photo, what if we try to impose another language on a source text? The text would be that precise “artist’s style,” and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I’m trying to describe my dog — average size, sharp nose, short tail, always barks. If I gave you this set of the dog’s features, and if the description was precise, you could draw it, even though you have never seen it.

Now, imagine the source text is the set of specific features. Basically, it means that you encode it, and let another neural network decode it back into text, but in another language. The decoder only knows its own language. It has no idea about the features’ origin, but it can express them in, for example, Spanish. Continuing the analogy, it doesn’t matter how you draw the dog — with crayons, watercolor or your finger. You paint it as you can.

Once again — one neural network can only encode the sentence to the specific set of features, and another one can only decode them back into text. Both have no idea about each other, and each of them knows only its own language. Recall something? Interlingua is back. Ta-da.

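Here is what those two networks can look like in code. This is a minimal sketch in PyTorch (my choice; the article names no framework), with arbitrary sizes and untrained weights, just to show that the decoder only ever sees a feature vector plus its own vocabulary:

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        # bidirectional=True mirrors the upgrade the article mentions
        # later: look at words both before and after each position.
        self.rnn = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        _, (h, _) = self.rnn(self.embed(src_ids))
        return torch.cat([h[0], h[1]], dim=-1)  # the "set of features"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB + 2 * HID, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt_ids, features):
        # The decoder sees only the feature vector, repeated per step.
        feats = features.unsqueeze(1).expand(-1, tgt_ids.size(1), -1)
        x = torch.cat([self.embed(tgt_ids), feats], dim=-1)
        y, _ = self.rnn(x)
        return self.out(y)  # scores over the target-language vocabulary

enc, dec = Encoder(), Decoder()
src = torch.randint(0, SRC_VOCAB, (1, 7))  # a 7-word source sentence
tgt = torch.randint(0, TGT_VOCAB, (1, 5))  # 5 target words decoded so far
print(dec(tgt, enc(src)).shape)            # torch.Size([1, 5, 1200])
```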
The question is, how do we find those features? It’s obvious when we’re talking about the dog, but how to deal with the text? Thirty years ago scientists already tried to create the universal language code, and it ended in a total failure.

Nevertheless, we have deep learning now. And that’s its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it’s possible to find those features in text as well.

Theoretically, we can pass the features gotten from the neural networks to the linguists, so that they can open brave new horizons for themselves.

The question is, what type of neural network should be used for encoding and decoding? Convolutional Neural Networks (CNN) fit perfectly for pictures since they operate with independent blocks of pixels.

But there are no independent blocks in the text — every word depends on its surroundings. Text, speech, and music are always consistent. So recurrent neural networks (RNN) would be the best choice to handle them, since they remember the previous result — the prior word, in our case.

Now RNNs are used everywhere — Siri’s speech recognition (it parses a sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior, guess the next), music generation, and even chatbots.

For the nerds like me: in fact, the neural translators’ architecture varies widely. The regular RNN was used at the beginning, then upgraded to bi-directional, where the translator considered not only words before the source word, but also the next word. That was much more effective. Then it followed with the hardcore multilayer RNN with LSTM-units for long-term storing of the translation context.

In two years, neural networks surpassed everything that had appeared in the past 20 years of translation. Neural translation contains 50% fewer word order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case in different languages. And no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods always worked using English as the key source. Thus, if you translated from Russian to German, the machine first translated the text to English and then from English to German, which leads to a double loss.

Neural translation doesn’t need that — only a decoder is required for it to work. That was the first time that direct translation between languages with no common dictionary became possible.

Google Translate (since 2016)

In 2016, Google turned on neural translation for nine languages. They developed their system named Google Neural Machine Translation (GNMT). It consists of 8 encoder and 8 decoder layers of RNNs, as well as attention connections from the decoder network.

They not only divided sentences, but also words. That was how they dealt with one of the major NMT issues — rare words. NMTs are helpless when a word is not in their lexicon. Let’s say, “Vas3k”. I doubt anyone taught the neural network to translate my nickname. In that case, GNMT tries to break words into word pieces and recover the translation from them. Smart.

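A hedged sketch of the idea: greedy longest-match segmentation over a subword vocabulary. The vocabulary below is invented; GNMT learns its wordpiece inventory from the training corpus.

```python
# Split an out-of-vocabulary word into known subword pieces,
# always taking the longest matching piece first.
def wordpieces(word, vocab):
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return ["<unk>"]  # no piece matched at position i
    return pieces

print(wordpieces("vas3k", {"vas", "3k", "va", "s", "k"}))
# -> ['vas', '3k']  (an unknown word dissolves into known pieces)
```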
Hint: Google Translate used for website translation in the browser still uses the old phrase-based algorithm. Somehow, Google hasn’t upgraded it, and the differences are quite noticeable compared to the online version.

Google uses a crowdsourcing mechanism in the online version. People can choose the version they consider the most correct, and if lots of users like it, Google will always translate this phrase that way and mark it with a special badge. This works fantastically for short everyday phrases such as, “Let’s go to the cinema,” or, “I’m waiting for you.” Google knows conversational English better than I do :(

Microsoft’s Bing works exactly like Google Translate. But Yandex is different.

Yandex Translate (since 2017)

Yandex launched its neural translation system in 2017. Its main feature, as declared, was hybridity. Yandex combines neural and statistical approaches to translate the sentence, and then chooses the best result with its favorite CatBoost algorithm.

The thing is, neural translation often fails when translating short phrases, since it uses context to choose the right word. That’s hard if the word appeared only a few times in the training data. In such cases, a simple statistical translation finds the right word quickly and simply.

Yandex doesn’t share the details. It fends us off with marketing press-releases. OKAY.

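Since Yandex publishes no details, the following is speculation in code form: a small classifier (CatBoost, since the article names it) picking between the statistical and the neural candidate. The feature set and the training rows are entirely invented.

```python
from catboost import CatBoostClassifier

# Invented features per sentence: [source length in words,
# SMT model score, NMT model score].
X = [[3, -1.2, -4.0], [12, -9.5, -3.1], [2, -0.7, -5.2], [15, -11.0, -2.8]]
y = [0, 1, 0, 1]  # 0 = ship the SMT output, 1 = ship the NMT output

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)
print(model.predict([[4, -1.0, -4.4]]))  # a short phrase: likely SMT
```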
It looks like Google uses SMT for the translation of words and short phrases. They don’t mention that in any articles, but it’s quite noticeable if you look at the difference between the translation of short and long expressions. Besides, SMT is used for displaying the word’s stats.

The conclusion and the future

Everyone’s still excited about the idea of “Babel fish” — instant speech translation. Google has made steps towards it with its Pixel Buds, but in fact, it’s still not what we were dreaming of. The instant speech translation is different from the usual translation. You need to know when to start translating and when to shut up and listen. I haven’t seen suitable approaches to solve this yet. Unless, maybe, Skype…

And here’s one more empty area: all the learning is limited to sets of parallel text blocks. The deepest neural networks still learn on parallel texts. We can’t teach a neural network without providing it with a source. People, instead, can complement their lexicon by reading books or articles, even without translating them into their native language.

If people can do it, the neural network can do it too, in theory. I found only one prototype attempting to incite the network, which knows one language, to read the texts in another language in order to gain experience. I’d try it myself, but I’m silly. Ok, that’s it.

This story was originally written in Russian and then translated into English on Vas3k.com by Vasily Zubarev. He is my pen-friend and I’m pretty sure that his blog should be spread.

Useful links

  • Philipp Koehn: Statistical Machine Translation. The most complete collection of methods I’ve found.
  • Moses — a popular library for creating your own statistical translations
  • OpenNMT — one more library, but for neural translators
  • The article from one of my favorite bloggers explaining RNN and LSTM
  • A video, “How to Make a Language Translator” — funny guy, neat explanation. Still not enough.
  • A text guide from TensorFlow about creating your own neural translator, for those who want more examples and to try the code.

Other articles from Vas3k.com

  • How Ethereum and Smart Contracts Work: Distributed Turing Machine with Blockchain Protection — vas3k.com
  • Blockchain Inside Out: How Bitcoin Works — once and for all, in simple words — vas3k.com

One last thing…

If you liked this article, click the 👏 below, and share it with other people so they can enjoy it as well.

Translated from: https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/
