TensorFlow Neural Machine Translation Tutorial

Sequence-to-sequence (seq2seq) models (Sutskever et al., 2014; Cho et al., 2014) have enjoyed great success in a variety of tasks such as machine translation, speech recognition, and text summarization. This tutorial gives readers a full understanding of seq2seq models and shows how to build a competitive seq2seq model from scratch. We focus on the task of Neural Machine Translation (NMT), which was the very first testbed for seq2seq models with wild success. The included code is lightweight, high-quality, production-ready, and incorporates the latest research ideas. We achieve this goal by:

  1. Using the recent decoder / attention wrapper API and the TensorFlow 1.2 data iterator (a minimal usage sketch follows this list)
  2. Incorporating our strong expertise in building recurrent and seq2seq models
  3. Providing tips and tricks for building the very best NMT models and replicating Google’s NMT (GNMT) system.
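
To make item 1 above concrete, here is a minimal sketch of a vanilla encoder-decoder built with the tf.contrib.seq2seq decoder API under a recent TensorFlow 1.x release. The hyperparameters, placeholder names, and the choice of an Adam optimizer are illustrative assumptions for this sketch, not the settings used in the tutorial's benchmark models.

```python
# Minimal vanilla seq2seq sketch (TensorFlow 1.x, tf.contrib.seq2seq).
# All hyperparameters below are illustrative placeholders, not tutorial settings.
import tensorflow as tf

src_vocab_size, tgt_vocab_size, num_units, embed_dim = 10000, 10000, 512, 512

# A batch of source/target token ids and their true lengths.
src_ids = tf.placeholder(tf.int32, [None, None])      # [batch, src_time]
src_len = tf.placeholder(tf.int32, [None])            # [batch]
tgt_in_ids = tf.placeholder(tf.int32, [None, None])   # decoder inputs (shifted right)
tgt_out_ids = tf.placeholder(tf.int32, [None, None])  # decoder targets
tgt_len = tf.placeholder(tf.int32, [None])

# Embedding lookups for source and target tokens.
src_emb = tf.get_variable("src_emb", [src_vocab_size, embed_dim])
tgt_emb = tf.get_variable("tgt_emb", [tgt_vocab_size, embed_dim])
enc_inputs = tf.nn.embedding_lookup(src_emb, src_ids)
dec_inputs = tf.nn.embedding_lookup(tgt_emb, tgt_in_ids)

# Encoder: a single unidirectional LSTM.
enc_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
enc_outputs, enc_state = tf.nn.dynamic_rnn(
    enc_cell, enc_inputs, sequence_length=src_len, dtype=tf.float32)

# Decoder: TrainingHelper feeds the gold previous token at each step.
dec_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
helper = tf.contrib.seq2seq.TrainingHelper(dec_inputs, tgt_len)
projection = tf.layers.Dense(tgt_vocab_size, use_bias=False)
decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, helper, initial_state=enc_state, output_layer=projection)
dec_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

# Cross-entropy loss, masked beyond each target length; assumes targets are
# padded exactly to the longest sequence in the batch.
logits = dec_outputs.rnn_output
weights = tf.sequence_mask(tgt_len, tf.shape(tgt_out_ids)[1], dtype=tf.float32)
loss = tf.contrib.seq2seq.sequence_loss(logits, tgt_out_ids, weights)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```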

We believe that it is important to provide benchmarks that people can easily replicate. As a result, we have provided full experimental results and pretrained models on the following publicly available datasets:

  1. Small-scale: English-Vietnamese parallel corpus of TED talks (133K sentence pairs) provided by the IWSLT Evaluation Campaign.
  2. Large-scale: German-English parallel corpus (4.5M sentence pairs) provided by the WMT Evaluation Campaign.
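
As a rough illustration of how such a parallel corpus can be fed to the model through the data iterator mentioned above, here is a hedged sketch using the tf.data API of later TensorFlow 1.x releases; the file names, batch size, and padding token are placeholders, and the vocabulary lookup that maps tokens to integer ids is omitted.

```python
# Sketch of a parallel-corpus input pipeline with the tf.data API
# (later TensorFlow 1.x style); file names and constants are illustrative.
import tensorflow as tf

src_file, tgt_file = "train.en", "train.vi"   # hypothetical paths
batch_size = 128

src_dataset = tf.data.TextLineDataset(src_file)
tgt_dataset = tf.data.TextLineDataset(tgt_file)

# Pair up source/target lines, split on whitespace, and keep token counts.
dataset = tf.data.Dataset.zip((src_dataset, tgt_dataset))
dataset = dataset.map(
    lambda src, tgt: (tf.string_split([src]).values,
                      tf.string_split([tgt]).values))
dataset = dataset.map(
    lambda src, tgt: (src, tgt, tf.size(src), tf.size(tgt)))

# Pad each batch to the longest sentence it contains.  A real pipeline would
# also map tokens to ids (e.g. with a lookup table) before batching.
dataset = dataset.padded_batch(
    batch_size,
    padded_shapes=([None], [None], [], []),
    padding_values=("</s>", "</s>", 0, 0))

iterator = dataset.make_initializable_iterator()
src_tokens, tgt_tokens, src_len, tgt_len = iterator.get_next()
```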

We first build up some basic knowledge about seq2seq models for NMT, explaining how to build and train a vanilla NMT model. The second part will go into the details of building a competitive NMT model with an attention mechanism. We then discuss tips and tricks to build the best possible NMT models (both in speed and translation quality), such as TensorFlow best practices (batching, bucketing), bidirectional RNNs, and beam search, as well as scaling up to multiple GPUs using GNMT attention.
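
To give a feel for the attention and beam-search components mentioned above, the sketch below wraps a decoder cell with Luong attention and runs beam-search inference via tf.contrib.seq2seq. It reuses enc_outputs, src_len, num_units, tgt_emb, and projection from the earlier vanilla sketch; beam_width, the start/end token ids, and the fresh decoder cell are illustrative assumptions rather than the tutorial's actual configuration.

```python
# Attention + beam-search inference sketch (TensorFlow 1.x, tf.contrib.seq2seq).
# enc_outputs, src_len, num_units, tgt_emb, projection come from the earlier
# vanilla sketch; beam_width and the start/end token ids are illustrative.
import tensorflow as tf

beam_width, sos_id, eos_id, max_len = 10, 1, 2, 100
batch_size = tf.shape(src_len)[0]

# Replicate encoder results once per beam.
tiled_memory = tf.contrib.seq2seq.tile_batch(enc_outputs, multiplier=beam_width)
tiled_src_len = tf.contrib.seq2seq.tile_batch(src_len, multiplier=beam_width)

# Luong-style attention over the (tiled) encoder outputs; a fresh decoder cell
# is used here for brevity, whereas a real model would share the training
# decoder's weights via variable scoping.
attention = tf.contrib.seq2seq.LuongAttention(
    num_units=num_units, memory=tiled_memory,
    memory_sequence_length=tiled_src_len)
infer_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.nn.rnn_cell.BasicLSTMCell(num_units), attention,
    attention_layer_size=num_units)

# Start decoding from a zero state sized for batch * beam.
initial_state = infer_cell.zero_state(batch_size * beam_width, tf.float32)

decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=infer_cell,
    embedding=tgt_emb,
    start_tokens=tf.fill([batch_size], sos_id),
    end_token=eos_id,
    initial_state=initial_state,
    beam_width=beam_width,
    output_layer=projection)

outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=max_len)
translations = outputs.predicted_ids  # [batch, time, beam_width]
```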
