Journal
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Volume 42, Issue 1, Pages 154-163
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2018.2876404
Keywords
Decoding; Task analysis; Semantics; NIST; Encoding; Neural networks; Analytical models; Deep attention network; neural machine translation (NMT); attention-based sequence-to-sequence learning; natural language processing
Funding
- National Natural Science Foundation of China [61672440, 61622209]
- Fundamental Research Funds for the Central Universities [ZK1024]
- Scientific Research Project of National Language Committee of China [YB135-49]
- Baidu Scholarship
Deepening neural models has proven very successful at improving model capacity on complex learning tasks such as machine translation. Previous work on deep neural machine translation has focused mainly on the encoder and the decoder and has largely overlooked the attention mechanism. However, the attention mechanism is vital for inducing the translation correspondence between languages, and shallow neural networks are relatively insufficient for this purpose, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on low-level attention information, DeepAtt automatically determines what should be passed or suppressed from the corresponding encoder layer, so as to make the distributed representation appropriate for high-level attention and translation. We conduct experiments on NIST Chinese-English, WMT English-German, and WMT English-French translation tasks, where, with five attention layers, DeepAtt yields performance that is very competitive with state-of-the-art results. We find empirically that, with an adequate increase in the number of attention layers, DeepAtt tends to produce more accurate attention weights. An in-depth analysis of the translation of important context words further reveals that DeepAtt significantly improves the faithfulness of system translations.
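The abstract does not give DeepAtt's equations, so the following is only a rough illustration of the general idea it describes: attention applied level by level, with lower-level attention output deciding what is passed or suppressed from the next encoder layer. The function names, the dot-product scoring, and the parameter-free sigmoid gate are all assumptions made for this sketch and are not taken from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(query, states):
    # Dot-product attention (an assumption): returns (weights, context).
    weights = softmax([dot(query, h) for h in states])
    dim = len(states[0])
    context = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, context

def deep_attention(query, encoder_layers):
    # One attention step per encoder layer, lowest layer first.
    # Hypothetical gating: the context from the lower level yields an
    # element-wise sigmoid gate that scales the next layer's states,
    # passing some dimensions and suppressing others.
    weights, context = attend(query, encoder_layers[0])
    for states in encoder_layers[1:]:
        gate = [sigmoid(c) for c in context]  # assumption: no learned parameters
        gated = [[g * x for g, x in zip(gate, h)] for h in states]
        weights, context = attend(query, gated)
    return weights, context

# Toy input: two encoder layers, three source positions, hidden size 2.
layers = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.2], [0.1, 0.9], [0.7, 0.7]],
]
weights, context = deep_attention([1.0, 0.5], layers)
```

In this toy version the final `weights` form a distribution over the three source positions. A trained model would use learned projections for both the scoring function and the gate.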