Attention Is All You Need

The authors propose the Transformer, a network architecture for sequence transduction based entirely on attention mechanisms, dispensing with recurrent and convolutional layers. Because it does not process tokens sequentially, the Transformer is significantly more parallelizable and requires less time to train than previous state-of-the-art models. Experiments on two machine translation tasks (WMT 2014 English-to-German and English-to-French) show that it outperforms earlier models in translation quality, setting new state-of-the-art results on both. The model also generalizes well to other tasks, as shown by applying it to English constituency parsing.
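To make "based entirely on attention mechanisms" concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, the core operation of the paper. It is an illustration in plain Python, not the authors' implementation: the function names, the list-of-rows representation, and the returned weights are assumptions made for readability, and it omits multi-head projections, masking, and batching.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Hypothetical helper for illustration: returns the attended outputs
    and the attention weights, one row per query.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    all_weights = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, all_weights
```

Each query row is processed without reference to any other query's result, which is one way to see why the computation parallelizes across positions, unlike a recurrent layer that must step through the sequence in order.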
