# Attention Is All You Need

**Paper:** Attention Is All You Need
**Authors:** Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.
**Published:** NeurIPS 2017

## Summary

This paper introduced the Transformer architecture, which relies entirely on attention mechanisms and dispenses with recurrence and convolutions. The model achieves state-of-the-art results on machine translation tasks while being significantly more parallelizable and requiring less time to train.

## Key Contributions

- **Self-attention mechanism:** Introduced multi-head self-attention as the primary building block for sequence modeling (see the sketch after this list).
...
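Since self-attention is the paper's central building block, a minimal sketch of the underlying scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, may help. The formula is the paper's; the function name, NumPy implementation, and toy dimensions below are illustrative assumptions, not code from the authors.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Pairwise similarity scores, scaled by sqrt(d_k) to keep softmax
    # gradients from vanishing for large d_k (the paper's motivation).
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted sum of all value vectors.
    return weights @ V

# Toy usage: 4 tokens, d_k = d_v = 8 (dimensions chosen arbitrarily for the demo).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# Self-attention: queries, keys, and values all come from the same sequence.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the paper's multi-head formulation, several such attention functions run in parallel over learned linear projections of the queries, keys, and values, and their outputs are concatenated and projected once more.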