In simplistic terms, the Transformer (originally proposed in "Attention Is All You Need!") is an encoder-decoder model without any RNNs. The cool thing about the Transformer is that the encoder and decoder stacks are each quite deep on their own, and each individually makes for a strong language model. In not very strict terms, the encoder part forms the basis of BERT, and the decoder part forms the basis of GPT.
The paper showed state-of-the-art results, with BLEU scores competitive with the earlier SOTA techniques and sometimes surpassing them. Another important aspect is the reduced time and space complexity compared to the earlier model architectures.
The paper introduced a new architecture in the domain of deep learning that is now actively used across the field, including but not limited to computer vision and NLP.
The original paper focuses on the problem of neural machine translation. Most of the earlier approaches in this area were vanilla encoder-decoder models (using RNNs), and the more recent ones combined an attention mechanism with such vanilla encoder-decoder models.
This paper lives up to its name - Attention Is All You Need! - and proposes a new architecture for neural machine translation.
The attention mechanism, expressed in terms of queries, keys and values, is the heart of the paper. Another major component is the concept of positional encodings: since the paper does not use any RNNs, the model by itself has no notion of the ordering of words with respect to each other. To incorporate ordering, a positional encoding is added to the input embedding of each word.
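To make these two pieces concrete, here is a minimal NumPy sketch of scaled dot-product attention and sinusoidal positional encodings. The function names and toy dimensions are my own for illustration; the formulas follow the paper (softmax(QK^T / sqrt(d_k))V for attention, sin/cos of different frequencies for the encodings).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                              # (len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                           # (len_q, d_v)

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                     # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy example (sizes chosen arbitrarily): 4 tokens, model dimension 8.
seq_len, d_model = 4, 8
embeddings = np.random.randn(seq_len, d_model)
x = embeddings + sinusoidal_positional_encoding(seq_len, d_model)  # added, not concatenated

# Self-attention: queries, keys and values all come from the same input.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the full model this single-head attention is repeated across multiple heads and interleaved with feed-forward layers, but the core computation is just the few lines above.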