Transformers

Self-attention, multi-head attention, positional encoding, and the encoder-decoder architecture.
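
To make the first of these topics concrete, below is a minimal sketch of scaled dot-product self-attention in NumPy. The function name, tensor shapes, and random inputs are illustrative assumptions, not part of the original page.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True)      # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # attention-weighted sum of values

# Self-attention: queries, keys, and values all derive from the same sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # 4 tokens, model dimension 8 (illustrative sizes)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                 # (4, 8)
```

Multi-head attention runs several such attention computations in parallel over learned linear projections of the inputs and concatenates the results.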