Positional encoding is a method for incorporating sequence-order information into transformer models, which, lacking recurrence and convolution, have no built-in notion of token position. Because the transformer's core self-attention mechanism is permutation-invariant, these encodings are added to the input token embeddings before processing. This lets the model distinguish "the dog bit the man" from "the man bit the dog," where word order changes the meaning. Common implementations include fixed sinusoidal functions and learned positional embeddings.
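The fixed sinusoidal variant can be sketched as follows: each position is mapped to a vector of sines and cosines at geometrically spaced frequencies, and that vector is added element-wise to the token embedding at the same position. This is a minimal NumPy sketch of the standard formulation from "Attention Is All You Need"; the shapes at the end are arbitrary values chosen for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]           # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)  # shape (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

# The encodings are added element-wise to the token embeddings before
# the first attention layer (shapes here are illustrative):
seq_len, d_model = 8, 16
token_embeddings = np.random.randn(seq_len, d_model)
inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because the frequencies are fixed rather than learned, the same function can encode positions longer than any sequence seen during training, which is one reason the sinusoidal form remains a common default.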
