A context window is the fixed-size, sequential block of tokens (representing text, images, or other data) that a transformer-based language model can attend to and process in a single forward pass; it sets the model's working memory limit. The window size is an architectural constraint, determined by the model's positional encoding scheme and the sequence lengths seen during training. All information relevant to a task, including the prompt, conversation history, and any retrieved documents, must fit within this finite token budget. Exceeding it triggers context truncation or requires context compression techniques.
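The truncation strategy mentioned above can be sketched as a simple eviction loop. This is a minimal illustration, not any particular framework's API: the function name `truncate_to_budget` and the whitespace-based token counter are assumptions (real systems use a model-specific tokenizer, so whitespace splitting only approximates the true token count).

```python
def truncate_to_budget(messages, max_tokens, count_tokens=None):
    """Drop the oldest messages until the total token count fits the window.

    `count_tokens` approximates tokenization; a real system would use the
    model's own tokenizer instead of splitting on whitespace.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())  # crude stand-in tokenizer
    kept = list(messages)
    # Evict from the front (oldest turns) until the remainder fits the budget.
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)
    return kept


history = [
    "system: you are a helpful assistant",
    "user: summarize this report",
    "assistant: here is the summary",
    "user: now translate it",
]
trimmed = truncate_to_budget(history, max_tokens=12)
```

Dropping the oldest turns first preserves recency, which is the most common heuristic; context compression techniques instead summarize or distill the evicted content rather than discarding it outright.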
