Context truncation is the process of discarding tokens from a sequence—typically from the beginning or middle—to fit it within a model's fixed context window. This is a blunt but necessary operation when input exceeds the model's token limit, since exceeding that limit causes inference to fail outright. It is the most basic form of context window management, often implemented as a first-line defense in agentic workflows when more sophisticated techniques such as context summarization or sliding window attention are unavailable or too costly.
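As a concrete illustration, the operation can be sketched as a small helper that drops tokens from the middle of a sequence, preserving the beginning (often system instructions) and the end (the most recent turns). This is a minimal sketch, not a reference implementation: it operates on an already-tokenized list, and the function name, the `head_ratio` parameter, and the split strategy are all assumptions chosen for illustration.

```python
def truncate_middle(tokens: list, max_tokens: int, head_ratio: float = 0.25) -> list:
    """Fit `tokens` within `max_tokens` by discarding from the middle.

    Keeps roughly `head_ratio` of the budget from the start of the
    sequence and fills the remainder from the end, on the assumption
    that the earliest tokens (instructions) and the latest tokens
    (recent context) matter most.
    """
    if max_tokens <= 0:
        return []
    if len(tokens) <= max_tokens:
        return tokens  # already fits; nothing to discard

    head_len = int(max_tokens * head_ratio)
    tail_len = max_tokens - head_len
    # Drop everything between the kept head and the kept tail.
    return tokens[:head_len] + tokens[len(tokens) - tail_len:]
```

In practice the same idea is applied to token IDs produced by the model's tokenizer, and a sentinel such as an ellipsis or a marker message is often inserted at the cut point so the model can tell that material was removed.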
