Context compression is a category of algorithms designed to reduce the token count of input data—such as conversation history, retrieved documents, or system instructions—while aiming to preserve its semantic utility for a language model's reasoning. This is a critical engineering technique for agentic workflows, where the finite context window imposes a hard limit on operational memory. Methods include summarization, distillation, and selective filtering, each trading off compression ratio against information fidelity to maximize task performance within token budgets.
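The selective-filtering variant can be sketched in a few lines: drop the oldest conversation turns until the remainder fits a token budget. This is a minimal illustrative sketch, not any particular library's API; the `count_tokens` word-count heuristic and the `compress_history` helper are hypothetical stand-ins (a real system would use the model's own tokenizer, and might summarize dropped turns instead of discarding them).

```python
def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word.
    (Hypothetical stand-in for a real tokenizer.)"""
    return len(text.split())

def compress_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined size fits the budget.

    `messages` is a list of {"role": ..., "content": ...} dicts, oldest
    first. Older messages are dropped whole (selective filtering); a
    summarization pass could replace them with a digest instead.
    """
    kept, used = [], 0
    for msg in reversed(messages):           # walk newest -> oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break                            # budget exhausted: drop the rest
        kept.append(msg)
        used += cost
    kept.reverse()                           # restore chronological order
    return kept

history = [
    {"role": "user", "content": "Tell me about context windows in detail"},
    {"role": "assistant", "content": "A context window is the maximum span of tokens a model can attend to"},
    {"role": "user", "content": "How do agents cope with that limit"},
]

compressed = compress_history(history, budget=20)
```

Walking newest-to-oldest encodes a recency heuristic: recent turns are usually the most relevant to the next model call, so they are preserved at the expense of older ones. Here only the final user turn fits the 20-token budget, so the two older turns are filtered out.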
