Cache eviction is the automated process of removing entries from a cache according to a predefined policy to free space for new data. In AI systems, it is essential for bounding the size of the KV Cache during long-sequence generation and for keeping an agent's working memory within a fixed budget. Common policies include Least Recently Used (LRU), which discards the entry that was accessed least recently, and First-In-First-Out (FIFO), which discards the entry that was inserted earliest, regardless of how recently it was read. Eviction is typically triggered when the cache is at capacity and a new entry must be inserted, which keeps memory use bounded and prevents overflow.
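
The difference between the two policies can be illustrated with a minimal sketch. The class below is a hypothetical `BoundedCache` written for illustration only (the name, its `policy` parameter, and its methods are assumptions, not part of any particular library or KV cache implementation). It uses Python's `collections.OrderedDict` to track ordering and evicts one entry whenever an insertion would exceed capacity.

```python
from collections import OrderedDict


class BoundedCache:
    """Illustrative fixed-capacity cache with an "lru" or "fifo" eviction policy."""

    def __init__(self, capacity, policy="lru"):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if policy not in ("lru", "fifo"):
            raise ValueError("policy must be 'lru' or 'fifo'")
        self.capacity = capacity
        self.policy = policy
        # The front of the OrderedDict is always the next eviction candidate.
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        if self.policy == "lru":
            # A read counts as a use under LRU, so move the key to the back.
            self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        """Insert or update an entry; return the evicted key, or None."""
        if key in self._data:
            self._data[key] = value
            if self.policy == "lru":
                self._data.move_to_end(key)
            return None
        evicted = None
        if len(self._data) >= self.capacity:
            # Cache is full: drop the entry at the front before inserting.
            evicted, _ = self._data.popitem(last=False)
        self._data[key] = value
        return evicted

    def keys(self):
        """Return keys ordered from next-to-evict to most protected."""
        return list(self._data)


# Same access pattern, different victim depending on the policy.
lru = BoundedCache(2, policy="lru")
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")                  # "a" is now the most recently used entry
lru_victim = lru.put("c", 3)  # evicts "b"

fifo = BoundedCache(2, policy="fifo")
fifo.put("a", 1)
fifo.put("b", 2)
fifo.get("a")                   # reads do not affect FIFO order
fifo_victim = fifo.put("c", 3)  # evicts "a"
```

With identical access patterns, LRU keeps `"a"` because it was read recently, while FIFO evicts it because it was inserted first. Production systems often use more specialized policies; this sketch only demonstrates the capacity-triggered mechanism described above.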
