A short-term memory cache is a fast, volatile buffer in an agentic system that holds recently accessed or generated information for immediate reuse, reducing latency in repetitive operations such as context retrieval and state tracking. It serves as the agent's working memory, loosely analogous to a CPU's L1/L2 cache: it gives low-latency (typically sub-millisecond) access to operational context such as the last few user interactions, intermediate reasoning steps, and frequently referenced tool outputs. The cache is typically held in RAM and bounded in size, with an eviction policy such as Least Recently Used (LRU) discarding the entries that have gone longest without access so that the remaining contents stay relevant.
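
As a rough illustration of the LRU policy described above, the following minimal Python sketch keeps a bounded, in-process cache built on the standard library's `collections.OrderedDict`. The class name `ShortTermMemoryCache`, its methods, and the example keys are hypothetical names chosen for this sketch and do not come from any particular agent framework. A production system would likely also need thread safety and time-based expiry.

```python
from collections import OrderedDict


class ShortTermMemoryCache:
    """Hypothetical in-RAM working-memory cache with LRU eviction."""

    def __init__(self, capacity=3):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: oldest first, newest last.
        self._items = OrderedDict()

    def get(self, key, default=None):
        """Return the cached value and mark the key as most recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # popitem(last=False) removes the oldest (least recently used) entry.
            self._items.popitem(last=False)

    def keys(self):
        """Return keys ordered from least to most recently used."""
        return list(self._items)


# Example usage with made-up keys and values.
cache = ShortTermMemoryCache(capacity=2)
cache.put("last_user_message", "What is the weather in Oslo?")
cache.put("tool:weather", {"city": "Oslo", "temp_c": 4})
cache.get("last_user_message")  # reading refreshes this entry's recency
cache.put("reasoning_step", "Compare with yesterday's forecast")
print(cache.keys())  # "tool:weather" was least recently used, so it is gone
```

Because a read refreshes an entry's recency, context the agent keeps consulting stays in the cache, while a tool output that is written once and never read again is the first to be evicted.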
