Query-aware chunking is a dynamic document segmentation strategy in which the granularity or boundaries of text chunks are chosen at retrieval time, based on the information need expressed in the user's query. Unlike static methods such as semantic chunking or recursive character text splitting, which fix chunk boundaries before any query is seen, it treats chunking as a retrieval-time optimization problem. The core mechanism re-evaluates or re-segments source documents (or their pre-computed embeddings) to build query-specific chunks that maximize the relevance of the context passed to a large language model in a retrieval-augmented generation (RAG) pipeline.
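
To make the retrieval-time re-segmentation concrete, here is a minimal sketch of one possible approach, not a reference implementation. It assumes a toy bag-of-words cosine similarity in place of a real embedding model, a naive regex sentence splitter, and a greedy expansion rule; the function names (`query_aware_chunk`, `split_sentences`, and so on) and the `max_sentences` parameter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; an assumed stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(document):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def query_aware_chunk(document, query, max_sentences=3):
    """Build one chunk whose boundaries depend on the query.

    Start from the sentence most similar to the query, then greedily add
    the neighbouring sentence (left or right) that raises the similarity
    of the whole span, stopping when neither neighbour helps or the span
    reaches max_sentences.
    """
    sentences = split_sentences(document)
    if not sentences:
        return ""
    query_vec = Counter(tokenize(query))
    vecs = [Counter(tokenize(s)) for s in sentences]

    def span_score(lo, hi):
        # Score the span as a single unit by summing its sentence vectors.
        return cosine(sum(vecs[lo:hi + 1], Counter()), query_vec)

    best = max(range(len(sentences)), key=lambda i: cosine(vecs[i], query_vec))
    lo = hi = best
    current = span_score(lo, hi)

    while hi - lo + 1 < max_sentences:
        candidates = []
        if lo > 0:
            candidates.append((span_score(lo - 1, hi), lo - 1, hi))
        if hi < len(sentences) - 1:
            candidates.append((span_score(lo, hi + 1), lo, hi + 1))
        if not candidates:
            break
        score, new_lo, new_hi = max(candidates)
        if score <= current:
            break  # Growing the chunk would dilute its relevance.
        current, lo, hi = score, new_lo, new_hi

    return " ".join(sentences[lo:hi + 1])


doc = (
    "The cache uses an LRU eviction policy. "
    "Eviction runs when memory exceeds the limit. "
    "The logo is blue. "
    "Billing is handled monthly."
)
print(query_aware_chunk(doc, "cache eviction policy memory limit"))
print(query_aware_chunk(doc, "What color is the logo?"))
```

The same document yields different chunks for the two queries: the first spans the two cache-related sentences, while the second keeps only the sentence about the logo. In a real pipeline the bag-of-words scoring would presumably be replaced by embedding similarity, and the greedy rule is only one of several ways to search over candidate boundaries.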
