Entity-aware chunking is a document segmentation strategy that uses named entity recognition (NER) to inform split decisions. It aims to keep mentions of a given entity (such as a person, organization, or location) together in the same text chunk, as far as chunk size limits allow. Unlike naive character-based or token-based splitting, which cuts at fixed offsets regardless of content, it prioritizes semantic cohesion around key subjects. The primary goal is to preserve the contextual relationships and attributes associated with an entity. This matters for downstream tasks such as retrieval-augmented generation (RAG) and knowledge graph population, where an entity's information scattered across chunks can reduce retrieval recall and lead to factually inconsistent output.
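The idea can be illustrated with a minimal Python sketch. It is one possible heuristic under stated assumptions, not a reference implementation: sentences are packed greedily, and a split at the soft size limit is postponed while the next sentence mentions an entity already present in the current chunk. The names `toy_entities`, `entity_aware_chunks`, `soft_limit`, and `hard_limit` are hypothetical, and `toy_entities` is a crude capitalization heuristic standing in for a real NER model.

```python
from typing import Callable, List, Set


def toy_entities(sentence: str) -> Set[str]:
    """Stand-in for a real NER model (assumption, for illustration only).

    Treats every capitalized word except the first word of the sentence
    as an entity mention.
    """
    words = sentence.split()
    return {w.strip(".,;:!?") for w in words[1:] if w[:1].isupper()}


def entity_aware_chunks(
    sentences: List[str],
    find_entities: Callable[[str], Set[str]] = toy_entities,
    soft_limit: int = 200,
    hard_limit: int = 400,
) -> List[str]:
    """Greedily pack sentences into chunks, measured in characters.

    A chunk is closed once adding the next sentence would exceed
    soft_limit, unless that sentence shares an entity with the chunk.
    hard_limit always forces a split, so chunks cannot grow without bound
    (a single sentence longer than hard_limit still becomes its own chunk).
    """
    chunks: List[str] = []
    current: List[str] = []
    current_entities: Set[str] = set()
    current_len = 0

    for sentence in sentences:
        entities = find_entities(sentence)
        # Length if the sentence is appended, including the joining space.
        new_len = current_len + len(sentence) + (1 if current else 0)
        shares_entity = bool(entities & current_entities)

        must_split = new_len > hard_limit
        may_split = new_len > soft_limit and not shares_entity
        if current and (must_split or may_split):
            chunks.append(" ".join(current))
            current, current_entities = [], set()
            new_len = len(sentence)

        current.append(sentence)
        current_entities |= entities
        current_len = new_len

    if current:
        chunks.append(" ".join(current))
    return chunks


sentences = [
    "The report covers Acme Corp in detail.",
    "Revenue at Acme grew last year.",
    "Separately, the weather in Paris was mild.",
    "Tourism in Paris rose as a result.",
]
for chunk in entity_aware_chunks(sentences, soft_limit=50, hard_limit=200):
    print(repr(chunk))
```

In this sketch the hard limit is what makes "keep all mentions together" a best-effort goal rather than a guarantee: an entity mentioned throughout a long document will still be split across chunks. A production version would likely replace `toy_entities` with a trained NER model and resolve aliases and pronouns, which this example does not attempt.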
