Recursive character text splitting is a chunking algorithm that divides text using a prioritized list of separators (for example paragraph breaks, line breaks, sentence boundaries, and spaces) until every resulting segment fits a specified size limit, typically measured in characters or tokens. It tries the coarsest separator first, and only when a segment is still too large does it recurse into that segment with the next separator in the list, falling back to a hard cut at a fixed character count as a last resort. Because splits land on natural language boundaries wherever possible, related text tends to stay together in the same chunk. The technique is widely used in Retrieval-Augmented Generation (RAG) pipelines to prepare documents for embedding and vector storage.
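
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular library: the function name `recursive_split`, the `max_chars` parameter, and the default separator list are assumptions chosen for the example, and size is measured in characters rather than tokens.

```python
from typing import List, Sequence

def recursive_split(
    text: str,
    max_chars: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),  # assumed priority order
) -> List[str]:
    """Split text into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # Last resort: no separators left, so cut at fixed character counts.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # Separator not present: try the next, finer one.
        return recursive_split(text, max_chars, finer)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # greedily merge small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too large: recurse with the finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


text = "Alpha beta gamma.\n\nDelta epsilon zeta eta theta."
chunks = recursive_split(text, max_chars=20)
# chunks == ["Alpha beta gamma.", "Delta epsilon zeta", "eta theta."]
```

In this sketch the first paragraph fits within the limit and is kept whole, while the second is too long and is split on spaces because it contains no line break or sentence boundary. Separators at chunk boundaries are dropped, and no overlap between chunks is produced; production splitters often add both a configurable overlap and a token-based length function.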
