NTK-Aware Scaling is a method for extending the context window of language models that use Rotary Positional Embeddings (RoPE) by adjusting the base frequency of the embeddings, drawing on insights from Neural Tangent Kernel (NTK) theory. Instead of naively interpolating positional indices, which compresses all rotary frequencies equally and blurs fine-grained positional distinctions, it increases the rotary base, leaving the high-frequency dimensions nearly untouched while stretching the low-frequency dimensions to span the longer range. This allows the model to generalize to longer sequences without catastrophic degradation, enabling models trained on shorter contexts to handle significantly longer inputs with minimal or no fine-tuning and addressing the fundamental extrapolation problem in positional encodings.
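The idea above can be sketched in a few lines. The commonly used NTK-aware adjustment replaces the RoPE base b with b · s^(d/(d−2)), where s is the context-extension factor and d is the head dimension; the function name and its default base of 10000 below are illustrative choices, not from the original text:

```python
import math

def ntk_rope_frequencies(head_dim: int, scale: float, base: float = 10000.0):
    """Compute RoPE inverse frequencies with NTK-aware base scaling.

    Standard RoPE uses freq_i = base ** (-2i / head_dim). Rather than
    dividing positions by `scale` (naive interpolation), NTK-aware
    scaling raises the base to base * scale ** (head_dim / (head_dim - 2)),
    which leaves the highest frequencies almost unchanged while the
    lowest frequency is stretched by exactly 1 / scale.
    """
    adjusted_base = base * scale ** (head_dim / (head_dim - 2))
    return [adjusted_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
```

Note the endpoint behavior that motivates the exponent d/(d−2): at i = 0 the frequency is 1 regardless of the base, and at the last index the frequency is divided by exactly `scale`, so the slowest-rotating dimension's wavelength grows in proportion to the new context length.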
