Adaptive computation is a model efficiency paradigm where a neural network's computational graph is not fixed but dynamically adjusted per input. Instead of applying the same uniform processing to all samples, the model allocates more resources—such as layers, neurons, or processing time—to complex inputs and fewer to simple ones. This is achieved through mechanisms like early exiting, where intermediate layers can produce a final output, or conditional computation, where specialized subnetworks are activated only when needed. The core goal is to reduce average inference latency and computational cost without sacrificing accuracy on challenging tasks.
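The early-exiting mechanism described above can be sketched in a few lines. This is a minimal illustrative example, not a specific published architecture: the layer shapes, the tanh "layers", the per-layer classifier heads, and the confidence threshold are all assumptions chosen for clarity. The key idea is the confidence check after each layer, which lets easy inputs stop early while hard inputs use the full depth.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run `layers` in sequence. After each layer, an attached
    classifier head produces a prediction; if the head's max softmax
    probability exceeds `threshold`, return immediately (early exit).

    Returns (class probabilities, number of layers actually used).
    """
    h = x
    for i, (layer, head) in enumerate(zip(layers, heads)):
        h = np.tanh(layer @ h)          # one "layer" of processing
        probs = softmax(head @ h)       # intermediate prediction
        if probs.max() >= threshold:    # confident enough: exit here
            return probs, i + 1
    return probs, len(layers)           # fell through to the final layer

# Toy usage: a 4-layer model with random weights.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(4)]
heads = [rng.standard_normal((3, 8)) for _ in range(4)]
x = rng.standard_normal(8)
probs, used = early_exit_forward(x, layers, heads, threshold=0.9)
```

Averaged over a workload, `used` is what drives the cost savings: if most inputs exit after one or two layers, mean inference cost drops well below that of the full network, while difficult inputs still receive full-depth processing.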
