Sparse training is a neural network optimization technique that trains a model from initialization with a permanently fixed sparse connectivity pattern, bypassing the traditional cycle of dense pre-training followed by pruning. The method enforces this sparsity throughout training: a large fraction of the weights is set to zero at initialization and kept frozen. By skipping the computation and storage of gradients for the zeroed parameters, it substantially reduces the computational footprint and memory-bandwidth requirements of both the forward and backward passes.
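The fixed-mask idea can be illustrated with a minimal sketch. The snippet below (a toy illustration in NumPy, not any particular framework's API; the layer shape, sparsity level, and `sgd_step` helper are all assumptions for demonstration) chooses a random binary mask at initialization and re-applies it after every update, so pruned weights receive no effective gradient and remain zero for the whole run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy layer: weight matrix of a single linear layer y = W x.
W = rng.normal(size=(4, 8))

# Sparse mask fixed at initialization: keep roughly 25% of the weights.
mask = rng.random(W.shape) < 0.25
W *= mask  # zero out the pruned weights before training starts

def sgd_step(W, grad, lr=0.1):
    """One masked SGD step (illustrative helper): the gradient contribution
    at pruned positions is discarded, so zeroed parameters stay frozen."""
    return (W - lr * grad) * mask

# Simulate a few training steps with stand-in random gradients.
for _ in range(5):
    grad = rng.normal(size=W.shape)
    W = sgd_step(W, grad)

# Pruned positions remain exactly zero after training.
assert np.all(W[~mask] == 0.0)
```

In a real implementation the mask would also be exploited by sparse kernels so that the zeroed entries are never computed or stored at all, rather than multiplied out as in this dense-tensor sketch.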
