LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique that adapts large pre-trained models by injecting trainable, low-rank decomposition matrices into their existing weight layers. Instead of updating all original parameters, LoRA freezes the pre-trained weights and adds a pair of small matrices whose product represents a low-rank update to a specific layer, such as the attention or feed-forward modules in a transformer. This approach drastically reduces the number of trainable parameters—often by over 99%—and the associated computational and memory overhead, making it feasible to fine-tune massive models like GPT or Llama on consumer-grade hardware.
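The mechanism can be sketched numerically. The snippet below is a minimal illustration, not a production implementation: it assumes a single linear layer of hypothetical size 4096×4096 with a made-up rank of 8, and uses the common initialization from the LoRA paper (random `A`, zero `B`) so the adapted layer starts out identical to the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 4096, 4096, 8   # hypothetical layer size and LoRA rank
alpha = 16                          # scaling hyperparameter for the update

W = rng.standard_normal((d_out, d_in)) * 0.01  # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen path W @ x plus the low-rank update (alpha / rank) * B @ A @ x.
    # Computing A @ x first keeps the extra cost proportional to the rank.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)

# With B zero-initialized, the update is zero: the adapted layer
# reproduces the frozen layer exactly before any training.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B are trained: 2 * rank * d parameters vs. d * d frozen ones.
trainable, frozen = A.size + B.size, W.size
print(f"trainable fraction: {trainable / frozen:.4%}")
```

Here the trainable matrices hold 65,536 parameters against roughly 16.8 million frozen ones, about 0.4% of the layer, which is where the large parameter savings quoted above come from at realistic model widths.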
