Trainable Parameters

Trainable parameters are the specific weights and biases within a neural network that are updated via gradient descent during the training or fine-tuning process to minimize a loss function.
PARAMETER-EFFICIENT FINE-TUNING

What are Trainable Parameters?

In machine learning, trainable parameters are the numerical values within a neural network that are updated via gradient descent during training to minimize a loss function.

In the context of Parameter-Efficient Fine-Tuning (PEFT), trainable parameters refer specifically to the small, strategically selected subset of a model's total weights that are updated during adaptation, while the rest of the model, the frozen backbone, stays fixed. This subset includes the weights of injected modules like adapters, Low-Rank Adaptation (LoRA) matrices, or prompt embeddings. By updating only this tiny fraction (often less than 1% of total parameters), PEFT achieves efficient domain adaptation with minimal compute and memory overhead and reduces the risk of catastrophic forgetting of the model's pre-trained knowledge.

The count and configuration of trainable parameters are controlled by hyperparameters like an adapter's bottleneck dimension or LoRA's rank. Managing this sparse set of delta weights is central to PEFT's efficiency, enabling techniques like model merging via task vectors. For encoder PEFT (e.g., BERT) and multimodal fusion PEFT (e.g., CLIP), these parameters are inserted at specific injection points to adapt the model's processing for new tasks without the cost of full retraining.

Core Characteristics of Trainable Parameters

In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the small, strategic subset of a model's total weights that are updated to adapt the model to a new task, while the vast majority of the pre-trained 'frozen backbone' remains unchanged.

01

Sparse Activation

Trainable parameters in PEFT represent a sparse subset of the model's total parameter space. Instead of updating all weights (full fine-tuning), methods like BitFit (which updates only bias terms) or sparse fine-tuning update only a tiny fraction, often less than 1%, of the model's parameters. Because gradients and optimizer state are needed only for that subset, this sparsity is the main source of the reductions in memory footprint and compute cost during adaptation.
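As a rough illustration of how a bias-only subset compares to the full parameter set, the sketch below counts trainable versus total parameters for a BitFit-style selection. The parameter names and shapes are hypothetical (a single toy transformer block with hidden size 8), not taken from any real model. In real models the fraction is far smaller, because weight matrices grow quadratically with hidden size while bias vectors grow linearly.

```python
from math import prod

# Hypothetical parameter shapes for one toy transformer block
# (hidden size 8, feed-forward size 32). Names are illustrative only.
PARAM_SHAPES = {
    "attn.q_proj.weight": (8, 8),
    "attn.q_proj.bias": (8,),
    "attn.k_proj.weight": (8, 8),
    "attn.k_proj.bias": (8,),
    "attn.v_proj.weight": (8, 8),
    "attn.v_proj.bias": (8,),
    "attn.out_proj.weight": (8, 8),
    "attn.out_proj.bias": (8,),
    "ffn.up.weight": (32, 8),
    "ffn.up.bias": (32,),
    "ffn.down.weight": (8, 32),
    "ffn.down.bias": (8,),
}


def bitfit_trainable(shapes):
    """Select only bias terms as trainable; every weight matrix stays frozen."""
    return {name for name in shapes if name.endswith(".bias")}


def trainable_fraction(shapes, trainable_names):
    """Return (trainable_count, total_count, fraction) for a chosen subset."""
    total = sum(prod(shape) for shape in shapes.values())
    trainable = sum(prod(shapes[name]) for name in trainable_names)
    return trainable, total, trainable / total


trainable, total, fraction = trainable_fraction(
    PARAM_SHAPES, bitfit_trainable(PARAM_SHAPES)
)
print(f"{trainable} of {total} parameters trainable ({fraction:.1%})")
```

The same bookkeeping applies to any PEFT method: only the selection rule in `bitfit_trainable` changes.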

02

Additive & Non-Destructive

PEFT trainable parameters are typically additive. They introduce new, task-specific parameters (e.g., Adapter modules, LoRA matrices) alongside the frozen base model. The original pre-trained knowledge is preserved intact, preventing catastrophic forgetting. The adaptation is captured in a separate, composable component, often called delta weights or a task vector, which can be applied, removed, or combined.
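A minimal sketch of the "apply, remove, or combine" idea, assuming the delta weights can be expressed as a matrix with the same shape as the frozen weight. The matrices and the helper names `add` and `sub` are made up for illustration; real task vectors span many layers and are usually stored as tensors.

```python
def add(a, b):
    """Element-wise sum of two equally shaped matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Frozen base weight and two hypothetical task-specific deltas.
base = [[1.0, 2.0], [3.0, 4.0]]
delta_task_a = [[0.5, 0.0], [0.0, -0.5]]
delta_task_b = [[0.0, 0.25], [0.25, 0.0]]

# Apply: the adapted weight is the base plus the delta.
adapted_a = add(base, delta_task_a)

# Remove: subtracting the same delta recovers the base model.
restored = sub(adapted_a, delta_task_a)

# Combine: deltas for different tasks can be summed before applying.
combined = add(base, add(delta_task_a, delta_task_b))

print(adapted_a)
print(restored == base)
print(combined)
```

Because the base matrix is never overwritten, the original behavior is always recoverable by dropping the delta.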

03

Low-Rank Structure

Many PEFT methods impose a low-rank structure on the trainable parameter updates to maximize efficiency. Low-Rank Adaptation (LoRA) hypothesizes that weight updates during adaptation have a low 'intrinsic rank.' It represents the update ΔW for a d × k weight matrix as the product of two smaller matrices, ΔW = B * A, with B of shape d × r and A of shape r × k, where the rank r is a critical hyperparameter controlling capacity. This reduces the trainable parameters for that matrix from d * k to r * (d + k), where r << min(d, k).
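The parameter arithmetic can be checked with a short sketch. The dimensions (a 4096 × 4096 projection with rank 8) are example values chosen for illustration, and the helper names are hypothetical.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k matrix.

    full: updating the whole matrix directly (d * k values).
    lora: updating B (d x r) and A (r x k) instead, i.e. r * (d + k) values.
    """
    full = d * k
    lora = r * (d + k)
    return full, lora


def matmul(B, A):
    """Plain-Python matrix product, used to show that B * A has shape d x k."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)] for row in B]


full, lora = lora_param_counts(d=4096, k=4096, r=8)
print(full, lora, lora / full)

# Tiny rank-1 example: B is 3 x 1, A is 1 x 2, so the update is 3 x 2.
B = [[1.0], [2.0], [3.0]]
A = [[0.5, -1.0]]
delta_w = matmul(B, A)
print(delta_w)
```

For these example dimensions the low-rank factors hold 1/256 of the values in the full matrix, while the product still has the full d × k shape.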

04

Bottleneck Design

Adapter-based methods use a bottleneck architecture to limit trainable parameters. A standard adapter applies a down-projection to a small bottleneck dimension, a non-linearity, and an up-projection back to the original dimension. The bottleneck dimension (e.g., 64) determines the parameter count and is often specified through a reduction factor (e.g., 16), the ratio of the hidden dimension to the bottleneck dimension. This forces information through a compressed, learnable channel, enabling efficient task-specific transformation of activations.
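As a sketch of how the bottleneck controls size, the function below counts the parameters of one adapter made of a down-projection and an up-projection, each assumed to carry a bias vector. The hidden dimension of 1024 is an assumed example value chosen so that a reduction factor of 16 gives a bottleneck of 64; the function name is hypothetical.

```python
def adapter_param_count(hidden_dim, reduction_factor, use_bias=True):
    """Return (bottleneck_dim, parameter_count) for a single bottleneck adapter.

    Assumes the adapter is: down-projection -> non-linearity -> up-projection.
    The non-linearity has no parameters.
    """
    bottleneck = hidden_dim // reduction_factor
    down = hidden_dim * bottleneck  # down-projection weight
    up = bottleneck * hidden_dim    # up-projection weight
    biases = (bottleneck + hidden_dim) if use_bias else 0
    return bottleneck, down + up + biases


bottleneck, count = adapter_param_count(hidden_dim=1024, reduction_factor=16)
print(bottleneck, count)
```

Halving the bottleneck roughly halves the adapter's size, which is why this one hyperparameter dominates the parameter budget of adapter-based methods.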

05

Strategic Injection

Trainable parameters are inserted at precise injection points within the model architecture. Common locations include:

  • After the multi-head attention module.
  • After the feed-forward network in a transformer block.
  • As prefixes to the key/value matrices in attention (Prefix Tuning).

The choice of injection point (e.g., for BERT Adapters or ViT Adapters) is critical for effectively steering model behavior with minimal parameters.

06

Modality-Agnostic & Specialized

The core principle is modality-agnostic: the same PEFT concepts apply to text (LLMs), vision (ViTs), audio, and multimodal models. However, specialized instantiations exist:

  • Visual Adapters for image tasks.
  • Audio Adapters for speech models.
  • VL-Adapters or Cross-Modal Adapters for vision-language models like CLIP, which adapt the fusion mechanisms between modalities using efficient parameters.
CORE CONCEPT

The Role of Trainable Parameters in PEFT

In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the minimal set of weights updated to adapt a massive pre-trained model to a new task, while the vast majority of the model's original parameters remain frozen.

Trainable parameters are the specific weights that are optimized during fine-tuning. In most PEFT methods they are newly introduced, such as Low-Rank Adaptation (LoRA) matrices, adapter modules, or prompt embeddings, although methods like BitFit instead update a subset of the existing parameters. They represent a tiny fraction (often <1%) of the model's total parameters, enabling efficient adaptation by learning compact delta weights, or a task vector, that capture the new knowledge. The frozen backbone supplies the pre-trained knowledge and feature representations.

The strategic selection and configuration of these parameters—governed by hyperparameters like bottleneck dimension and rank—directly control the trade-off between adaptation capacity, computational cost, and risk of catastrophic forgetting. In multimodal fusion PEFT, trainable parameters are often placed at injection points between modalities to efficiently learn new cross-modal interactions without retraining the entire vision-language model.

METHOD COMPARISON

How PEFT Methods Define Trainable Parameters

A comparison of how different Parameter-Efficient Fine-Tuning (PEFT) techniques define and manage the subset of parameters updated during training, relative to a frozen pre-trained model backbone.

| PEFT Method | Parameter Definition Strategy | Typical % of Params Trained | Primary Injection Points | Key Hyperparameter |
| --- | --- | --- | --- | --- |
| Adapter | Inserts small, fully-connected bottleneck modules (down-projection + non-linearity + up-projection) into transformer layers. | 0.5% - 8% | After attention & FFN sub-layers | Bottleneck Dimension (reduction factor r) |
| LoRA (Low-Rank Adaptation) | Approximates weight update ΔW for a frozen matrix W with a low-rank decomposition: ΔW = B * A, where A and B are trainable. | 0.01% - 0.1% | Query & Value projection matrices in attention | Rank (r) of decomposition |
| Prefix Tuning | Prepends trainable continuous vectors (prefix) to the key and value sequences in the attention mechanism of every layer. | < 0.1% | Key & Value caches in attention | Prefix Length (number of virtual tokens) |
| Prompt Tuning | Optimizes a small set of continuous token embeddings (soft prompts) prepended only to the model's input layer. | < 0.01% | Input embedding layer | Prompt Length |
| BitFit | Updates only the bias terms (b) within the transformer architecture, leaving all weight matrices (W) frozen. | 0.09% - 0.1% | All bias vectors in linear/attention layers | |
| (IA)³ | Introduces trainable scaling vectors that multiplicatively modulate (amplify/inhibit) inner activations (keys, values, FFN outputs). | ~0.01% - 0.1% | Key, Value, and FFN output activations | |
| Visual / VL-Adapter | Inserts adapter modules into the transformer blocks of a vision or vision-language model (e.g., ViT, CLIP). | 1% - 5% | After attention & FFN in visual encoder; cross-attention in fusion modules | Bottleneck Dimension |
| QLoRA | Applies LoRA to a frozen, 4-bit quantized base model. Only the LoRA matrices are trained; the quantized weights and their quantization constants are not. | 0.01% - 0.1% | Same as LoRA (Query & Value projections), often extended to all linear layers | Rank (r); the 4-bit block size affects quantization, not the trainable count |
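To make the differences between the lighter-weight methods concrete, the sketch below counts trainable parameters for Prompt Tuning, Prefix Tuning, and (IA)³ under one hypothetical model configuration (12 layers, model dimension 768, feed-forward dimension 3072, 20 virtual tokens). The formulas are simplifying assumptions: the Prefix Tuning count ignores any reparameterization network used during training, and the (IA)³ count assumes one scaling vector each for keys, values, and the feed-forward activation per layer.

```python
def prompt_tuning_params(prompt_len, d_model):
    """Soft prompt: one trainable embedding per virtual token, input layer only."""
    return prompt_len * d_model


def prefix_tuning_params(prefix_len, d_model, n_layers):
    """Prefix vectors for both keys and values in every layer."""
    return 2 * n_layers * prefix_len * d_model


def ia3_params(d_model, d_ff, n_layers):
    """Per layer: scaling vectors for keys, values, and the FFN activation."""
    return n_layers * (2 * d_model + d_ff)


# Hypothetical configuration, chosen only for illustration.
N_LAYERS, D_MODEL, D_FF, N_VIRTUAL_TOKENS = 12, 768, 3072, 20

counts = {
    "prompt_tuning": prompt_tuning_params(N_VIRTUAL_TOKENS, D_MODEL),
    "prefix_tuning": prefix_tuning_params(N_VIRTUAL_TOKENS, D_MODEL, N_LAYERS),
    "ia3": ia3_params(D_MODEL, D_FF, N_LAYERS),
}
for method, n in counts.items():
    print(f"{method}: {n:,} trainable parameters")
```

Under these assumptions Prefix Tuning trains 2 × n_layers times as many values as Prompt Tuning for the same number of virtual tokens, since it touches every layer rather than only the input.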

TRAINABLE PARAMETERS

Frequently Asked Questions

In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the small, strategic subset of a model's total weights that are updated during adaptation. This FAQ addresses their role, calculation, and impact on model performance and deployment.

In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters refer to the minimal subset of a neural network's total weights that are updated during the adaptation process, while the vast majority of the frozen backbone model remains static. These parameters constitute the delta weights—the small, learned change applied to the base model to specialize it for a new task or domain. Common examples include the weights within Adapter modules, the low-rank matrices in LoRA, or the continuous embeddings in Prompt Tuning. The core principle is to achieve performance comparable to full fine-tuning by training only 0.1% to 5% of the total parameters, drastically reducing computational cost and memory footprint.
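One way to see where the memory savings come from is to estimate the training state that scales with the trainable count. The sketch below assumes an Adam-style optimizer that keeps a gradient plus two moment estimates per trainable parameter, all in 32-bit floats; it ignores the model weights themselves and activation memory. The 7-billion-parameter model size and the 0.1% fraction are example values, and the function name is hypothetical.

```python
def trainable_state_bytes(n_trainable, bytes_per_value=4):
    """Bytes for gradient + two Adam moment buffers over the trainable parameters.

    Assumption: 3 values per trainable parameter, each bytes_per_value wide.
    Frozen parameters need none of this state.
    """
    values_per_param = 3  # gradient, first moment, second moment
    return n_trainable * values_per_param * bytes_per_value


total_params = 7_000_000_000        # example model size
peft_params = total_params // 1000  # 0.1% of parameters trainable

full_ft_bytes = trainable_state_bytes(total_params)
peft_bytes = trainable_state_bytes(peft_params)

print(f"full fine-tuning state: {full_ft_bytes / 1e9:.0f} GB")
print(f"PEFT (0.1%) state:      {peft_bytes / 1e6:.0f} MB")
```

The frozen weights still have to be held in memory for the forward and backward passes, so the total saving is smaller than this ratio alone suggests.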

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.