In the context of Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the small, strategically selected subset of a model's total weights that are updated during adaptation, while the frozen backbone, which holds the vast majority of the weights, remains fixed. This subset includes the weights of injected modules such as adapters, Low-Rank Adaptation (LoRA) matrices, or prompt embeddings. By updating only this tiny fraction (often less than 1% of total parameters), PEFT adapts a model to a new domain with minimal compute and memory overhead and reduces the risk of catastrophic forgetting of the model's pre-trained knowledge.
Trainable Parameters

What are Trainable Parameters?
In machine learning, trainable parameters are the numerical values within a neural network that are updated via gradient descent during training to minimize a loss function.
The count and configuration of trainable parameters are controlled by hyperparameters like an adapter's bottleneck dimension or LoRA's rank. Managing this sparse set of delta weights is central to PEFT's efficiency, enabling techniques like model merging via task vectors. For encoder PEFT (e.g., BERT) and multimodal fusion PEFT (e.g., CLIP), these parameters are inserted at specific injection points to adapt the model's processing for new tasks without the cost of full retraining.
Core Characteristics of Trainable Parameters
In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the small, strategic subset of a model's total weights that are updated to adapt the model to a new task, while the pre-trained 'frozen backbone' remains unchanged.
Sparse Activation
Trainable parameters in PEFT represent a sparse subset of the model's total parameter space. Instead of updating all weights (full fine-tuning), methods like BitFit (which updates only bias terms) or sparse fine-tuning update only a tiny fraction of the model's parameters, often less than 1%. This sparsity is the core mechanism for achieving large reductions in memory footprint and compute cost during adaptation.
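The bias-only selection described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical parameter inventory (the names and shapes in `param_counts` are made up for the example and do not come from any specific model), with a simple naming rule standing in for a real framework's trainable flag.

```python
# Hypothetical inventory: parameter name -> number of scalar values.
# Shapes mimic one 768-wide linear layer plus a layer norm (illustrative only).
param_counts = {
    "dense.weight": 768 * 768,
    "dense.bias": 768,
    "layer_norm.weight": 768,
    "layer_norm.bias": 768,
}


def select_bias_only(counts):
    """Return the names a BitFit-style rule would leave trainable."""
    return [name for name in counts if name.endswith(".bias")]


def trainable_fraction(counts, trainable_names):
    """Fraction of all scalar parameters that would receive updates."""
    total = sum(counts.values())
    trainable = sum(counts[name] for name in trainable_names)
    return trainable / total


trainable = select_bias_only(param_counts)
fraction = trainable_fraction(param_counts, trainable)
print(trainable, f"{fraction:.4%}")
```

In a real training setup the same rule would typically be applied by freezing every parameter whose name does not match, rather than by counting.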
Additive & Non-Destructive
PEFT trainable parameters are typically additive. They introduce new, task-specific parameters (e.g., Adapter modules, LoRA matrices) alongside the frozen base model. The original pre-trained knowledge is preserved intact, preventing catastrophic forgetting. The adaptation is captured in a separate, composable component, often called delta weights or a task vector, which can be applied, removed, or combined.
Low-Rank Structure
Many PEFT methods impose a low-rank structure on the trainable parameter updates to maximize efficiency. Low-Rank Adaptation (LoRA) hypothesizes that weight updates during adaptation have a low 'intrinsic rank.' It represents the update ΔW for a d × k weight matrix as the product of two smaller matrices, ΔW = B * A, with B of shape d × r and A of shape r × k. The rank r is the critical hyperparameter controlling capacity. This reduces the trainable parameters from d * k to r * (d + k), where r << d, k.
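To make the factorization concrete, the sketch below builds ΔW = B * A from two small matrices in plain Python. It assumes the commonly used LoRA initialization (B starts at zero, A starts with small random values), so the update is exactly zero before training; the dimensions, the seed, and the standard deviation of 0.01 are illustrative choices, not values from the text.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def init_lora(d, k, r, seed=0):
    """Create B (d x r, all zeros) and A (r x k, small random values)."""
    rng = random.Random(seed)
    B = [[0.0] * r for _ in range(d)]
    A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
    return B, A


d, k, r = 16, 12, 2          # illustrative sizes with r much smaller than d and k
B, A = init_lora(d, k, r)
delta_W = matmul(B, A)       # d x k matrix whose rank is at most r

# Count the scalars actually stored in the two factors.
stored = sum(len(row) for row in B) + sum(len(row) for row in A)
print(len(delta_W), len(delta_W[0]), stored, d * k)
```

Because B is zero at the start, the adapted layer initially behaves exactly like the frozen one, and only the `stored` values in B and A would be optimized.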
Bottleneck Design
Adapter-based methods use a bottleneck architecture to limit trainable parameters. A standard adapter applies a down-projection to a small bottleneck dimension, a non-linearity, and an up-projection back to the original dimension. The bottleneck dimension (e.g., 64), often specified indirectly through a reduction factor relative to the hidden size (e.g., 16, which maps a 1024-dimensional activation to 64), determines the parameter count. This forces information through a compressed, learnable channel, enabling efficient task-specific transformation of activations.
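The down-project, non-linearity, up-project sequence can be sketched as a small forward pass. Two details here are assumptions not stated above: the residual connection around the adapter and the zero-initialized up-projection (so the adapter starts as an identity mapping). ReLU is used as the non-linearity purely for illustration.

```python
def linear(x, W, b):
    """Compute y = W x + b for a vector x and a matrix W stored as rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, vi) for vi in v]


def adapter_forward(x, W_down, b_down, W_up, b_up):
    """Down-project, apply the non-linearity, up-project, then add the input back."""
    z = relu(linear(x, W_down, b_down))       # shape: bottleneck
    out = linear(z, W_up, b_up)               # shape: original dimension
    return [xi + oi for xi, oi in zip(x, out)]  # residual connection (assumed)


d, m = 4, 2                                   # hidden size and bottleneck (toy values)
W_down = [[0.1] * d for _ in range(m)]        # m x d
b_down = [0.0] * m
W_up = [[0.0] * m for _ in range(d)]          # d x m, zeros: adapter starts as identity
b_up = [0.0] * d

x = [1.0, -2.0, 3.0, 0.5]
y = adapter_forward(x, W_down, b_down, W_up, b_up)
print(y)
```

Only `W_down`, `b_down`, `W_up`, and `b_up` would be trained; the activation `x` comes from the frozen model.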
Strategic Injection
Trainable parameters are inserted at precise injection points within the model architecture. Common locations include:
- After the multi-head attention module.
- After the feed-forward network in a transformer block.
- As prefixes to the key/value matrices in attention (Prefix Tuning).
The choice of injection point (e.g., for BERT Adapters or ViT Adapters) is critical for effectively steering model behavior with minimal parameters.
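As a rough sketch of the first two injection points, the toy block below accepts optional adapter callables after the attention and feed-forward sub-layers. Everything here is hypothetical scaffolding: the sub-layers are stand-in functions on lists of floats, layer normalization is omitted, and placing the adapter on the sub-layer output before the residual addition is one common choice rather than the only one.

```python
def transformer_block(x, attention, ffn, adapter_attn=None, adapter_ffn=None):
    """Toy transformer block with two optional adapter injection points."""
    a = attention(x)
    if adapter_attn is not None:
        a = adapter_attn(a)                    # injection point: after attention
    h = [xi + ai for xi, ai in zip(x, a)]      # residual around attention
    f = ffn(h)
    if adapter_ffn is not None:
        f = adapter_ffn(f)                     # injection point: after the FFN
    return [hi + fi for hi, fi in zip(h, f)]   # residual around the FFN


def toy_attention(v):
    """Stand-in for a frozen attention sub-layer."""
    return [2.0 * vi for vi in v]


def toy_ffn(v):
    """Stand-in for a frozen feed-forward sub-layer."""
    return [vi + 1.0 for vi in v]


def identity(v):
    """An adapter that has learned nothing yet."""
    return list(v)


x = [1.0, 2.0]
frozen_out = transformer_block(x, toy_attention, toy_ffn)
adapted_out = transformer_block(x, toy_attention, toy_ffn, identity, identity)
print(frozen_out, adapted_out)
```

With identity adapters the block reproduces the frozen output, which is the behavior a freshly initialized adapter is usually designed to have.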
Modality-Agnostic & Specialized
The core principle is modality-agnostic: the same PEFT concepts apply to text (LLMs), vision (ViTs), audio, and multimodal models. However, specialized instantiations exist:
- Visual Adapters for image tasks.
- Audio Adapters for speech models.
- VL-Adapters or Cross-Modal Adapters for vision-language models like CLIP, which adapt the fusion mechanisms between modalities using efficient parameters.
The Role of Trainable Parameters in PEFT
In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the minimal set of weights updated to adapt a massive pre-trained model to a new task, while the vast majority of the model's original parameters remain frozen.
Trainable parameters are the specific, newly introduced weights—such as those in Low-Rank Adaptation (LoRA) matrices, adapter modules, or prompt embeddings—that are optimized during fine-tuning. They represent a tiny fraction (often <1%) of the model's total parameters, enabling efficient adaptation by learning a compact task vector or delta weights that encapsulate the new knowledge. The core frozen backbone model provides the pre-trained knowledge and feature representations.
The strategic selection and configuration of these parameters—governed by hyperparameters like bottleneck dimension and rank—directly control the trade-off between adaptation capacity, computational cost, and risk of catastrophic forgetting. In multimodal fusion PEFT, trainable parameters are often placed at injection points between modalities to efficiently learn new cross-modal interactions without retraining the entire vision-language model.
How PEFT Methods Define Trainable Parameters
A comparison of how different Parameter-Efficient Fine-Tuning (PEFT) techniques define and manage the subset of parameters updated during training, relative to a frozen pre-trained model backbone.
| PEFT Method | Parameter Definition Strategy | Typical % of Params Trained | Primary Injection Points | Key Hyperparameter |
|---|---|---|---|---|
| Adapter | Inserts small, fully-connected bottleneck modules (down-projection + non-linearity + up-projection) into transformer layers. | 0.5% - 8% | After attention & FFN sub-layers | Bottleneck dimension (set directly or via a reduction factor) |
| LoRA (Low-Rank Adaptation) | Approximates weight update ΔW for a frozen matrix W with a low-rank decomposition: ΔW = B * A, where A and B are trainable. | 0.01% - 0.1% | Query & Value projection matrices in attention | Rank (r) of decomposition |
| Prefix Tuning | Prepends trainable continuous vectors (prefix) to the key and value sequences in the attention mechanism of every layer. | < 0.1% | Key & Value sequences in every attention layer | Prefix length (number of virtual tokens) |
| Prompt Tuning | Optimizes a small set of continuous token embeddings (soft prompts) prepended only to the model's input layer. | < 0.01% | Input embedding layer | Prompt length |
| BitFit | Updates only the bias terms (b) within the transformer architecture, leaving all weight matrices (W) frozen. | 0.09% - 0.1% | All bias vectors in linear/attention layers | None beyond the choice of which bias terms to train |
| (IA)³ | Introduces trainable scaling vectors that multiplicatively modulate (amplify/inhibit) inner activations (keys, values, FFN outputs). | ~0.01% - 0.1% | Key, Value, and FFN output activations | None beyond the choice of which activations to rescale |
| Visual / VL-Adapter | Inserts adapter modules into the transformer blocks of a vision or vision-language model (e.g., ViT, CLIP). | 1% - 5% | After attention & FFN in visual encoder; cross-attention in fusion modules | Bottleneck dimension |
| QLoRA | Applies LoRA to a frozen 4-bit quantized base model. Only the LoRA matrices are trained; the quantization constants are stored, not learned. | 0.01% - 0.1% | Same mechanism as LoRA, often applied to all linear layers in the transformer blocks | Rank (r) & quantization block size |
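The difference between the Prompt Tuning and Prefix Tuning rows comes down to simple arithmetic, sketched below. The model shape (`d_model = 768`, 12 layers) and the length of 20 virtual tokens are hypothetical, and the prefix count ignores any reparameterization network that may be used only during training.

```python
def prompt_tuning_params(prompt_length, d_model):
    """Soft prompt: one trainable embedding per virtual token, input layer only."""
    return prompt_length * d_model


def prefix_tuning_params(prefix_length, d_model, num_layers):
    """Prefix: one trainable key vector and one value vector per token, per layer."""
    return 2 * prefix_length * d_model * num_layers


d_model, num_layers = 768, 12                              # hypothetical model shape
prompt = prompt_tuning_params(20, d_model)
prefix = prefix_tuning_params(20, d_model, num_layers)
print(prompt, prefix, prefix // prompt)
```

For the same number of virtual tokens, the prefix variant trains `2 * num_layers` times as many values as the soft prompt, which is consistent with its higher percentage in the table.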
Frequently Asked Questions
In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters are the small, strategic subset of a model's total weights that are updated during adaptation. This FAQ addresses their role, calculation, and impact on model performance and deployment.
In Parameter-Efficient Fine-Tuning (PEFT), trainable parameters refer to the minimal subset of a neural network's total weights that are updated during the adaptation process, while the frozen backbone model, which holds the vast majority of the weights, remains static. These parameters constitute the delta weights: the small, learned change applied to the base model to specialize it for a new task or domain. Common examples include the weights within Adapter modules, the low-rank matrices in LoRA, or the continuous embeddings in Prompt Tuning. The core principle is to achieve performance comparable to full fine-tuning by training only a small fraction of the total parameters (often under 1%, and up to a few percent for larger adapter configurations), drastically reducing computational cost and memory footprint.
Related Terms
Understanding trainable parameters requires familiarity with the architectural components they modify and the broader fine-tuning paradigms they enable.
Frozen Backbone
The frozen backbone is the large, pre-trained base model (e.g., BERT, ViT, GPT) whose original parameters are kept fixed during parameter-efficient fine-tuning. This is the foundation upon which PEFT operates, preserving the model's general knowledge and preventing catastrophic forgetting. Only the small set of newly introduced trainable parameters (like adapters or prompts) are updated, drastically reducing memory and compute requirements compared to full model fine-tuning.
Delta Weights
Delta weights (ΔW) refer to the small, learned parameter changes applied to a frozen pre-trained model during PEFT. They mathematically represent the task-specific adaptation. In methods like LoRA, these deltas are approximated by low-rank matrices. Key properties include:
- Additive Application: The adapted weights are computed as W + ΔW.
- Encapsulated Knowledge: The delta contains all information learned for the new task.
- Modularity: Deltas from different tasks can be stored, combined, or subtracted, enabling model merging and task arithmetic.
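The additive property in the list above means a delta can either be merged into the base weights or kept as a separate branch, with the same result. The sketch below checks this on a tiny integer example; the matrices and input are arbitrary illustrative values.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add_matrices(W, D):
    """Element-wise sum of two matrices of the same shape."""
    return [[w + d for w, d in zip(row_w, row_d)] for row_w, row_d in zip(W, D)]


W = [[1, 2], [3, 4]]          # frozen base weights (toy values)
delta = [[0, 1], [-1, 0]]     # learned delta weights (toy values)
x = [5, 7]

# Option 1: merge the delta into the base weights, then apply once.
merged_out = matvec(add_matrices(W, delta), x)

# Option 2: keep the delta separate and sum the two outputs.
separate_out = [a + b for a, b in zip(matvec(W, x), matvec(delta, x))]

print(merged_out, separate_out)
```

Keeping the delta separate is what allows it to be stored, swapped, or removed without touching the base model, while merging avoids the extra computation at inference time.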
Injection Points
Injection points are the specific architectural locations within a neural network where parameter-efficient modules are inserted to introduce trainable parameters. Strategic placement is critical for performance. Common points in transformers include:
- After the Multi-Head Attention block.
- After the Feed-Forward Network block.
- Within attention key/value projections (for prefix tuning).
The choice of injection point determines how the adapted signals flow through the frozen backbone and influences the module's impact on model behavior.
Bottleneck Dimension (Adapters)
In adapter-based methods, the bottleneck dimension is the size of the hidden layer within the adapter module. It is the primary hyperparameter controlling the adapter's capacity and number of trainable parameters. The adapter projects the input activation down to this dimension, applies a non-linearity, then projects back up. A smaller bottleneck increases efficiency but may reduce representational power. It is often set via a reduction factor (e.g., reducing a 768-dim activation to 96, a factor of 8).
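The relationship between reduction factor, bottleneck dimension, and parameter count can be worked through directly. This sketch assumes a single adapter with one down-projection and one up-projection, each with a bias vector; the helper names are hypothetical.

```python
def bottleneck_from_reduction(hidden_size, reduction_factor):
    """Derive the bottleneck dimension from a reduction factor."""
    if hidden_size % reduction_factor != 0:
        raise ValueError("hidden_size must be divisible by reduction_factor")
    return hidden_size // reduction_factor


def adapter_param_count(hidden_size, bottleneck, bias=True):
    """Parameters in one adapter: down-projection plus up-projection."""
    weights = 2 * hidden_size * bottleneck
    biases = (bottleneck + hidden_size) if bias else 0
    return weights + biases


bottleneck = bottleneck_from_reduction(768, 8)
count = adapter_param_count(768, bottleneck)
print(bottleneck, count)
```

Halving the bottleneck roughly halves the adapter's parameter count, since the two weight matrices dominate the total.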
Rank (LoRA)
In Low-Rank Adaptation (LoRA), the rank (r) is the shared inner dimension of the two low-rank matrices, B (d x r) and A (r x k), used to approximate the weight update ΔW = BA. It is the central hyperparameter governing the number of trainable parameters. A higher rank increases adaptability at the cost of more parameters. For a weight matrix of dimension d x k, the trainable parameter count is r*(d+k). Typical ranks are very low (e.g., 4, 8, 16), which is sufficient because weight updates during fine-tuning are hypothesized to have a low intrinsic rank.
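The r*(d+k) formula can be evaluated for the typical ranks mentioned above. The sketch assumes a single square 768 x 768 weight matrix, which is an illustrative size rather than a value from the text.

```python
def lora_param_count(d, k, r):
    """Trainable parameters for one LoRA-adapted d x k matrix: B is d x r, A is r x k."""
    return r * (d + k)


d = k = 768                                   # illustrative matrix size
full = d * k                                  # parameters updated by full fine-tuning
counts = {r: lora_param_count(d, k, r) for r in (4, 8, 16)}

for r, n in counts.items():
    print(f"r={r}: {n} trainable vs {full} full ({n / full:.2%})")
```

The count grows linearly with r, so doubling the rank doubles the trainable parameters for that matrix.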
Task Vectors
A task vector is the arithmetic difference between the weights of a fully fine-tuned model and its pre-trained base, τ = θ_fine-tuned - θ_base. In PEFT, the delta weights effectively form a sparse or low-rank task vector. This conceptualization enables powerful operations:
- Model Merging: Adding task vectors from multiple models: θ_merged = θ_base + ατ_A + βτ_B.
- Task Negation: Forgetting a skill by subtracting its vector.
- Linearity Exploration: Studying how model capabilities change along vector directions.
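The operations in the list above reduce to element-wise arithmetic on weights. The sketch below represents each model as a dictionary of flat weight lists, which is a simplification for illustration; the weight values and scaling coefficients are made up.

```python
def task_vector(theta_finetuned, theta_base):
    """tau = theta_finetuned - theta_base, computed per named parameter."""
    return {
        name: [f - b for f, b in zip(theta_finetuned[name], theta_base[name])]
        for name in theta_base
    }


def apply_task_vectors(theta_base, scaled_vectors):
    """Return theta_base plus the sum of scale * tau over (tau, scale) pairs."""
    merged = {name: list(values) for name, values in theta_base.items()}
    for tau, scale in scaled_vectors:
        for name, values in tau.items():
            merged[name] = [m + scale * t for m, t in zip(merged[name], values)]
    return merged


theta_base = {"w": [1.0, 2.0, 3.0]}
theta_a = {"w": [2.0, 2.0, 3.0]}              # hypothetical model fine-tuned on task A
theta_b = {"w": [1.0, 2.0, 5.0]}              # hypothetical model fine-tuned on task B

tau_a = task_vector(theta_a, theta_base)
tau_b = task_vector(theta_b, theta_base)

merged = apply_task_vectors(theta_base, [(tau_a, 1.0), (tau_b, 0.5)])   # model merging
negated = apply_task_vectors(theta_base, [(tau_a, -1.0)])               # task negation
print(merged, negated)
```

The scales play the role of α and β in the merging formula; in practice they are usually tuned on held-out data.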

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.