Model Merging (PEFT)

Model merging in PEFT is the process of combining delta weights or task vectors from multiple fine-tuned models into a single model to achieve multi-task capabilities or improved generalization.
PARAMETER-EFFICIENT FINE-TUNING

What is Model Merging (PEFT)?

Model merging in Parameter-Efficient Fine-Tuning (PEFT) is the process of combining the learned parameter changes (delta weights) from multiple independently fine-tuned models into a single, unified model to achieve multi-task capabilities or enhanced generalization.

In PEFT, each specialized model is created by training a small set of parameters, such as Low-Rank Adaptation (LoRA) matrices or adapter modules, on top of a frozen base model. The resulting task vectors (the arithmetic difference between the fine-tuned and base weights; for LoRA, the scaled low-rank product of the two adapter matrices) encode distinct capabilities. Model merging performs arithmetic operations, such as linear interpolation or task arithmetic, on these vectors to combine their knowledge into one model. Because the base weights stay frozen, the merged model can often perform several tasks with limited interference, although conflicting updates can still degrade individual tasks.

This technique is foundational for building multi-task models and improving cross-task generalization without the cost of training separate full models. It leverages the modular nature of PEFT methods: delta weights are additive by construction, and updates for different tasks often overlap only partially, which tends to make them easier to combine. The merged model retains the efficiency of the original PEFT approach, requiring only the storage and inference of a single, consolidated set of delta parameters alongside the original frozen backbone.
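
To make the additivity concrete, here is a minimal NumPy sketch. It assumes the usual LoRA convention that an adapter contributes a dense update `scale * (B @ A)` to a frozen layer; the layer shape, rank, and random factors are hypothetical stand-ins, not values from any real model. It illustrates that summing two adapter deltas is equivalent to a single adapter whose factors are the concatenation of the originals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2  # hypothetical layer shape and LoRA rank


def lora_delta(B, A, scale=1.0):
    """Dense weight update implied by one LoRA adapter: scale * (B @ A)."""
    return scale * (B @ A)


# Two adapters trained separately on the same frozen layer (random stand-ins).
B_a, A_a = rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))
B_b, A_b = rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))

# Summing the dense deltas ...
delta_sum = lora_delta(B_a, A_a) + lora_delta(B_b, A_b)

# ... matches one adapter built from the concatenated factors.
B_cat = np.concatenate([B_a, B_b], axis=1)  # shape (d_out, 2 * rank)
A_cat = np.concatenate([A_a, A_b], axis=0)  # shape (2 * rank, d_in)
delta_cat = B_cat @ A_cat

print(np.allclose(delta_sum, delta_cat))
```

One consequence is that the merged update can have rank up to the sum of the individual ranks, so a consolidated adapter is not necessarily as small as each source adapter.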

Core Mechanisms of Model Merging

01

Task Vector Arithmetic

The foundational operation for model merging. A task vector is calculated as the arithmetic difference between a fine-tuned model's weights and the original pre-trained base model's weights (Δ = W_finetuned - W_base). Merging involves performing linear operations on these vectors.

  • Averaging: Combining vectors from similar tasks (Δ_merged = (Δ_A + Δ_B) / 2) to improve robustness.
  • Interpolation: Creating a weighted sum (Δ_merged = α * Δ_A + (1-α) * Δ_B) to balance task performance.
  • Negation: Subtracting a vector (W_new = W_base - Δ) to potentially remove undesired behaviors or "unlearn" a task.
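
The three operations above can be sketched on flattened weight vectors. The arrays below are made-up toy values, and the helper names (`average`, `interpolate`, `negate`) are illustrative rather than part of any library.

```python
import numpy as np

# Hypothetical flattened weights for a base model and two fine-tuned variants.
w_base = np.array([1.0, -2.0, 0.5, 3.0])
w_a = np.array([1.5, -2.0, 0.0, 3.0])
w_b = np.array([1.0, -1.0, 0.5, 2.0])

# Task vectors: fine-tuned weights minus base weights.
delta_a = w_a - w_base
delta_b = w_b - w_base


def average(deltas):
    """Uniform average of several task vectors."""
    return np.mean(deltas, axis=0)


def interpolate(d_a, d_b, alpha):
    """Weighted sum alpha * d_a + (1 - alpha) * d_b."""
    return alpha * d_a + (1.0 - alpha) * d_b


def negate(base, delta, scale=1.0):
    """Subtract a task vector from the base weights."""
    return base - scale * delta


w_avg = w_base + average([delta_a, delta_b])
w_mix = w_base + interpolate(delta_a, delta_b, alpha=0.7)
w_neg = negate(w_base, delta_a)
```

Setting `alpha=0.5` in `interpolate` reproduces `average` for two vectors, so averaging is a special case of interpolation.
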
02

TIES-Merging (TrIm, Elect Sign & Merge)

A method that addresses interference from redundant parameters and conflicting parameter signs across different task vectors. It performs three key steps:

  • TrIm: Retains only the top-k% of parameters by magnitude in each task vector and resets the rest to zero, sparsifying the updates.
  • Elect Sign: For each parameter, resolves sign conflicts by electing the sign with the greater total magnitude across the trimmed vectors, rather than by a simple majority count.
  • Disjoint Merge: Averages only the parameter values that agree with the elected sign, reducing destructive interference.

This makes merging a larger number of diverse models more stable, and its authors report that it outperforms simple averaging.
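
The three steps can be sketched in NumPy as follows. This is a simplified illustration on flattened vectors, not a reference implementation: the function names and the `keep_fraction` parameter are assumptions, and the final scaling of the merged vector before adding it to the base weights is left out.

```python
import numpy as np


def trim(delta, keep_fraction):
    """Zero all but the largest-magnitude entries of one task vector."""
    k = max(1, int(round(keep_fraction * delta.size)))
    threshold = np.sort(np.abs(delta), axis=None)[-k]
    # Ties at the threshold may keep slightly more than k entries.
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def ties_merge(deltas, keep_fraction=0.2):
    """Trim each task vector, elect a sign per parameter, then average agreeing values."""
    trimmed = np.stack([trim(d, keep_fraction) for d in deltas])
    # The sign of the sum is the sign carrying more total magnitude.
    elected = np.sign(trimmed.sum(axis=0))
    agrees = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = agrees.sum(axis=0)
    summed = np.where(agrees, trimmed, 0.0).sum(axis=0)
    # Parameters with no surviving, agreeing value stay at zero.
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)


# Toy task vectors that disagree in sign on the first two parameters.
delta_a = np.array([0.9, -0.1, 0.4, 0.05])
delta_b = np.array([-0.6, 0.8, 0.5, -0.02])
merged = ties_merge([delta_a, delta_b], keep_fraction=0.75)
print(merged)
```

On this toy input, plain averaging would shrink the first two entries to 0.15 and 0.35 because the opposing signs partly cancel, whereas the sketch keeps the dominant values 0.9 and 0.8.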

03

DARE (Drop And REscale)

A technique that sparsifies delta weights before merging. It was introduced for fully fine-tuned language models and is also applied to Low-Rank Adaptation (LoRA) deltas; it exploits the redundancy in fine-tuning updates.

  • Random Drop: A large fraction p (e.g., 90%) of delta weights is randomly set to zero.
  • Rescaling: The remaining non-zero weights are multiplied by 1/(1-p) (10x for p = 0.9) so that the expected value of each delta entry is unchanged.
  • Merging: The sparsified and rescaled deltas are then averaged.

DARE can merge several fine-tuned models with little performance degradation when the delta weights are small in magnitude, because most of the dropped parameters are redundant. Very high drop rates or large deltas can still hurt quality.
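
A minimal sketch of the drop-and-rescale step, assuming flattened delta vectors and a Bernoulli keep mask. The function name `dare`, the random stand-in deltas, and the seed are illustrative choices only.

```python
import numpy as np


def dare(delta, drop_rate, rng):
    """Randomly zero a fraction of entries and rescale survivors by 1 / (1 - drop_rate)."""
    keep_mask = rng.random(delta.shape) >= drop_rate
    return np.where(keep_mask, delta / (1.0 - drop_rate), 0.0)


rng = np.random.default_rng(0)

# Hypothetical small-magnitude deltas from two fine-tuned models.
delta_a = rng.normal(scale=0.01, size=100_000)
delta_b = rng.normal(scale=0.01, size=100_000)

sparse_a = dare(delta_a, drop_rate=0.9, rng=rng)
sparse_b = dare(delta_b, drop_rate=0.9, rng=rng)

# Merge the sparsified deltas by averaging.
merged = np.mean([sparse_a, sparse_b], axis=0)

kept_fraction = np.count_nonzero(sparse_a) / sparse_a.size
print(round(kept_fraction, 2))
```

Because each surviving entry is divided by the keep probability, the sparsified delta equals the original in expectation, which is the property the rescaling step is meant to provide.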

04

Slerp (Spherical Linear Interpolation)

An interpolation technique used when merging models, preferred over linear interpolation for certain parameter spaces. It interpolates along the geodesic (shortest path) on a hypersphere, treating weight sets as vectors.

  • Use Case: Particularly effective for merging models whose fine-tuned weights have similar magnitudes but different directions in the high-dimensional parameter space.
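  • Process: Given two model weight vectors A and B separated by an angle θ, and an interpolation factor t between 0 and 1, Slerp returns [sin((1-t)θ) * A + sin(tθ) * B] / sin(θ). For vectors of similar magnitude, this avoids the shrinkage in norm that linear interpolation (Lerp) produces and gives a smoother transition between model behaviors.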
  • Application: Commonly used in merging diffusion models or foundational LLMs to create balanced blends of capabilities.
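
A small sketch of Slerp on flattened weight vectors, assuming the common variant that measures the angle on normalized copies and falls back to linear interpolation when the vectors are nearly collinear. The function name and the `eps` tolerance are illustrative.

```python
import numpy as np


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    cos_theta = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    if sin_theta < eps:
        # Nearly parallel or anti-parallel: the formula is ill-conditioned, use Lerp.
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / sin_theta


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

midpoint_slerp = slerp(a, b, 0.5)
midpoint_lerp = 0.5 * a + 0.5 * b

print(np.linalg.norm(midpoint_slerp), np.linalg.norm(midpoint_lerp))
```

For these two orthogonal unit vectors the Slerp midpoint stays on the unit circle, while the Lerp midpoint has a smaller norm. Note that Slerp is defined for two vectors at a time.
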
05

Model Soups & Gradient Souping

Methods for creating a unified model from multiple fine-tuned checkpoints.

  • Uniform Soup: The simplest form, averaging the weights of multiple models fine-tuned from the same base with different hyperparameters or data orders.
  • Greedy Soup: Considers models in decreasing order of validation performance and adds each one to the soup only if the averaged soup's validation performance on the target task does not get worse.
  • Gradient Souping: An advanced technique that merges models by approximating the task vectors that would result from fine-tuning on a mixture of all source tasks simultaneously. It computes a weighted average of gradients from each task to construct a more coherent merged model.
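
The uniform and greedy variants can be sketched as below. The `evaluate` callable is a hypothetical stand-in for a held-out validation metric (higher is better); here it is a toy function that scores weights by their closeness to a made-up target vector.

```python
import numpy as np


def uniform_soup(checkpoints):
    """Average all checkpoints with equal weight."""
    return np.mean(checkpoints, axis=0)


def greedy_soup(checkpoints, evaluate):
    """Add checkpoints best-first, keeping each only if the soup's score does not drop."""
    ranked = sorted(checkpoints, key=evaluate, reverse=True)
    soup = [ranked[0]]
    best = evaluate(ranked[0])
    for candidate in ranked[1:]:
        trial = np.mean(soup + [candidate], axis=0)
        score = evaluate(trial)
        if score >= best:
            soup.append(candidate)
            best = score
    return np.mean(soup, axis=0)


# Toy setup: the "validation score" is the negative distance to a hypothetical optimum.
target = np.array([1.0, 1.0])


def evaluate(weights):
    return -np.linalg.norm(weights - target)


checkpoints = [
    np.array([1.2, 0.9]),
    np.array([0.7, 1.2]),
    np.array([3.0, -1.0]),  # an outlier that hurts the average
]

greedy = greedy_soup(checkpoints, evaluate)
uniform = uniform_soup(checkpoints)
print(evaluate(greedy) > evaluate(uniform))
```

In this toy case the greedy soup leaves out the outlier checkpoint, while the uniform soup is pulled away from the target by it.
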
06

Reg-Merge (Regression-Based Merge)

A data-driven merging approach that frames merging as a regression problem. Instead of purely geometric operations on weights, it uses a small calibration dataset to learn the optimal linear combination of multiple model outputs.

  • Process: A lightweight regression layer (e.g., linear) is trained to combine the logits or hidden states of several frozen, task-specific models.
  • Advantage: Directly optimizes for performance on the target mixture of skills, which can yield better results than weight-space arithmetic when representative calibration data is available.
  • PEFT Context: Highly compatible with merged PEFT modules, where the regression layer learns to weight the contributions of different adapters or LoRA modules.
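
As a rough illustration of the regression idea, the sketch below fits one scalar mixing weight per model by ordinary least squares. Everything here is an assumption made for the example: the logits are random stand-ins, the calibration target is synthetic, and a real setup would more likely fit against labels with a classification loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_classes = 200, 3

# Hypothetical logits from two frozen task-specific models on a calibration set.
logits_a = rng.normal(size=(n_examples, n_classes))
logits_b = rng.normal(size=(n_examples, n_classes))

# Synthetic calibration target built from a known mixture, so the fit can be checked.
target = 0.7 * logits_a + 0.3 * logits_b

# Design matrix: one column per model, one row per (example, class) pair.
X = np.stack([logits_a.ravel(), logits_b.ravel()], axis=1)
y = target.ravel()

# Least-squares fit of the mixing weights.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

combined = weights[0] * logits_a + weights[1] * logits_b
print(np.round(weights, 3))
```

Unlike weight-space merging, a combination learned over model outputs needs every source model to run at inference time.
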

How Does Model Merging Work?

Model merging in Parameter-Efficient Fine-Tuning (PEFT) is a technique for combining multiple specialized adaptations into a single, more capable model without retraining from scratch.

Each task vector represents the learned change from a base pre-trained model to a model adapted for a specific task. By merging these vectors through simple averaging, weighted summation, or sign-aware and importance-aware schemes, a single model can acquire multi-task capabilities or improved generalization while preserving the efficiency gains of PEFT methods like LoRA or adapters.

The technique relies on the linear mode connectivity hypothesis, which posits that fine-tuned models often reside in linearly connected low-error basins within the loss landscape. When this holds, typically for models fine-tuned from the same pre-trained checkpoint, interpolating between their weights stays in a low-loss region, which is what makes weight-space merging viable. Common merging algorithms include Task Arithmetic, which adds weighted task vectors to the base model, and Fisher Merging, which weights each model's parameters by an estimate of their importance (the diagonal Fisher information). The result is a consolidated model that aims to perform well across the source tasks, enabling efficient multi-task inference from a single checkpoint.
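
A minimal sketch of Fisher-weighted averaging, assuming a diagonal Fisher approximation estimated as the mean squared per-example gradient. The weights and importance values are toy numbers, and the function names are illustrative.

```python
import numpy as np


def diagonal_fisher(per_example_grads):
    """Diagonal Fisher estimate: mean squared gradient for each parameter."""
    return np.mean(np.square(per_example_grads), axis=0)


def fisher_merge(weights, fishers, eps=1e-8):
    """Per-parameter average of model weights, weighted by each model's Fisher estimate."""
    weights = np.stack(weights)
    fishers = np.stack(fishers)
    return (fishers * weights).sum(axis=0) / (fishers.sum(axis=0) + eps)


# Toy example: each model is "confident" about a different parameter.
w_a, f_a = np.array([1.0, 0.0]), np.array([9.0, 1.0])
w_b, f_b = np.array([0.0, 1.0]), np.array([1.0, 9.0])

merged = fisher_merge([w_a, w_b], [f_a, f_b])
plain_average = np.mean([w_a, w_b], axis=0)
print(np.round(merged, 3), plain_average)
```

Each merged parameter is pulled toward the model with the larger importance estimate for that parameter. With equal Fisher values the result reduces to plain averaging.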

Primary Use Cases & Applications

Model merging leverages the compact delta weights from PEFT to combine multiple specialized models into a single, more capable system. This enables efficient multi-task learning, improved generalization, and the creation of foundational multi-purpose models.

COMPARISON

Model Merging vs. Alternative Multi-Task Approaches

This table compares the core characteristics of the Model Merging paradigm against other established methods for building multi-task capable models.

| Feature / Metric | Model Merging (PEFT) | Multi-Task Learning (MTL) | Mixture-of-Experts (MoE) | Single Multi-Task Model |
|---|---|---|---|---|
| Core Paradigm | Arithmetic combination of task-specific delta weights | Joint training on multiple tasks with a shared backbone | Sparse activation of specialized expert sub-networks | Full fine-tuning on a blended multi-task dataset |
| Parameter Efficiency | | | | |
| Preserves Base Model Knowledge | | | | |
| Training Compute Overhead | Low (independent fine-tuning) | High (joint optimization) | Very High (expert routing + training) | High (full fine-tuning) |
| Task Addition / Removal | Modular; additive or subtractive | Requires retraining or complex continual learning | Requires expert addition/retraining | Requires full retraining from base |
| Inference Cost | Same as base model | Same as base model | ~2-4x base model (active params) | Same as base model |
| Risk of Task Interference | Low to moderate (sign and magnitude conflicts between task vectors; mitigated by methods such as TIES-Merging and DARE) | High (gradient competition) | Low (experts are specialized) | High (single set of weights) |
| Typical Use Case | Combining 3-10 specialized adapters (e.g., code, math, chat) | Training a model on closely related tasks (e.g., NER, POS, Chunking) | Extremely large-scale models with 1000s of tasks/capabilities | Domain-specific model for 2-3 tightly coupled tasks |

Frequently Asked Questions

Model merging is a core technique in Parameter-Efficient Fine-Tuning (PEFT) that enables the creation of multi-capability models by combining specialized adaptations. This FAQ addresses key technical questions about its mechanisms, applications, and implementation.

Model merging in PEFT is the process of arithmetically combining the delta weights or task vectors from multiple independently fine-tuned models into a single unified model. It works by first fine-tuning a shared frozen backbone model on different tasks using a PEFT method like LoRA or adapters, which produces a small set of task-specific parameters. The core operation is a weighted summation: Merged_Weights = Base_Weights + α * Task_Vector_A + β * Task_Vector_B, where α and β are scaling coefficients. This creates a model that can perform multiple tasks while avoiding the catastrophic forgetting typical of sequential fine-tuning, because the base model's weights are never overwritten during training. Some interference between task vectors can still occur at merge time, which is what the choice of α and β helps control.
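
The weighted summation can be sketched over a per-layer dictionary of weights. The layer names, shapes, and coefficients below are hypothetical, and `merge_state` is an illustrative helper rather than a library function.

```python
import numpy as np


def merge_state(base, task_vectors, coefficients):
    """Return base weights plus a scaled sum of per-layer task vectors."""
    merged = {name: w.copy() for name, w in base.items()}
    for task_vector, coeff in zip(task_vectors, coefficients):
        for name, delta in task_vector.items():
            merged[name] += coeff * delta
    return merged


# Hypothetical base weights for two layers.
base = {
    "attn.weight": np.ones((2, 2)),
    "mlp.weight": np.zeros((2, 2)),
}

# Task vectors may touch only some layers, as PEFT adapters often do.
tv_a = {"attn.weight": np.full((2, 2), 0.5)}
tv_b = {"attn.weight": np.full((2, 2), -0.1), "mlp.weight": np.full((2, 2), 1.0)}

merged = merge_state(base, [tv_a, tv_b], coefficients=[0.8, 0.5])
print(merged["attn.weight"][0, 0], merged["mlp.weight"][0, 0])
```

Copying the base weights first keeps the original backbone untouched, so different coefficient settings can be tried against the same base.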

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.