Glossary

Rehearsal-Based Methods

Rehearsal-Based Methods are continual learning techniques that mitigate catastrophic forgetting by storing and interleaving past data with new task data during sequential training.

CONTINUAL LEARNING ON EDGE

What Are Rehearsal-Based Methods?

A core family of techniques in continual learning designed to mitigate catastrophic forgetting by explicitly revisiting past data during sequential training.

Rehearsal-Based Methods are continual learning algorithms that retain a subset of real data from previous tasks—or generate synthetic equivalents—and interleave these stored examples with new task data during training. This process of rehearsal or experience replay forces the model to repeatedly practice old patterns, directly combating the stability-plasticity dilemma by balancing the retention of prior knowledge with the acquisition of new information. The stored data is typically managed in a fixed-size replay buffer.

These methods are foundational for Edge-CL, where models must learn sequentially on devices. Key design questions include buffer management, where strategies like reservoir sampling make efficient use of limited memory, and generative replay, which creates synthetic data to avoid storing raw sensitive information. While highly effective, rehearsal introduces memory overhead and can bias the model toward whichever past data is selected for retention.

Key Rehearsal Techniques

Rehearsal-based methods combat catastrophic forgetting by strategically retaining or generating data from past tasks. This section details the core techniques and buffer management strategies essential for implementing effective rehearsal on edge devices.

Experience Replay

Experience Replay is the foundational rehearsal technique where a subset of raw data from previous tasks is stored in a replay buffer and interleaved with new task data during training. This direct rehearsal of old examples provides strong constraints against forgetting.

Core Mechanism: The loss function becomes a weighted sum of the loss on new data and the loss on buffered old data.
Edge Consideration: Storing raw data can be memory-intensive, making efficient buffer management critical for edge deployment.
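
The weighted-sum loss described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `replay_loss` and `replay_weight` are hypothetical, the loss is plain cross-entropy, and predictions are passed in as ready-made probability lists rather than produced by a real model.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(probs[label])

def mean_loss(batch):
    """Average cross-entropy over a batch of (predicted_probs, label) pairs."""
    return sum(cross_entropy(p, y) for p, y in batch) / len(batch)

def replay_loss(new_batch, buffer_batch, replay_weight=0.5):
    """Weighted sum of the loss on new data and the loss on replayed data.

    replay_weight is a hypothetical hyperparameter: 0 ignores the buffer,
    larger values favour retention over new learning.
    """
    return mean_loss(new_batch) + replay_weight * mean_loss(buffer_batch)

# Toy predictions: (class probabilities, true label)
new_batch = [([0.7, 0.3], 0), ([0.4, 0.6], 1)]
buffer_batch = [([0.9, 0.1], 0)]  # sampled from the replay buffer
total = replay_loss(new_batch, buffer_batch, replay_weight=0.5)
```

In a real training loop the two batches would be drawn at every step, one from the incoming task and one from the buffer, and the combined loss would be backpropagated once.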

Generative Replay

Generative Replay (or Pseudo-Rehearsal) uses a separately trained generative model to produce synthetic data that mimics the distribution of past tasks. The main model rehearses on these generated samples instead of storing raw data.

Core Mechanism: A Generative Adversarial Network (GAN) or Variational Autoencoder (VAE) is trained on each task's data and retained to produce samples for future rehearsal.
Advantage: Can substantially reduce the memory footprint when the generator is smaller than the data it replaces, since only model weights are stored, not data. This is highly relevant for Edge-CL.
Challenge: Requires training and maintaining a generative model, adding computational overhead.
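
To make the data flow concrete, the sketch below swaps the GAN or VAE for a deliberately trivial generative model: one 1-D Gaussian fitted per class. That substitution is an assumption made purely for brevity, and `GaussianGenerator` is a hypothetical name. What the sketch shows is the pattern itself: fit a generator on a finished task, discard the raw data, then mix generated samples into later batches.

```python
import random
import statistics

class GaussianGenerator:
    """Toy stand-in for a GAN/VAE: one 1-D Gaussian fitted per class label."""

    def __init__(self, samples):
        # samples: list of (feature, label) pairs from a finished task
        by_label = {}
        for x, y in samples:
            by_label.setdefault(y, []).append(x)
        # Keep only (mean, std) per class; the raw samples are not stored.
        self.params = {
            y: (statistics.mean(xs), statistics.pstdev(xs))
            for y, xs in by_label.items()
        }

    def sample(self, n, rng):
        """Draw n synthetic (feature, label) pairs."""
        labels = list(self.params)
        out = []
        for _ in range(n):
            y = rng.choice(labels)
            mu, sigma = self.params[y]
            out.append((rng.gauss(mu, sigma), y))
        return out

rng = random.Random(0)
task_a = [(rng.gauss(-2.0, 0.5), "a0") for _ in range(200)]
task_a += [(rng.gauss(2.0, 0.5), "a1") for _ in range(200)]
generator = GaussianGenerator(task_a)  # retains four floats in total
del task_a                             # raw task A data can now be discarded

# While learning task B, each batch mixes new data with pseudo-samples.
task_b = [(rng.gauss(6.0, 0.5), "b0") for _ in range(8)]
mixed_batch = task_b + generator.sample(8, rng)
```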

Gradient Episodic Memory (GEM)

Gradient Episodic Memory (GEM) is an optimization-centric rehearsal method. It stores past data in an episodic memory and uses it to constrain the gradient updates for new tasks so that, to first order, they do not increase the loss on the stored examples of old tasks.

Core Mechanism: Solves a quadratic programming problem to find the gradient direction closest to the new-task gradient that satisfies inequality constraints derived from the memory data (a non-negative dot product with each old task's memory gradient).
Outcome: Limits negative backward transfer while still allowing positive backward transfer. The constraint is a local, first-order condition evaluated on the memory samples, so it is not a guarantee about performance on the full data of old tasks.
Use Case: Suitable for scenarios with strict performance guarantees on all previous tasks.
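
With a single constraint the quadratic program has a closed-form solution, which is enough to show the idea. The sketch below covers only that special case (it is also the form used by the averaged variant, A-GEM); full GEM with one constraint per previous task needs a QP solver. The function name `project_gradient` is hypothetical, and gradients are plain Python lists.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_gradient(g_new, g_mem):
    """Return the vector closest to g_new whose dot product with g_mem is >= 0.

    g_new: gradient of the loss on the current batch.
    g_mem: gradient of the loss on samples drawn from the episodic memory.
    """
    conflict = dot(g_new, g_mem)
    if conflict >= 0:
        # The update does not increase the memory loss to first order.
        return list(g_new)
    # Remove the component of g_new that points against g_mem.
    scale = conflict / dot(g_mem, g_mem)
    return [a - scale * b for a, b in zip(g_new, g_mem)]

g_new = [1.0, -2.0]   # would increase the memory loss
g_mem = [0.0, 1.0]
g_proj = project_gradient(g_new, g_mem)   # [1.0, 0.0]
```

The projected gradient keeps as much of the new-task direction as possible while making its dot product with the memory gradient exactly zero.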

iCaRL

iCaRL (Incremental Classifier and Representation Learning) is a seminal algorithm for class-incremental learning. It combines rehearsal with a nearest-mean-of-exemplars classification rule and a distillation loss.

Core Components:
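- Bounded Replay Buffer: Stores exemplars within a fixed total memory budget that is divided equally among the classes seen so far, so the number of exemplars per class shrinks as new classes arrive. Exemplars are chosen with a herding selection algorithm.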
- Distillation Loss: Preserves knowledge in the network's representations.
- Nearest-Mean Classification: At inference, a sample is classified based on the smallest distance to the mean feature vector of each class's exemplars.
Significance: Established a strong benchmark for learning new classes over time without task identity.
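
The nearest-mean classification rule can be sketched as follows. This assumes feature vectors have already been extracted by the network; the function names and the toy two-dimensional features are illustrative only.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def nearest_mean_classify(feature, exemplars_by_class):
    """Assign the class whose exemplar mean is closest in Euclidean distance."""
    best_label, best_dist = None, math.inf
    for label, exemplars in exemplars_by_class.items():
        d = math.dist(feature, mean_vector(exemplars))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy exemplar features per class (assumed to come from the feature extractor)
exemplars_by_class = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prediction = nearest_mean_classify([0.7, 0.1], exemplars_by_class)  # "cat"
```

Because the class means are recomputed from the stored exemplars, the classifier follows the feature extractor as it changes, without relying on a trained output layer.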

Buffer Management Strategies

Buffer Management is the algorithmic strategy for selecting and maintaining which data points are stored in a fixed-size replay buffer, directly impacting rehearsal quality and efficiency.

Reservoir Sampling: A probabilistic algorithm that maintains a uniformly random sample from a data stream of unknown or unbounded length. After n samples have been seen, each of them has the same probability (k/n for a buffer of size k) of being in the buffer.
Core-Set Selection: Aims to select a subset of data that best represents the entire dataset, often by solving a k-center or diversity-maximization problem. This maximizes coverage of the data distribution with minimal samples.
Ring Buffer: A simple FIFO (First-In-First-Out) buffer that overwrites the oldest entry when full. It's computationally cheap but may not retain the most informative samples.
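
Two of these strategies are short enough to sketch side by side. The reservoir below follows the classic single-pass algorithm (often called Algorithm R); the ring buffer uses the standard library's `collections.deque` with `maxlen`. The class name `ReservoirBuffer` is hypothetical.

```python
import random
from collections import deque

class ReservoirBuffer:
    """Fixed-size buffer holding a uniform random sample of the stream so far."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)          # fill phase
            return
        j = self.rng.randrange(self.seen)    # uniform integer in [0, seen)
        if j < self.capacity:                # happens with prob. capacity/seen
            self.items[j] = item

reservoir = ReservoirBuffer(capacity=5, seed=0)
ring = deque(maxlen=5)                       # FIFO: drops the oldest when full
for sample in range(1000):
    reservoir.add(sample)
    ring.append(sample)
```

After the loop, `ring` holds only the last five samples, while `reservoir.items` holds five samples drawn uniformly from the whole stream, which is why reservoir sampling is usually preferred when old tasks must stay represented.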

Learning without Forgetting (LwF)

Learning without Forgetting (LwF) is a rehearsal-adjacent method that uses knowledge distillation instead of storing raw data. It encourages the updated model to produce outputs for the new data that stay close to the outputs (soft labels) the old model produced for the same data.

Core Mechanism: A distillation loss term is added to the new task loss, measuring the KL divergence between the current and old model's output distributions for the new task data.
Advantage: Zero memory overhead for past data, as only the previous model's parameters are needed.
Limitation: Performance is highly dependent on the similarity between new and old tasks; it can struggle with disjoint tasks. Often used in combination with a small replay buffer for robustness.
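
The distillation term can be sketched as a temperature-softened KL divergence added to the task loss. The names `lwf_loss`, `distill_weight`, and `temperature` are hypothetical, as are their default values, and logits are given directly rather than computed by a network.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lwf_loss(task_loss, old_logits, new_logits,
             distill_weight=1.0, temperature=2.0):
    """New-task loss plus a penalty for drifting from the old model's outputs."""
    old_probs = softmax(old_logits, temperature)   # frozen previous model
    new_probs = softmax(new_logits, temperature)   # model being trained
    return task_loss + distill_weight * kl_divergence(old_probs, new_probs)

old_logits = [2.0, 0.5, -1.0]                      # old model on a new-task input
unchanged = lwf_loss(0.3, old_logits, [2.0, 0.5, -1.0])
drifted = lwf_loss(0.3, old_logits, [-1.0, 0.5, 2.0])
```

When the new model reproduces the old outputs the penalty is zero, and it grows as the outputs drift apart.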

METHOD COMPARISON

Rehearsal vs. Other Continual Learning Methods

A comparison of core continual learning strategies based on their mechanisms, resource requirements, and suitability for edge deployment.

Feature / Mechanism	Rehearsal-Based Methods	Regularization-Based Methods	Architectural Methods
Core Mechanism	Interleaves stored/generated past data with new data	Adds penalty to loss to constrain parameter updates	Dynamically expands or isolates network parameters per task
Mitigates Catastrophic Forgetting By	Direct rehearsal of old task data	Slowing learning on important old parameters	Allocating dedicated, non-overlapping capacity
Requires Storing Raw Past Data	Yes (generative replay stores a generator instead)	No	No
Memory Overhead (Typical)	Buffer size (e.g., 1-5% per task)	~0% (stores only importance weights)	Grows with number of tasks (parameters)
Computational Overhead	Moderate (trains on mixed data)	Low (adds loss term)	High (task-specific forward/backward paths)
Suitability for Online/Streaming Data	Yes (with buffer management)	Yes	Limited (requires task boundary)
Forward Transfer Potential	Medium (shared representation via data)	High (shared, constrained representation)	Low (parameters often isolated)
Primary Edge Deployment Challenge	Buffer storage & data privacy	Importance estimation stability	Unbounded parameter growth on device
Example Algorithms	Experience Replay, GEM, iCaRL	EWC, Synaptic Intelligence, LwF	Progressive Nets, HAT, PackNet

REHEARSAL-BASED METHODS

Challenges for Edge Deployment

While rehearsal-based methods are powerful for mitigating catastrophic forgetting, their practical application on edge devices introduces unique constraints related to memory, compute, and data privacy.

Memory and Storage Constraints

Edge devices have severely limited RAM and persistent storage. Maintaining a replay buffer of raw data or generated samples directly competes with the memory needed for the model itself and its runtime operations.

Buffer Size vs. Performance: A small buffer may not adequately represent past tasks, leading to forgetting. A large buffer is infeasible.
Data Representation: Storing compressed representations or embeddings instead of raw data is a common trade-off, but this adds complexity to the training pipeline.
Example: A smart camera with 512MB RAM cannot store thousands of high-resolution images for rehearsal while also running a vision model.
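
A quick back-of-the-envelope calculation shows the scale of the problem in the smart camera example. The resolution (1920x1080, 8-bit RGB, uncompressed) and the reading of 512MB as 512 MiB are assumptions chosen for illustration; they are not stated in the text above.

```python
BYTES_PER_MIB = 1024 * 1024

def raw_image_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of one uncompressed image in bytes."""
    return width * height * channels * bytes_per_channel

ram_bytes = 512 * BYTES_PER_MIB                  # total device RAM (assumed MiB)
image_bytes = raw_image_bytes(1920, 1080)        # 6,220,800 bytes per frame
buffer_bytes = 1000 * image_bytes                # a 1,000-image replay buffer
images_that_fit = ram_bytes // image_bytes       # 86, with nothing else in RAM

print(f"one image:    {image_bytes / BYTES_PER_MIB:.2f} MiB")
print(f"1,000 images: {buffer_bytes / BYTES_PER_MIB:.0f} MiB")
print(f"fit in RAM:   {images_that_fit} images")
```

Under these assumptions a thousand raw frames need more than ten times the device's entire RAM, which is why compressed representations or embeddings are the usual compromise.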

Computational Overhead

The core rehearsal operation, interleaving old and new data during training, increases computational cost, which can be prohibitive for on-device learning.

Training Cost: Each training step involves forward/backward passes on mixed batches, so the compute per step grows with the number of replayed samples compared with training on new data alone.
Energy Drain: Continuous computation rapidly depletes battery-powered devices.
Latency Impact: The device cannot perform its primary function (e.g., object detection) if compute is monopolized by rehearsal training. Techniques like sparse rehearsal or performing updates only during idle periods are necessary.

Buffer Management Complexity

Selecting what to store in the fixed-size replay buffer is a critical algorithmic challenge that must operate efficiently on-device.

Core-Set Selection: Finding a minimal subset that best represents the data distribution is computationally expensive (e.g., requires solving a k-center problem).
Online Strategies: Algorithms like reservoir sampling provide a uniform random sample but do not optimize for representativeness.
Dynamic Prioritization: Should the buffer prioritize difficult, rare, or recent examples? Implementing these heuristics adds logic and overhead.
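
One common way to keep core-set selection affordable is the greedy farthest-point heuristic for the k-center problem, which is a 2-approximation and costs one pass over the data per selected point. The sketch below is a minimal version under simplifying assumptions: points are small tuples, the starting point is arbitrary, and `greedy_k_center` is a hypothetical name.

```python
import math

def greedy_k_center(points, k):
    """Pick up to k indices by farthest-point traversal."""
    selected = [0]  # arbitrary starting point
    # Distance from every point to its nearest selected center so far
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(k, len(points)):
        # Choose the point that is worst covered by the current centers.
        farthest = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(farthest)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[farthest]))
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
chosen = greedy_k_center(points, k=3)   # [0, 4, 3]: one point per cluster
```

Even this heuristic needs the candidate points (or their embeddings) in memory at selection time, which is part of the on-device cost the section describes.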

Data Privacy and Security

Rehearsal inherently involves retaining user data. On edge devices, this creates significant privacy risks and compliance challenges.

Local Data Persistence: Sensitive data (e.g., personal photos, location traces) stored in the buffer is vulnerable if the device is compromised.
Regulatory Compliance: Regulations like GDPR may treat the replay buffer as a data store, requiring mechanisms for right-to-erasure.
Mitigations: Using generative replay (synthetic data) or federated continual learning where only model updates are shared can reduce privacy exposure, but they introduce other technical hurdles.

Catastrophic Forgetting Under Resource Limits

The very problem rehearsal aims to solve—catastrophic forgetting—can be exacerbated by the stringent limits of edge deployment.

Insufficient Rehearsal: Due to memory limits, the model may only rehearse on a tiny, non-representative subset of past data, leading to partial forgetting.
Interference from Non-IID Data: Edge data streams are rarely independent and identically distributed (they are non-IID). Rapidly shifting data distributions can overwhelm a small buffer's ability to stabilize learning.
Example: A wearable health monitor learning new user patterns might forget older, seasonal patterns because the buffer cannot hold enough long-term history.

Integration with System Workflows

Deploying a rehearsal-based continual learning system requires co-design with the device's operating system, scheduler, and other applications.

Resource Arbitration: The system must manage conflicts when the model needs compute for rehearsal versus other processes.
Checkpointing and Recovery: Model and buffer state must be reliably saved to persistent storage to survive reboots or crashes, adding I/O complexity.
Lifecycle Management: Over-the-air updates must handle versioning for both the model architecture and the structure/content of the replay buffer.
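
The checkpointing and versioning points can be illustrated with a write-then-rename pattern, so that a crash mid-write never leaves a half-written file in place of the last good checkpoint. This is a sketch under stated assumptions: state is small enough to serialize as JSON, the function names and the `version` field are hypothetical, and a real deployment would likely use a binary format.

```python
import json
import os
import tempfile

def save_checkpoint(path, model_state, buffer_items, version=1):
    """Write model and buffer state atomically, tagged with a schema version."""
    payload = {"version": version, "model": model_state, "buffer": buffer_items}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())         # push bytes to persistent storage
        os.replace(tmp_path, path)       # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_checkpoint(path, expected_version=1):
    """Load state, refusing checkpoints written with a different schema."""
    with open(path) as f:
        payload = json.load(f)
    if payload["version"] != expected_version:
        raise ValueError(f"unsupported checkpoint version {payload['version']}")
    return payload["model"], payload["buffer"]

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "state.json")
    save_checkpoint(ckpt, {"w": [0.1, 0.2]}, [[1.0, 0], [2.0, 1]])
    model_state, buffer_items = load_checkpoint(ckpt)
```

Saving the model and the buffer in one file keeps them consistent with each other; the version check is where an over-the-air update would hook in a migration step.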

Frequently Asked Questions

Rehearsal-based methods are a core family of techniques in continual learning designed to prevent catastrophic forgetting. They work by retaining or generating data from past tasks and 'rehearsing' it during training on new tasks. This section answers common technical questions about their implementation, trade-offs, and application on edge devices.

What is a rehearsal-based method?

A rehearsal-based method is a continual learning technique that mitigates catastrophic forgetting by retaining a subset of data from previous tasks (or generating synthetic equivalents) and interleaving these 'rehearsal' samples with new task data during training. This process forces the model to jointly optimize for both new knowledge and the retention of old knowledge. The retained data is typically stored in a fixed-size replay buffer. By rehearsing past experiences, the model maintains a more stable representation, balancing stability against plasticity, the trade-off inherent to sequential learning.

Related Terms

Rehearsal-based methods are a core strategy for mitigating catastrophic forgetting in continual learning. The following terms define the specific techniques, scenarios, and metrics that contextualize this approach.

Experience Replay

A foundational rehearsal technique where a subset of past training data or their learned representations are stored in a replay buffer and interleaved with new task data during training. This forces the model to rehearse old tasks concurrently with learning new ones. It is the most direct implementation of rehearsal, often using strategies like reservoir sampling for buffer management.

Generative Replay

A rehearsal variant that uses a generative model (e.g., a Generative Adversarial Network or Variational Autoencoder) trained on past data to produce synthetic samples (pseudo-rehearsal). These generated examples mimic old experiences, allowing the main model to rehearse without storing raw data. This is critical for privacy-preserving or memory-constrained edge scenarios where storing real data is infeasible.

Replay Buffer

The fixed or dynamic memory storage component used in experience replay. Effective continual learning depends on buffer management strategies to select which past examples to retain. Common approaches include:

Reservoir Sampling: Maintains a uniform random sample from a stream.
Core-Set Selection: Chooses a representative subset that approximates the full data distribution.
Herding: Greedily selects exemplars so that their running mean in feature space stays as close as possible to the class mean.
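
Herding is the least self-explanatory of the three, so a small sketch may help. It assumes features for one class have already been extracted, skips exemplars that were already chosen, and uses the hypothetical name `herding_select`; the 1-D toy features are for readability only.

```python
import math

def herding_select(features, m):
    """Greedily pick up to m indices whose running mean tracks the class mean."""
    dim = len(features[0])
    class_mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    selected, running_sum = [], [0.0] * dim
    while len(selected) < min(m, len(features)):
        k = len(selected) + 1
        best_idx, best_dist = None, math.inf
        for idx, f in enumerate(features):
            if idx in selected:
                continue
            # Mean of the exemplar set if this candidate were added
            candidate_mean = [(running_sum[i] + f[i]) / k for i in range(dim)]
            d = math.dist(class_mean, candidate_mean)
            if d < best_dist:
                best_idx, best_dist = idx, d
        selected.append(best_idx)
        running_sum = [running_sum[i] + features[best_idx][i] for i in range(dim)]
    return selected

features = [[0.0], [1.0], [2.0], [3.0], [10.0]]   # class mean is 3.2
exemplars = herding_select(features, m=3)          # [3, 2, 1]
```

The selection order matters: the first exemplars are the most representative ones, so the buffer can later be shrunk by dropping exemplars from the end of the list.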

Catastrophic Forgetting

The core problem rehearsal methods aim to solve. It is the phenomenon where a neural network abruptly and drastically loses performance on previously learned tasks when trained on new data. This occurs due to unconstrained parameter overwriting. Rehearsal directly combats this by providing interleaved data that regularizes the loss landscape for old tasks.

Stability-Plasticity Dilemma

The fundamental trade-off in all continual learning. Stability refers to a model's ability to retain old knowledge (resisting forgetting). Plasticity is its capacity to learn new information efficiently. Rehearsal-based methods explicitly manage this trade-off: the replay of old data promotes stability, while training on new data maintains plasticity. The buffer size is a direct knob for tuning this balance.

Online Continual Learning

A strict, realistic variant of continual learning where the model receives a single, non-repeating pass through a stream of data, often one sample or a small batch at a time. This imposes severe constraints on rehearsal, as the model cannot loop over old data. Efficient buffer management and sample-efficient rehearsal strategies are paramount for success in this challenging scenario, especially on edge devices.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
