Inferensys

Rehearsal-Based Methods

Rehearsal-Based Methods are continual learning techniques that mitigate catastrophic forgetting by storing and interleaving past data with new task data during sequential training.
CONTINUAL LEARNING ON EDGE

What are Rehearsal-Based Methods?

Rehearsal-based methods are a core family of continual learning techniques that mitigate catastrophic forgetting by explicitly revisiting past data during sequential training.

Rehearsal-Based Methods are continual learning algorithms that retain a subset of real data from previous tasks, or generate synthetic equivalents, and interleave these examples with new task data during training. This process, called rehearsal or experience replay, makes the model repeatedly practice old patterns. It addresses the stability-plasticity dilemma by balancing retention of prior knowledge against acquisition of new information. The stored data is typically managed in a fixed-size replay buffer.

These methods are foundational for Edge-CL, where models must learn sequentially on devices. Key design concerns include buffer management (for example, reservoir sampling for efficient memory use) and generative replay, which creates synthetic data to avoid storing raw sensitive information. While highly effective, rehearsal introduces memory overhead and can bias the model toward whichever past data is selected for retention.

Key Rehearsal Techniques

Rehearsal-based methods combat catastrophic forgetting by strategically retaining or generating data from past tasks. The sections below describe the core techniques and buffer management strategies needed to implement effective rehearsal on edge devices.

01

Experience Replay

Experience Replay is the foundational rehearsal technique where a subset of raw data from previous tasks is stored in a replay buffer and interleaved with new task data during training. This direct rehearsal of old examples provides strong constraints against forgetting.

  • Core Mechanism: The loss function becomes a weighted sum of the loss on new data and the loss on buffered old data.
  • Edge Consideration: Storing raw data can be memory-intensive, making efficient buffer management critical for edge deployment.
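
The weighted-sum loss described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `mixed_batch`, `replay_loss`, and `replay_weight`, the one-parameter toy model, and the toy data are all assumptions made for the example.

```python
import random

def mixed_batch(new_batch, replay_buffer, replay_size, rng=random):
    """Return the new-task samples plus up to `replay_size` samples drawn from the buffer."""
    k = min(replay_size, len(replay_buffer))
    return list(new_batch), rng.sample(replay_buffer, k)

def replay_loss(loss_new, loss_old, replay_weight=0.5):
    """Weighted sum of the new-task loss and the loss on replayed samples."""
    return (1.0 - replay_weight) * loss_new + replay_weight * loss_old

def mean_squared_error(weight, samples):
    """Toy loss for a one-parameter model y = weight * x (hypothetical model)."""
    if not samples:
        return 0.0
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)

# Toy data: the old task follows y = 2x, the new task follows y = 3x.
buffer = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
new_batch = [(x, 3.0 * x) for x in (1.0, 2.0)]

new_part, old_part = mixed_batch(new_batch, buffer, replay_size=2, rng=random.Random(0))
total = replay_loss(
    mean_squared_error(2.5, new_part),
    mean_squared_error(2.5, old_part),
    replay_weight=0.5,
)
```

A larger `replay_weight` favors stability (old tasks), while a smaller one favors plasticity (the new task). In a real training loop both loss terms would come from the same network and be backpropagated together.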
02

Generative Replay

Generative Replay (or Pseudo-Rehearsal) uses a separately trained generative model to produce synthetic data that mimics the distribution of past tasks. The main model rehearses on these generated samples instead of storing raw data.

  • Core Mechanism: A Generative Adversarial Network (GAN) or Variational Autoencoder (VAE) is trained on each task's data and retained to produce samples for future rehearsal.
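  • Advantage: Can substantially reduce the memory footprint, because only the generator's weights are stored rather than data. The saving depends on the generator being smaller than the buffer it replaces. This is highly relevant for edge-CL.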
  • Challenge: Requires training and maintaining a generative model, adding computational overhead.
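
To make the flow concrete, the sketch below replaces the GAN or VAE with a deliberately simple stand-in: one Gaussian per class, fitted on one-dimensional features. The class `GaussianGenerator` and the toy data are hypothetical. The point is only the sequence of steps: fit a generator on the old task, discard the raw data, then mix sampled pseudo-examples into the new task's training set.

```python
import random
import statistics

class GaussianGenerator:
    """Stand-in for a GAN or VAE: fits one Gaussian per class on 1-D features."""

    def __init__(self):
        self.params = {}  # label -> (mean, standard deviation)

    def fit(self, samples):
        by_label = {}
        for x, label in samples:
            by_label.setdefault(label, []).append(x)
        for label, xs in by_label.items():
            self.params[label] = (statistics.fmean(xs), statistics.pstdev(xs))

    def sample(self, n, rng):
        """Draw n synthetic (feature, label) pairs from the fitted classes."""
        labels = list(self.params)
        out = []
        for _ in range(n):
            label = rng.choice(labels)
            mean, std = self.params[label]
            out.append((rng.gauss(mean, std), label))
        return out

rng = random.Random(0)
task1 = [(0.9, "cat"), (1.1, "cat"), (4.8, "dog"), (5.2, "dog")]
generator = GaussianGenerator()
generator.fit(task1)  # after this, the raw task-1 data no longer needs to be kept

task2 = [(9.0, "bird"), (9.4, "bird")]
synthetic = generator.sample(4, rng)
training_set = task2 + synthetic  # the main model rehearses on synthetic task-1 samples
```

Only two numbers per class are retained here. A practical system would typically also retrain the generator on the mixed data so that it covers the new task as well.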
03

Gradient Episodic Memory (GEM)

Gradient Episodic Memory (GEM) is an optimization-centric rehearsal method. It stores past data in an episodic memory and uses it to constrain the gradient updates for new tasks so that they do not increase the loss on the stored samples of old tasks.

  • Core Mechanism: Solves a quadratic programming problem to find the gradient direction closest to the new-task gradient that satisfies inequality constraints derived from the memory data.
  • Outcome: Targets non-negative backward transfer, meaning new learning should not harm old performance. The constraints hold to first order and only on the stored memory samples, so this is not a strict guarantee on the full old-task distribution.
  • Use Case: Suitable for scenarios with strict performance guarantees on all previous tasks.
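
The full GEM update solves a quadratic program with one constraint per past task. The sketch below shows only the simpler single-constraint case, in the style of the Averaged GEM (A-GEM) variant, where the projection has a closed form. Gradients are plain Python lists, and the function name `project_gradient` is an assumption for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_gradient(grad_new, grad_memory):
    """Project grad_new so that it no longer conflicts with grad_memory.

    A negative inner product means a step along grad_new would, to first
    order, increase the loss on the memory samples. In that case the
    conflicting component is removed; otherwise grad_new is returned as is.
    """
    conflict = dot(grad_new, grad_memory)
    if conflict >= 0:
        return list(grad_new)
    scale = conflict / dot(grad_memory, grad_memory)
    return [g - scale * m for g, m in zip(grad_new, grad_memory)]

# The new-task gradient points against the memory gradient on the second axis.
projected = project_gradient([1.0, -1.0], [0.0, 1.0])
unchanged = project_gradient([1.0, 1.0], [0.0, 1.0])
```

After projection the inner product with the memory gradient is zero, so the update is neutral for the memory loss to first order. With several past tasks and one constraint each, this closed form no longer applies and a QP solver is needed.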
04

iCaRL

iCaRL (Incremental Classifier and Representation Learning) is a seminal algorithm for class-incremental learning. It combines rehearsal with a nearest-mean-of-exemplars classification rule and a distillation loss.

  • Core Components:
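    • Bounded Replay Buffer: Keeps a fixed total budget of exemplars, divided equally among the classes seen so far, so the number per class shrinks as classes are added. Exemplars are chosen with a herding selection algorithm.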
    • Distillation Loss: Preserves knowledge in the network's representations.
    • Nearest-Mean Classification: At inference, a sample is classified based on the smallest distance to the mean feature vector of each class's exemplars.
  • Significance: Established a strong benchmark for learning new classes over time without task identity.
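
Two of the components above, herding selection and nearest-mean classification, can be sketched on plain feature vectors. This is a simplified illustration: in iCaRL the features come from the network and are normalized, whereas here they are hand-written lists, and the function names are assumptions.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def herding_select(features, m):
    """Greedily pick m feature vectors whose running mean tracks the class mean."""
    target = mean_vector(features)
    chosen, remaining = [], list(range(len(features)))
    running_sum = [0.0] * len(target)
    for k in range(1, min(m, len(features)) + 1):
        def gap(i):
            # Distance between the class mean and the mean if candidate i were added.
            candidate_mean = [(s + f) / k for s, f in zip(running_sum, features[i])]
            return sq_dist(target, candidate_mean)
        best = min(remaining, key=gap)
        chosen.append(best)
        remaining.remove(best)
        running_sum = [s + f for s, f in zip(running_sum, features[best])]
    return [features[i] for i in chosen]

def nearest_mean_classify(feature, exemplars_by_class):
    """Return the class whose exemplar mean is closest to `feature`."""
    return min(
        exemplars_by_class,
        key=lambda label: sq_dist(feature, mean_vector(exemplars_by_class[label])),
    )

class_a = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
exemplars = {
    "a": herding_select(class_a, 2),
    "b": [[5.0, 5.0], [6.0, 5.0]],
}
predicted = nearest_mean_classify([0.5, 0.5], exemplars)
```

Because exemplars are picked in priority order, shrinking a class's share of the buffer later only requires dropping entries from the end of its list.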
05

Buffer Management Strategies

Buffer Management is the algorithmic strategy for selecting and maintaining which data points are stored in a fixed-size replay buffer, directly impacting rehearsal quality and efficiency.

  • Reservoir Sampling: A probabilistic algorithm that maintains a uniformly random sample from a data stream of unknown or unbounded length. After n samples have been seen, each one has the same probability, k/n for a buffer of size k, of being in the buffer.
  • Core-Set Selection: Aims to select a subset of data that best represents the entire dataset, often by solving a k-center or diversity-maximization problem. This maximizes coverage of the data distribution with minimal samples.
  • Ring Buffer: A simple FIFO (First-In-First-Out) buffer that overwrites the oldest entry when full. It's computationally cheap but may not retain the most informative samples.
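
Reservoir sampling is short enough to show in full. The sketch below follows the classic single-pass scheme (often called Algorithm R). The class name `ReservoirBuffer` and its attributes are illustrative choices, not part of any particular library.

```python
import random

class ReservoirBuffer:
    """Fixed-size buffer holding a uniform random sample of everything seen so far."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of samples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)  # buffer not yet full: always keep
            return
        j = self.rng.randrange(self.seen)  # uniform integer in [0, seen)
        if j < self.capacity:
            self.items[j] = item  # replace a random slot with probability capacity / seen

stream_buffer = ReservoirBuffer(capacity=5, seed=0)
for sample_id in range(1000):
    stream_buffer.add(sample_id)
```

Each update costs constant time and memory regardless of stream length, which is why this strategy is attractive on edge devices. A ring buffer would differ only in always overwriting the oldest slot.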
06

Learning without Forgetting (LwF)

Learning without Forgetting (LwF) is a rehearsal-adjacent method that uses knowledge distillation instead of storing raw data. It encourages the model's new parameters to produce similar outputs (soft labels) for new data as the old parameters did.

  • Core Mechanism: A distillation loss term is added to the new task loss. It measures the divergence (commonly a temperature-softened cross-entropy or KL divergence) between the old model's and the current model's output distributions on the new task data.
  • Advantage: Zero memory overhead for past data, as only the previous model's parameters are needed.
  • Limitation: Performance is highly dependent on the similarity between new and old tasks; it can struggle with disjoint tasks. Often used in combination with a small replay buffer for robustness.
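
A minimal sketch of the distillation term follows, assuming the KL-divergence form with temperature-softened softmax outputs. The logits are hand-written lists standing in for model outputs, and `distill_weight` and `temperature` are illustrative hyperparameter names.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(old || new) between temperature-softened output distributions."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0)

def lwf_loss(task_loss, old_logits, new_logits, distill_weight=1.0, temperature=2.0):
    """New-task loss plus a penalty for drifting away from the old model's outputs."""
    return task_loss + distill_weight * distillation_loss(old_logits, new_logits, temperature)

old_outputs = [2.0, 0.5, -1.0]      # old model's logits on a new-task input
drifted_outputs = [0.5, 2.0, -1.0]  # current model's logits on the same input
penalty = distillation_loss(old_outputs, drifted_outputs)
```

The penalty is zero when the current model reproduces the old outputs exactly and grows as they diverge. No past data is involved, only a frozen copy of the previous model.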
METHOD COMPARISON

Rehearsal vs. Other Continual Learning Methods

A comparison of core continual learning strategies based on their mechanisms, resource requirements, and suitability for edge deployment.

| Feature / Mechanism | Rehearsal-Based Methods | Regularization-Based Methods | Architectural Methods |
|---|---|---|---|
| Core Mechanism | Interleaves stored/generated past data with new data | Adds penalty to loss to constrain parameter updates | Dynamically expands or isolates network parameters per task |
| Mitigates Catastrophic Forgetting By | Direct rehearsal of old task data | Slowing learning on important old parameters | Allocating dedicated, non-overlapping capacity |
| Requires Storing Raw Past Data | Yes (except generative replay) | No | No |
| Memory Overhead (Typical) | Buffer size (e.g., 1-5% of each task's data) | No stored data; per-parameter importance weights or a copy of the previous model | Grows with number of tasks (parameters) |
| Computational Overhead | Moderate (trains on mixed data) | Low (adds loss term) | High (task-specific forward/backward paths) |
| Suitability for Online/Streaming Data | Yes (with buffer management) | Yes | Limited (requires task boundary) |
| Forward Transfer Potential | Medium (shared representation via data) | High (shared, constrained representation) | Low (parameters often isolated) |
| Primary Edge Deployment Challenge | Buffer storage & data privacy | Importance estimation stability | Unbounded parameter growth on device |
| Example Algorithms | Experience Replay, GEM, iCaRL | EWC, Synaptic Intelligence, LwF | Progressive Nets, HAT, PackNet |

REHEARSAL-BASED METHODS

Challenges for Edge Deployment

While rehearsal-based methods are powerful for mitigating catastrophic forgetting, their practical application on edge devices introduces unique constraints related to memory, compute, and data privacy.

01

Memory and Storage Constraints

Edge devices have severely limited RAM and persistent storage. Maintaining a replay buffer of raw data or generated samples directly competes with the memory needed for the model itself and its runtime operations.

  • Buffer Size vs. Performance: A small buffer may not adequately represent past tasks, leading to forgetting. A large buffer may not fit in the device's memory or storage at all.
  • Data Representation: Storing compressed representations or embeddings instead of raw data is a common trade-off, but this adds complexity to the training pipeline.
  • Example: A smart camera with 512MB RAM cannot store thousands of high-resolution images for rehearsal while also running a vision model.
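
A quick back-of-the-envelope calculation shows the scale of the problem. The resolutions, sample count, and embedding size below are assumed values chosen for illustration. Only the 512MB figure comes from the example above.

```python
def buffer_bytes(num_samples, height, width, channels=3, bytes_per_value=1):
    """Size of an uncompressed buffer of fixed-shape samples, in bytes."""
    return num_samples * height * width * channels * bytes_per_value

MIB = 1024 ** 2
device_ram = 512 * MIB

raw = buffer_bytes(1000, 1080, 1920)   # 1000 uncompressed 1080p RGB frames
small = buffer_bytes(1000, 64, 64)     # same count, downsampled to 64x64
# 1000 embeddings of 512 float32 values each (4 bytes per value)
embeddings = buffer_bytes(1000, 1, 512, channels=1, bytes_per_value=4)

for name, size in [("raw", raw), ("downsampled", small), ("embeddings", embeddings)]:
    print(f"{name}: {size / MIB:.1f} MiB, fits in RAM: {size < device_ram}")
```

Under these assumptions the raw buffer needs several gigabytes, more than ten times the device's RAM, while downsampled images or embeddings fit comfortably. That gap is why compressed representations are a common trade-off.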
02

Computational Overhead

The core rehearsal operation, interleaving old and new data during training, increases computational cost and can be prohibitive for on-device learning.

  • Training Cost: Each training step runs forward and backward passes on batches that include replayed samples, so it costs more than training on new data alone and far more than inference alone.
  • Energy Drain: Continuous computation rapidly depletes battery-powered devices.
  • Latency Impact: The device cannot perform its primary function (e.g., object detection) if compute is monopolized by rehearsal training. Techniques like sparse rehearsal or performing updates only during idle periods are necessary.
03

Buffer Management Complexity

Selecting what to store in the fixed-size replay buffer is a critical algorithmic challenge that must operate efficiently on-device.

  • Core-Set Selection: Finding a minimal subset that best represents the data distribution is computationally expensive (e.g., requires solving a k-center problem).
  • Online Strategies: Algorithms like reservoir sampling provide a uniform random sample but do not optimize for representativeness.
  • Dynamic Prioritization: Should the buffer prioritize difficult, rare, or recent examples? Implementing these heuristics adds logic and overhead.
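
Exact k-center selection is expensive, but a standard greedy approximation (farthest-first traversal) is simple to write. The sketch below assumes samples are already represented as small feature vectors. The function name `greedy_k_center` and the toy points are illustrative.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_k_center(points, k, first=0):
    """Farthest-first traversal: repeatedly add the point farthest from the chosen set.

    Returns the indices of the selected points. Assumes `points` is non-empty.
    """
    chosen = [first]
    # Distance from every point to its nearest chosen center so far.
    nearest = [sq_dist(p, points[first]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[idx] == 0.0:
            break  # every remaining point duplicates a chosen one
        chosen.append(idx)
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return chosen

# Three loose clusters: near the origin, near (5, 0), and near (0, 5).
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 0.1], [0.0, 5.0]]
selected = greedy_k_center(points, k=3)
```

Each round scans all points once, so selecting k of n points costs on the order of n times k distance evaluations. That is far cheaper than exact k-center, though still much heavier than the constant-time update of reservoir sampling.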
04

Data Privacy and Security

Rehearsal inherently involves retaining user data. On edge devices, this creates significant privacy risks and compliance challenges.

  • Local Data Persistence: Sensitive data (e.g., personal photos, location traces) stored in the buffer is vulnerable if the device is compromised.
  • Regulatory Compliance: Regulations like GDPR may treat the replay buffer as a data store, requiring mechanisms for right-to-erasure.
  • Mitigations: Using generative replay (synthetic data) or federated continual learning where only model updates are shared can reduce privacy exposure, but they introduce other technical hurdles.
05

Catastrophic Forgetting Under Resource Limits

The very problem rehearsal aims to solve—catastrophic forgetting—can be exacerbated by the stringent limits of edge deployment.

  • Insufficient Rehearsal: Due to memory limits, the model may only rehearse on a tiny, non-representative subset of past data, leading to partial forgetting.
  • Interference from Non-IID Data: Edge data streams are often far from independent and identically distributed (non-IID). Rapidly shifting data distributions can overwhelm a small buffer's ability to stabilize learning.
  • Example: A wearable health monitor learning new user patterns might forget older, seasonal patterns because the buffer cannot hold enough long-term history.
06

Integration with System Workflows

Deploying a rehearsal-based continual learning system requires co-design with the device's operating system, scheduler, and other applications.

  • Resource Arbitration: The system must manage conflicts when the model needs compute for rehearsal versus other processes.
  • Checkpointing and Recovery: Model and buffer state must be reliably saved to persistent storage to survive reboots or crashes, adding I/O complexity.
  • Lifecycle Management: Over-the-air updates must handle versioning for both the model architecture and the structure/content of the replay buffer.
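
Checkpointing and versioning of the buffer can be sketched with the standard library alone. This sketch assumes the buffer contents are JSON-serializable. The file layout, the `BUFFER_FORMAT_VERSION` tag, and the function names are hypothetical. The write goes to a temporary file that is then renamed into place, so a crash mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile

BUFFER_FORMAT_VERSION = 1  # hypothetical version tag, bumped when the layout changes

def save_buffer(path, items, seen):
    """Write the buffer state atomically: temp file first, then rename over `path`."""
    state = {"version": BUFFER_FORMAT_VERSION, "seen": seen, "items": items}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to storage before renaming
        os.replace(tmp_path, path)     # atomic when both paths are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_buffer(path):
    """Load the buffer state, rejecting checkpoints written in another format version."""
    with open(path) as handle:
        state = json.load(handle)
    if state.get("version") != BUFFER_FORMAT_VERSION:
        raise ValueError(f"unsupported buffer format: {state.get('version')}")
    return state["items"], state["seen"]
```

A call such as `save_buffer("replay.json", items, seen)` after each training session, paired with `load_buffer` at startup, covers the reboot case. An over-the-air update that changes the buffer layout would bump the version and add a migration step.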

Frequently Asked Questions

Rehearsal-based methods are a core family of techniques in continual learning designed to prevent catastrophic forgetting. They work by retaining or generating data from past tasks and 'rehearsing' it during training on new tasks. This section answers common technical questions about their implementation, trade-offs, and application on edge devices.

What is a rehearsal-based method?

A rehearsal-based method is a continual learning technique that mitigates catastrophic forgetting by retaining a subset of data from previous tasks (or generating synthetic equivalents) and interleaving these 'rehearsal' samples with new task data during training. This process forces the model to jointly optimize for both new knowledge and the retention of old knowledge. The retained data is typically stored in a fixed-size replay buffer. By rehearsing past experiences, the model maintains a more stable representation, balancing the stability-plasticity dilemma inherent to sequential learning.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.