Rehearsal-Based Methods are continual learning algorithms that retain a subset of real data from previous tasks (or generate synthetic equivalents) and interleave these stored examples with new task data during training. This rehearsal (or experience replay) process forces the model to repeatedly practice old patterns, directly addressing the stability-plasticity dilemma by balancing the retention of prior knowledge with the acquisition of new information. The stored data is typically managed in a fixed-size replay buffer.
Rehearsal-Based Methods

What Are Rehearsal-Based Methods?
A core family of techniques in continual learning designed to mitigate catastrophic forgetting by explicitly revisiting past data during sequential training.
These methods are foundational for edge continual learning (Edge-CL), where models must learn sequentially on-device. Key research directions include buffer management strategies such as reservoir sampling for efficient memory use, and generative replay, which creates synthetic data so that raw sensitive information need not be stored. While highly effective, rehearsal adds memory overhead and can introduce bias depending on which past data is selected for retention.
Key Rehearsal Techniques
Rehearsal-based methods combat catastrophic forgetting by strategically retaining or generating data from past tasks. The entries below describe the core techniques and buffer management strategies essential for implementing effective rehearsal on edge devices.
Experience Replay
Experience Replay is the foundational rehearsal technique where a subset of raw data from previous tasks is stored in a replay buffer and interleaved with new task data during training. This direct rehearsal of old examples provides strong constraints against forgetting.
- Core Mechanism: The loss function becomes a weighted sum of the loss on new data and the loss on buffered old data.
- Edge Consideration: Storing raw data can be memory-intensive, making efficient buffer management critical for edge deployment.
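A minimal sketch of the weighted objective above, assuming per-sample losses are plain Python floats. The function name `er_loss` and its parameters `lam` (replay weight) and `replay_size` are hypothetical; in a deep-learning framework the same combination would be applied to tensor losses before backpropagation.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def er_loss(
    loss_fn: Callable[[T], float],
    new_batch: Sequence[T],
    buffer: Sequence[T],
    replay_size: int,
    lam: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical experience-replay objective: (1 - lam) * L_new + lam * L_replay."""
    rng = rng or random.Random()
    new_loss = sum(loss_fn(x) for x in new_batch) / len(new_batch)
    if not buffer or replay_size <= 0:
        # Nothing to rehearse yet (e.g., the first task): train on new data only.
        return new_loss
    # Draw a replay mini-batch from the buffer without replacement.
    replay = rng.sample(list(buffer), min(replay_size, len(buffer)))
    replay_loss = sum(loss_fn(x) for x in replay) / len(replay)
    return (1.0 - lam) * new_loss + lam * replay_loss
```

Setting `lam` to the fraction of replayed samples in the combined batch, r / (n + r), reproduces the plain "concatenate the two batches and average" variant of experience replay.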
Generative Replay
Generative Replay (or Pseudo-Rehearsal) uses a separately trained generative model to produce synthetic data that mimics the distribution of past tasks. The main model rehearses on these generated samples instead of storing raw data.
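- Core Mechanism: A Generative Adversarial Network (GAN) or Variational Autoencoder (VAE) is trained on each task's data, typically mixed with its own replayed samples so that a single generator covers all tasks seen so far, and is retained to produce samples for future rehearsal.
- Advantage: Avoids storing raw data; only the generator's weights are kept. This can reduce memory footprint (when the generator is smaller than the data it replaces) as well as privacy exposure, which is highly relevant for edge CL.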
- Challenge: Requires training and maintaining a generative model, adding computational overhead.
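The pseudo-rehearsal loop can be sketched as follows, assuming a generator-plus-solver setup: the frozen generator from the previous task produces synthetic inputs, and the frozen previous model (`old_solver`) labels them. All names are hypothetical, and the toy generator in the test stands in for a trained GAN or VAE.

```python
import random
from typing import Any, Callable, List, Optional, Tuple

Sample = Tuple[Any, Any]  # (input, label)


def generative_replay_batch(
    new_batch: List[Sample],
    old_generator: Optional[Callable[[random.Random], Any]],
    old_solver: Optional[Callable[[Any], Any]],
    n_replay: int,
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Mix real new-task samples with pseudo-samples from the previous generator.

    Pseudo-labels come from the frozen previous model (old_solver), so no
    raw data from earlier tasks has to be stored.
    """
    rng = rng or random.Random()
    batch = list(new_batch)
    if old_generator is None or old_solver is None:
        return batch  # first task: no earlier generator exists yet
    for _ in range(n_replay):
        x_fake = old_generator(rng)                  # synthetic input mimicking past data
        batch.append((x_fake, old_solver(x_fake)))   # labeled by the old model
    rng.shuffle(batch)
    return batch
```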
Gradient Episodic Memory (GEM)
Gradient Episodic Memory (GEM) is an optimization-centric rehearsal method. It stores past data in an episodic memory and uses it to constrain the gradient updates for new tasks, ensuring they do not increase the loss on old tasks.
- Core Mechanism: Solves a quadratic programming problem to find a gradient direction for the new task that satisfies inequality constraints derived from the memory data.
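- Outcome: Ensures, to first order (i.e., for sufficiently small steps), that an update does not increase the loss on the stored memory examples, so backward transfer is non-negative under that approximation; positive backward transfer is also possible.
- Use Case: Suitable for scenarios that demand strong protection of performance on all previous tasks, at the cost of solving a quadratic program whose size grows with the number of previous tasks.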
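With a single constraint (one previous task, or the averaged memory gradient used by the A-GEM variant), GEM's quadratic program has a closed-form solution: if the new gradient `g` conflicts with the memory gradient `g_ref`, project it onto the half-space where their dot product is non-negative. The sketch below assumes flattened gradients as plain lists; the function names are hypothetical, and the general multi-task case still needs a QP solver.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_gradient(g: Vector, g_ref: Vector) -> Vector:
    """Closest vector to g (in L2) satisfying <g_proj, g_ref> >= 0.

    This is GEM's QP with a single inequality constraint.
    """
    violation = dot(g, g_ref)
    if violation >= 0:
        # The update already does not increase memory loss (to first order).
        return list(g)
    ref_norm_sq = dot(g_ref, g_ref)
    if ref_norm_sq == 0:
        return list(g)
    # Remove the component of g that points against g_ref.
    scale = violation / ref_norm_sq
    return [gi - scale * ri for gi, ri in zip(g, g_ref)]
```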
iCaRL
iCaRL (Incremental Classifier and Representation Learning) is a seminal algorithm for class-incremental learning. It combines rehearsal with a nearest-mean-of-exemplars classification rule and a distillation loss.
- Core Components:
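  - Bounded Replay Buffer: Stores a fixed total budget of exemplars split evenly across classes (so each class's share shrinks as new classes arrive), selected with a herding algorithm.
  - Distillation Loss: Keeps the network's outputs for previously learned classes close to those of the model before the update, preserving knowledge in its representations.
  - Nearest-Mean Classification: At inference, a sample is assigned to the class whose exemplar mean feature vector is closest to the sample's feature vector.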
- Significance: Established a strong benchmark for learning new classes over time without task identity.
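A sketch of iCaRL's two exemplar-related steps, assuming feature vectors have already been extracted as NumPy arrays: herding selection (greedily adding the sample that keeps the exemplars' running mean closest to the class mean) and nearest-mean-of-exemplars classification on L2-normalized features. Function names are hypothetical, and excluding already-chosen samples is an implementation choice.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def herding_select(features: np.ndarray, m: int) -> list:
    """Pick up to m exemplar indices whose running mean tracks the class mean."""
    feats = l2_normalize(features)
    mu = feats.mean(axis=0)
    selected = []
    running_sum = np.zeros_like(mu)
    for k in range(1, min(m, len(feats)) + 1):
        # Distance to the class mean if each candidate were added as the k-th pick.
        dists = np.linalg.norm(mu - (running_sum + feats) / k, axis=1)
        if selected:
            dists[selected] = np.inf  # never pick the same sample twice
        idx = int(np.argmin(dists))
        selected.append(idx)
        running_sum += feats[idx]
    return selected


def exemplar_mean(features: np.ndarray, idxs: list) -> np.ndarray:
    """Class prototype: normalized mean of the normalized exemplar features."""
    return l2_normalize(l2_normalize(features[idxs]).mean(axis=0))


def nme_classify(x_feat: np.ndarray, class_means: dict) -> int:
    """Nearest-mean-of-exemplars: return the class whose prototype is closest."""
    x = l2_normalize(x_feat)
    return min(class_means, key=lambda c: float(np.linalg.norm(x - class_means[c])))
```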
Buffer Management Strategies
Buffer Management is the algorithmic strategy for selecting and maintaining which data points are stored in a fixed-size replay buffer, directly impacting rehearsal quality and efficiency.
- Reservoir Sampling: A probabilistic algorithm that maintains a uniformly random sample of fixed size k from a stream of unknown or unbounded length. After n items have arrived, each of them is in the buffer with the same probability k/n.
- Core-Set Selection: Aims to select a subset of data that best represents the entire dataset, often by solving a k-center or diversity-maximization problem. This maximizes coverage of the data distribution with minimal samples.
- Ring Buffer: A simple FIFO (First-In-First-Out) buffer that overwrites the oldest entry when full. It's computationally cheap but may not retain the most informative samples.
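Reservoir sampling (Vitter's Algorithm R) fits in a few lines. This is a sketch with a hypothetical `ReservoirBuffer` class; the `seed` parameter exists only to make runs reproducible.

```python
import random
from typing import Any, List, Optional


class ReservoirBuffer:
    """Fixed-size replay buffer filled with reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[Any] = []
        self.n_seen = 0
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)  # buffer not full yet: always keep
            return
        # Keep the n-th item with probability capacity / n, evicting a random slot.
        j = self._rng.randrange(self.n_seen)
        if j < self.capacity:
            self.items[j] = item

    def sample(self, k: int) -> List[Any]:
        """Draw a replay mini-batch without replacement."""
        return self._rng.sample(self.items, min(k, len(self.items)))
```

A ring buffer, by contrast, is simply `collections.deque(maxlen=capacity)`: appending to a full deque drops the oldest entry.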
Learning without Forgetting (LwF)
Learning without Forgetting (LwF) is a rehearsal-adjacent method that uses knowledge distillation instead of storing raw data. It encourages the model's new parameters to produce similar outputs (soft labels) for new data as the old parameters did.
- Core Mechanism: A distillation loss term is added to the new task loss, measuring the KL divergence between the current and old model's output distributions for the new task data.
- Advantage: Zero memory overhead for past data, as only the previous model's parameters are needed.
- Limitation: Performance is highly dependent on the similarity between new and old tasks; it can struggle with disjoint tasks. Often used in combination with a small replay buffer for robustness.
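A NumPy sketch of the LwF distillation term, assuming `old_logits` are the frozen previous model's outputs on the current (new-task) inputs. It computes KL(p_old || p_new) on temperature-softened distributions. The temperature `T = 2` and the `T**2` scaling (a convention from knowledge distillation) are assumptions, as are the function names; the original LwF formulation uses a cross-entropy on softened outputs, which differs from this KL only by a term that does not depend on the new model.

```python
import numpy as np


def log_softmax(logits, T: float = 1.0) -> np.ndarray:
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def lwf_distillation_loss(new_logits, old_logits, T: float = 2.0) -> float:
    """Mean KL(p_old || p_new) over the batch, scaled by T**2."""
    log_p_old = log_softmax(old_logits, T)
    log_p_new = log_softmax(new_logits, T)
    kl = (np.exp(log_p_old) * (log_p_old - log_p_new)).sum(axis=-1)
    return float((T ** 2) * kl.mean())


def lwf_total_loss(task_loss: float, new_logits, old_logits,
                   lam: float = 1.0, T: float = 2.0) -> float:
    """New-task loss plus the weighted distillation penalty."""
    return task_loss + lam * lwf_distillation_loss(new_logits, old_logits, T)
```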
Rehearsal vs. Other Continual Learning Methods
A comparison of core continual learning strategies based on their mechanisms, resource requirements, and suitability for edge deployment.
| Feature / Mechanism | Rehearsal-Based Methods | Regularization-Based Methods | Architectural Methods |
|---|---|---|---|
| Core Mechanism | Interleaves stored/generated past data with new data | Adds penalty to loss to constrain parameter updates | Dynamically expands or isolates network parameters per task |
| Mitigates Catastrophic Forgetting By | Direct rehearsal of old task data | Slowing learning on important old parameters | Allocating dedicated, non-overlapping capacity |
| Requires Storing Raw Past Data | Yes (except generative replay, which stores a generator instead) | No | No |
| Memory Overhead (Typical) | Buffer size (e.g., 1-5% per task) | No data stored (keeps importance weights and reference parameters) | Grows with number of tasks (parameters) |
| Computational Overhead | Moderate (trains on mixed data) | Low (adds loss term) | High (task-specific forward/backward paths) |
| Suitability for Online/Streaming Data | Yes (with buffer management) | Yes | Limited (requires task boundaries) |
| Forward Transfer Potential | Medium (shared representation via data) | High (shared, constrained representation) | Low to medium (isolated parameters limit sharing; Progressive Nets reuse earlier columns via lateral connections) |
| Primary Edge Deployment Challenge | Buffer storage & data privacy | Importance estimation stability | Unbounded parameter growth on device |
| Example Algorithms | Experience Replay, GEM, iCaRL | EWC, Synaptic Intelligence, LwF | Progressive Nets, HAT, PackNet |
Challenges for Edge Deployment
While rehearsal-based methods are powerful for mitigating catastrophic forgetting, their practical application on edge devices introduces unique constraints related to memory, compute, and data privacy.
Memory and Storage Constraints
Edge devices have severely limited RAM and persistent storage. Maintaining a replay buffer of raw data or generated samples directly competes with the memory needed for the model itself and its runtime operations.
- Buffer Size vs. Performance: A small buffer may not adequately represent past tasks, leading to forgetting, while a large buffer may not fit within the device's memory budget.
- Data Representation: Storing compressed representations or embeddings instead of raw data is a common trade-off, but this adds complexity to the training pipeline.
- Example: A smart camera with 512MB RAM cannot store thousands of high-resolution images for rehearsal while also running a vision model.
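The arithmetic behind this constraint is simple to sketch. The sizes below (1,000 samples, 224x224 RGB crops at 8 bits per channel, 512-dimensional float16 embeddings) are illustrative assumptions, not figures from the example above, and `buffer_bytes` is a hypothetical helper.

```python
from math import prod


def buffer_bytes(n_samples: int, shape: tuple, bytes_per_value: int) -> int:
    """Raw storage needed for a replay buffer of n_samples dense arrays."""
    return n_samples * prod(shape) * bytes_per_value


MIB = 1024 ** 2
raw = buffer_bytes(1000, (224, 224, 3), 1)  # 8-bit RGB crops
emb = buffer_bytes(1000, (512,), 2)         # 512-d float16 embeddings
print(f"raw images: {raw / MIB:.1f} MiB, embeddings: {emb / MIB:.2f} MiB")
# prints "raw images: 143.6 MiB, embeddings: 0.98 MiB"
```

Even at this reduced resolution, the raw buffer would take more than a quarter of a 512 MB device's RAM, while the embeddings need roughly 147 times less space.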
Computational Overhead
The core rehearsal operation, interleaving old and new data during training, significantly increases computational cost, which can be prohibitive for on-device learning.
- Training Cost: Each training step runs forward and backward passes over mixed batches. A backward pass typically costs about twice a forward pass, and replayed samples enlarge every batch, so a step can cost several times the compute of running inference on the same new data.
- Energy Drain: Continuous computation rapidly depletes battery-powered devices.
- Latency Impact: The device cannot perform its primary function (e.g., object detection) if compute is monopolized by rehearsal training. Techniques like sparse rehearsal or performing updates only during idle periods are necessary.
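A back-of-the-envelope cost model for the training-cost point above, assuming a backward pass costs about twice a forward pass (a common rule of thumb) and that replayed samples cost the same as new ones. Function names and batch sizes are hypothetical.

```python
def training_step_flops(fwd_flops_per_sample: float, new_bs: int, replay_bs: int,
                        backward_multiplier: float = 2.0) -> float:
    """Approximate FLOPs for one rehearsal training step (forward + backward)."""
    return (1.0 + backward_multiplier) * fwd_flops_per_sample * (new_bs + replay_bs)


def inference_flops(fwd_flops_per_sample: float, batch: int) -> float:
    """Approximate FLOPs for a forward-only pass over the same new samples."""
    return fwd_flops_per_sample * batch
```

With 16 new and 16 replayed samples per step, `training_step_flops(f, 16, 16) / inference_flops(f, 16)` is 6.0 for any per-sample cost `f`: the step costs six times as much as running inference on the new samples alone.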
Buffer Management Complexity
Selecting what to store in the fixed-size replay buffer is a critical algorithmic challenge that must operate efficiently on-device.
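- Core-Set Selection: Finding a minimal subset that best represents the data distribution is computationally expensive: the k-center formulation is NP-hard, and even the standard greedy 2-approximation needs O(nk) distance computations for n candidates and k selected points.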
- Online Strategies: Algorithms like reservoir sampling provide a uniform random sample but do not optimize for representativeness.
- Dynamic Prioritization: Should the buffer prioritize difficult, rare, or recent examples? Implementing these heuristics adds logic and overhead.
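A sketch of the greedy 2-approximation for k-center (Gonzalez's farthest-first traversal), one common way to pick a core-set: start from an arbitrary point and repeatedly add the point farthest from all chosen centers. It assumes candidates are rows of a NumPy array (e.g., embeddings); `greedy_k_center` is a hypothetical name.

```python
import numpy as np


def greedy_k_center(X: np.ndarray, k: int, first: int = 0) -> list:
    """Greedy 2-approximation to k-center: repeatedly add the farthest point."""
    n = len(X)
    if k <= 0 or n == 0:
        return []
    centers = [first]
    # Distance from every point to its nearest chosen center.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    while len(centers) < min(k, n):
        idx = int(np.argmax(min_dist))
        if min_dist[idx] == 0:
            break  # every remaining point duplicates a chosen center
        centers.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[idx], axis=1))
    return centers
```

Each iteration costs O(n) distance computations, so selecting k points costs O(nk) in total; on-device this is usually run over embeddings rather than raw inputs.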
Data Privacy and Security
Rehearsal inherently involves retaining user data. On edge devices, this creates significant privacy risks and compliance challenges.
- Local Data Persistence: Sensitive data (e.g., personal photos, location traces) stored in the buffer is vulnerable if the device is compromised.
- Regulatory Compliance: If the replay buffer holds personal data, regulations like GDPR may apply to it, requiring mechanisms to honor the right to erasure.
- Mitigations: Using generative replay (synthetic data) or federated continual learning where only model updates are shared can reduce privacy exposure, but they introduce other technical hurdles.
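One way to make a replay buffer erasure-aware is to tag every stored sample with its owner. The hypothetical `ErasableBuffer` below sketches that idea (capacity limits are omitted for brevity). Removing samples from the buffer does not undo their earlier influence on the model's weights, which is a separate problem.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ErasableBuffer:
    """Replay buffer that tags each sample with its owner so it can be erased."""

    def __init__(self) -> None:
        self._by_owner: Dict[str, List[Any]] = defaultdict(list)

    def add(self, owner_id: str, sample: Any) -> None:
        self._by_owner[owner_id].append(sample)

    def erase(self, owner_id: str) -> int:
        """Delete all of one owner's samples; return how many were removed."""
        return len(self._by_owner.pop(owner_id, []))

    def samples(self) -> List[Any]:
        """All remaining samples, e.g. for drawing a replay batch."""
        return [s for group in self._by_owner.values() for s in group]
```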
Catastrophic Forgetting Under Resource Limits
The very problem rehearsal aims to solve—catastrophic forgetting—can be exacerbated by the stringent limits of edge deployment.
- Insufficient Rehearsal: Due to memory limits, the model may only rehearse on a tiny, non-representative subset of past data, leading to partial forgetting.
- Interference from Non-IID Data: Edge data streams are typically non-IID (not independent and identically distributed). Rapidly shifting distributions can overwhelm a small buffer's ability to stabilize learning.
- Example: A wearable health monitor learning new user patterns might forget older, seasonal patterns because the buffer cannot hold enough long-term history.
Integration with System Workflows
Deploying a rehearsal-based continual learning system requires co-design with the device's operating system, scheduler, and other applications.
- Resource Arbitration: The system must manage conflicts when the model needs compute for rehearsal versus other processes.
- Checkpointing and Recovery: Model and buffer state must be reliably saved to persistent storage to survive reboots or crashes, adding I/O complexity.
- Lifecycle Management: Over-the-air updates must handle versioning for both the model architecture and the structure/content of the replay buffer.
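A sketch of atomic checkpointing for the model and buffer together, assuming both states are picklable Python objects. `save_checkpoint`, `load_checkpoint`, and the integer `version` field (standing in for a schema version that an over-the-air update would check) are hypothetical. Writing to a temporary file in the same directory and then calling `os.replace` means a crash mid-write leaves the previous checkpoint intact.

```python
import os
import pickle
import tempfile
from typing import Any, Tuple


def save_checkpoint(path: str, model_state: Any, buffer_state: Any, version: int) -> None:
    """Write model and buffer state together, atomically replacing `path`."""
    payload = {"version": version, "model": model_state, "buffer": buffer_state}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes reach storage before the rename
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str, expected_version: int) -> Tuple[Any, Any]:
    """Load (model_state, buffer_state), rejecting an incompatible schema version."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    if payload["version"] != expected_version:
        raise ValueError(f"checkpoint version {payload['version']} != {expected_version}")
    return payload["model"], payload["buffer"]
```

Only load checkpoints the device itself wrote: `pickle` can execute arbitrary code when loading untrusted data.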
Frequently Asked Questions
Rehearsal-based methods are a core family of techniques in continual learning designed to prevent catastrophic forgetting. They work by retaining or generating data from past tasks and 'rehearsing' it during training on new tasks. This section answers common technical questions about their implementation, trade-offs, and application on edge devices.
What is a rehearsal-based method in continual learning?
A rehearsal-based method is a continual learning technique that mitigates catastrophic forgetting by retaining a subset of data from previous tasks (or generating synthetic equivalents) and interleaving these 'rehearsal' samples with new task data during training. This process forces the model to jointly optimize for both new knowledge and the retention of old knowledge. The retained data is typically stored in a fixed-size replay buffer. By rehearsing past experiences, the model maintains a more stable representation, balancing the stability-plasticity dilemma inherent to sequential learning.
Related Terms
Rehearsal-based methods are a core strategy for mitigating catastrophic forgetting in continual learning. The following terms define the specific techniques, scenarios, and metrics that contextualize this approach.
Experience Replay
A foundational rehearsal technique where a subset of past training data or their learned representations are stored in a replay buffer and interleaved with new task data during training. This forces the model to rehearse old tasks concurrently with learning new ones. It is the most direct implementation of rehearsal, often using strategies like reservoir sampling for buffer management.
Generative Replay
A rehearsal variant that uses a generative model (e.g., a Generative Adversarial Network or Variational Autoencoder) trained on past data to produce synthetic samples (pseudo-rehearsal). These generated examples mimic old experiences, allowing the main model to rehearse without storing raw data. This is critical for privacy-preserving or memory-constrained edge scenarios where storing real data is infeasible.
Replay Buffer
The fixed or dynamic memory storage component used in experience replay. Effective continual learning depends on buffer management strategies to select which past examples to retain. Common approaches include:
- Reservoir Sampling: Maintains a uniform random sample from a stream.
- Core-Set Selection: Chooses a representative subset that approximates the full data distribution.
- Herding: Iteratively selects exemplars whose running mean best approximates the class mean (as used by iCaRL).
Catastrophic Forgetting
The core problem rehearsal methods aim to solve. It is the phenomenon where a neural network abruptly and drastically loses performance on previously learned tasks when trained on new data. This occurs due to unconstrained parameter overwriting. Rehearsal directly combats this by providing interleaved data that regularizes the loss landscape for old tasks.
Stability-Plasticity Dilemma
The fundamental trade-off in all continual learning. Stability refers to a model's ability to retain old knowledge (resisting forgetting). Plasticity is its capacity to learn new information efficiently. Rehearsal-based methods explicitly manage this trade-off: the replay of old data promotes stability, while training on new data maintains plasticity. The buffer size is a direct knob for tuning this balance.
Online Continual Learning
A strict, realistic variant of continual learning where the model sees a stream of data only once, often one sample or a small batch at a time. Because the stream cannot be revisited, the replay buffer is the model's only access to past data, which places severe constraints on rehearsal. Efficient buffer management and sample-efficient rehearsal strategies are paramount in this challenging scenario, especially on edge devices.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.