Inferensys

Glossary

Multimodal Data Augmentation

Terms for techniques that generate synthetic or enhanced training data while preserving cross-modal relationships. Intended audience: ML researchers and data scientists.

Multimodal Data Augmentation (MMDA)

Multimodal Data Augmentation (MMDA) is a set of techniques for artificially expanding a training dataset by applying transformations that preserve the semantic and structural relationships between different data modalities, such as text, image, audio, and video.

Cross-Modal Data Augmentation (CMDA)

Cross-Modal Data Augmentation (CMDA) is a subset of multimodal augmentation focused on generating synthetic data for one modality (e.g., an image) by using information or transformations derived from a paired, different modality (e.g., its text caption).

Synchronized Augmentation

Synchronized Augmentation is a technique where identical or semantically consistent transformations are applied to all modalities within a paired data sample to maintain their cross-modal alignment, such as trimming a video clip and its audio track to the same time window, or flipping an image horizontally and swapping "left" and "right" in its caption.
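
As an illustration, the sketch below cuts one randomly chosen time window out of both a video clip and its audio track, so the two stay aligned. The function name `synchronized_temporal_crop`, the toy array shapes, and the frame and sample rates are assumptions made for this example.

```python
import numpy as np

def synchronized_temporal_crop(frames, audio, fps, sample_rate, start_s, duration_s):
    """Cut the same time window out of a video clip and its audio track."""
    f0 = int(round(start_s * fps))
    n_frames = int(round(duration_s * fps))
    a0 = int(round(start_s * sample_rate))
    n_samples = int(round(duration_s * sample_rate))
    return frames[f0:f0 + n_frames], audio[a0:a0 + n_samples]

# Toy clip: 4 s of 8x8 grayscale video at 10 fps plus 4 s of audio at 1 kHz.
frames = np.zeros((40, 8, 8), dtype=np.float32)
audio = np.zeros(4000, dtype=np.float32)

rng = np.random.default_rng(0)
start = float(rng.uniform(0.0, 2.0))  # one random draw, shared by both modalities
clip_frames, clip_audio = synchronized_temporal_crop(
    frames, audio, fps=10, sample_rate=1000, start_s=start, duration_s=2.0
)
```

The key design point is that the random parameter is sampled once and reused; sampling it separately per modality would break the alignment.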

Modality Dropout

Modality Dropout is a regularization technique where one or more input modalities are randomly masked or omitted during training to force a model to learn robust, cross-modal representations that do not over-rely on any single data type.
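
A minimal sketch of the idea, assuming a batch stored as a dictionary of NumPy arrays keyed by modality name. The helper `modality_dropout` and the rule of always keeping at least one modality per sample are illustrative choices, not a fixed standard.

```python
import numpy as np

def modality_dropout(batch, p_drop, rng):
    """Zero whole modalities per sample with probability p_drop, keeping at least one."""
    names = list(batch)
    n = len(batch[names[0]])
    keep = rng.random((n, len(names))) >= p_drop
    # Samples that lost every modality get one restored at random.
    rows = np.flatnonzero(~keep.any(axis=1))
    keep[rows, rng.integers(len(names), size=rows.size)] = True
    out = {}
    for j, name in enumerate(names):
        # Reshape the per-sample mask so it broadcasts over the feature dimensions.
        mask = keep[:, j].reshape((n,) + (1,) * (batch[name].ndim - 1))
        out[name] = batch[name] * mask
    return out, keep

rng = np.random.default_rng(0)
batch = {"image": np.ones((16, 3, 4, 4)), "text": np.ones((16, 5))}
dropped, keep = modality_dropout(batch, p_drop=0.5, rng=rng)
```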

Cross-Modal Mixup

Cross-Modal Mixup is a data augmentation method that creates new training samples by performing convex interpolations between the feature representations or raw data of two different multimodal examples, blending their modalities in a coordinated manner.
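
The coordinated blending can be sketched as follows, assuming each sample is a dictionary of already-extracted feature vectors plus a one-hot label. The function name, the field names, and the Beta parameter of 0.4 are hypothetical choices for this example.

```python
import numpy as np

def cross_modal_mixup(sample_a, sample_b, lam):
    """Blend every field of two multimodal samples with one shared coefficient."""
    return {key: lam * sample_a[key] + (1.0 - lam) * sample_b[key] for key in sample_a}

rng = np.random.default_rng(0)
lam = float(rng.beta(0.4, 0.4))  # mixup-style draw; alpha = 0.4 is an arbitrary choice

a = {"image": np.zeros(4), "text": np.zeros(3), "label": np.array([1.0, 0.0])}
b = {"image": np.ones(4), "text": np.ones(3), "label": np.array([0.0, 1.0])}
mixed = cross_modal_mixup(a, b, lam)
```

Using the same coefficient for every modality and for the label is what keeps the blended sample internally consistent.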

Modality Translation

Modality Translation is the process of using generative models to convert data from one modality to another while preserving its semantic content, such as generating an image from a text description or creating a textual summary from a video.

Cycle-Consistent Augmentation

Cycle-Consistent Augmentation is a technique that uses cycle-consistent generative adversarial networks (CycleGANs) to learn mappings between data domains or modalities without paired training data. A cycle-consistency constraint requires that a sample translated to the other domain and back reconstructs the original, which enables unpaired cross-modal translation.

Adversarial Data Augmentation

Adversarial Data Augmentation is a method that uses generative adversarial networks (GANs) or adversarial training techniques to create challenging, model-specific synthetic data points designed to improve a model's robustness and generalization.

Diffusion-Based Augmentation

Diffusion-Based Augmentation is a technique that employs diffusion models to generate high-fidelity, diverse synthetic data by iteratively denoising random noise, guided by conditions such as class labels or text prompts from other modalities.

Latent Space Interpolation

Latent Space Interpolation is an augmentation strategy that generates new data samples by linearly interpolating between the encoded latent representations of two existing samples in a model's embedding space, often within a variational autoencoder (VAE) or GAN.
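
A small sketch of the interpolation step itself. The encoder and decoder are omitted, and the two-dimensional latent codes are made up for illustration.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Return num_steps latent codes evenly spaced on the segment from z_a to z_b."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

z_a = np.array([0.0, 2.0])
z_b = np.array([4.0, 0.0])
path = interpolate_latents(z_a, z_b, 5)
# A decoder (not shown) would map each row of `path` back to data space.
```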

Cross-Modal Consistency Loss

Cross-Modal Consistency Loss is a training objective that penalizes a model when its predictions or representations for a single concept diverge across different input modalities, enforcing semantic alignment during augmented or synthetic data training.
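
One common way to express such an objective is a cosine-distance penalty between row-aligned embeddings from two modalities. The sketch below assumes that formulation; other distances, such as a KL divergence between predictions, are equally plausible.

```python
import numpy as np

def cosine_consistency_loss(emb_a, emb_b, eps=1e-8):
    """Mean (1 - cosine similarity) between row-aligned embeddings of two modalities."""
    a = emb_a / (np.linalg.norm(emb_a, axis=1, keepdims=True) + eps)
    b = emb_b / (np.linalg.norm(emb_b, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

image_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
text_emb_aligned = image_emb.copy()
text_emb_orthogonal = np.array([[0.0, 1.0], [1.0, 0.0]])

loss_aligned = cosine_consistency_loss(image_emb, text_emb_aligned)
loss_orthogonal = cosine_consistency_loss(image_emb, text_emb_orthogonal)
```

The loss is near zero when the two modalities agree and grows as their embeddings diverge.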

Paired Data Synthesis

Paired Data Synthesis is the generation of artificially created, aligned data pairs across multiple modalities (e.g., an image and its caption) to augment training datasets where such paired examples are scarce or expensive to collect.

Feature Space Mixing

Feature Space Mixing is a data augmentation approach where interpolations or combinations are performed on the intermediate feature maps or embeddings extracted by a neural network, rather than on the raw input data.

Temporal Augmentation

Temporal Augmentation refers to techniques applied to sequential or time-series data, such as video or audio, including time warping, temporal masking, speed perturbation, and frame sampling, to increase temporal robustness.
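
Two of these operations, speed perturbation and frame sampling, can be sketched with plain NumPy. Both helper names are hypothetical, and the linear-interpolation resampler is a simplification: it also shifts pitch, which dedicated audio resamplers handle more carefully.

```python
import numpy as np

def speed_perturb(signal, factor):
    """Resample a 1-D signal so it plays `factor` times faster (linear interpolation)."""
    n_out = max(1, int(round(len(signal) / factor)))
    src_positions = np.linspace(0.0, len(signal) - 1, n_out)
    return np.interp(src_positions, np.arange(len(signal)), signal)

def sample_frames(num_frames, num_samples):
    """Pick evenly spaced frame indices spanning the whole clip."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

ramp = np.arange(100.0)
faster = speed_perturb(ramp, 2.0)   # half as many samples
indices = sample_frames(10, 4)      # evenly spaced indices into a 10-frame clip
```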

Spatial Augmentation

Spatial Augmentation encompasses geometric transformations applied to data with spatial dimensions, such as images, video frames, or 3D point clouds, including rotation, scaling, cropping, flipping, and elastic deformations.

Spectrogram Augmentation

Spectrogram Augmentation is a set of audio data augmentation techniques applied directly to time-frequency representations (spectrograms), including frequency and time masking, warping, and mixing, to improve models for speech and sound recognition.
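
Frequency and time masking can be sketched as below, assuming a spectrogram stored as a 2-D array of shape (frequency bins, time frames). The function name and the single-mask-per-axis setup are illustrative simplifications.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Blank one random band of frequency bins and one random span of time frames."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = fill
    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = fill
    return out

rng = np.random.default_rng(0)
spec = np.ones((8, 20))
masked = mask_spectrogram(spec, max_freq_width=3, max_time_width=5, rng=rng)
```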

Test-Time Augmentation (TTA)

Test-Time Augmentation (TTA) is an inference strategy where multiple augmented versions of a single input sample (e.g., flipped, rotated) are passed through a model, and their predictions are aggregated to produce a more robust and stable final output.
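
The aggregation step can be sketched as follows. `toy_model` is a deliberately biased stand-in classifier invented for this example; it shows how averaging over a flipped view can cancel a left/right bias.

```python
import numpy as np

def toy_model(image):
    """Hypothetical 2-class model whose output depends on left vs. right brightness."""
    half = image.shape[1] // 2
    logits = np.array([image[:, :half].mean(), image[:, half:].mean()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def predict_with_tta(model_fn, image):
    """Average class probabilities over the original and horizontally flipped views."""
    views = [image, image[:, ::-1]]
    return np.stack([model_fn(v) for v in views]).mean(axis=0)

rng = np.random.default_rng(0)
image = rng.random((4, 6))
single = toy_model(image)
averaged = predict_with_tta(toy_model, image)
```

Averaging probabilities is only one aggregation choice; majority voting or averaging logits are common alternatives.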

Automated Data Augmentation

Automated Data Augmentation is the use of search algorithms, such as reinforcement learning, population-based training, or Bayesian optimization, to automatically discover effective sequences or policies of data transformations for a specific dataset and task.

RandAugment

RandAugment is an automated data augmentation policy that selects a fixed number of transformations (N) uniformly at random from a predefined set and applies each at a single shared magnitude (M). Reducing the policy to these two hyperparameters eliminates the need for a separate search phase; a simple grid search over N and M suffices.
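
A minimal sketch, assuming the common two-hyperparameter formulation in which `num_ops` operations are drawn uniformly at random and all share one `magnitude`. The four toy operations and their scaling are made up for illustration and are much simpler than the operation set used in practice.

```python
import numpy as np

def identity(img, level):
    return img

def brightness(img, level):
    return np.clip(img * (1.0 + level), 0.0, 1.0)

def contrast(img, level):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + level) + mean, 0.0, 1.0)

def shift_x(img, level):
    # Wrap-around horizontal shift of up to a quarter of the image width.
    return np.roll(img, int(round(level * img.shape[1] * 0.25)), axis=1)

OPS = [identity, brightness, contrast, shift_x]

def rand_augment(img, num_ops, magnitude, rng, max_magnitude=10):
    """Apply num_ops uniformly chosen operations, all at the same magnitude."""
    level = magnitude / max_magnitude
    for idx in rng.integers(len(OPS), size=num_ops):
        img = OPS[idx](img, level)
    return img

rng = np.random.default_rng(0)
img = rng.random((8, 8))  # grayscale image with values in [0, 1]
augmented = rand_augment(img, num_ops=2, magnitude=9, rng=rng)
```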

CutMix

CutMix is an image augmentation technique that creates a new training sample by cutting a patch from one image and pasting it onto another, mixing the ground-truth labels in proportion to the area each image contributes. This encourages the model to learn from partial features.
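
The patch-and-label mixing can be sketched as below for single images with one-hot labels. The function signature is an assumption for this example; the box is sized from the mixing coefficient `lam` and the label weights are recomputed from the box actually pasted, since it may be clipped at the border.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, lam, rng):
    """Paste a random box from img_b into img_a; mix labels by the pasted area."""
    h, w = img_a.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    # Fraction of the result that still comes from img_a.
    lam_adj = 1.0 - ((y1 - y0) * (x1 - x0)) / (h * w)
    return mixed, lam_adj * label_a + (1.0 - lam_adj) * label_b

rng = np.random.default_rng(0)
img_a, img_b = np.zeros((8, 8)), np.ones((8, 8))
label_a, label_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed_img, mixed_label = cutmix(img_a, label_a, img_b, label_b, lam=0.5, rng=rng)
```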

Mixup

Mixup is a data-agnostic augmentation technique that generates virtual training examples by taking a convex combination of two input samples and their corresponding labels, promoting linear behavior in neural networks between training examples.

Domain Randomization

Domain Randomization is a data augmentation strategy for sim-to-real transfer, where simulation parameters (e.g., textures, lighting, object poses) are varied widely during training to force a model to learn invariant features that generalize to the real world.

Synthetic Data Fidelity

Synthetic Data Fidelity refers to the degree to which artificially generated data accurately reflects the statistical properties, semantic content, and perceptual quality of the real-world data it is intended to augment or replace.

Augmentation Policy

An Augmentation Policy is a predefined set of rules or a sequence of transformation operations (e.g., rotate, color jitter, translate) that dictates how raw input data is modified during the training process to create augmented samples.

Weakly-Supervised Alignment

Weakly-Supervised Alignment in augmentation refers to techniques that learn to align data from different modalities using only loose or noisy pairing signals, such as co-occurrence in a document, rather than precise, manually annotated correspondences.

Self-Supervised Augmentation

Self-Supervised Augmentation involves creating positive pairs for contrastive learning by applying different random augmentations to the same data sample, with augmented views of other samples typically serving as negatives, allowing models to learn representations without explicit human labels.
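
In the usual contrastive setup, two augmented views of one sample form the positive pair and views of other samples in the batch act as negatives. The sketch below assumes that setup with an InfoNCE-style loss; Gaussian jitter stands in for real augmentations, and raw feature vectors stand in for encoder outputs.

```python
import numpy as np

def two_views(x, rng, noise_std=0.1):
    """Two independently augmented views of each row of x."""
    return x + rng.normal(0.0, noise_std, x.shape), x + rng.normal(0.0, noise_std, x.shape)

def info_nce(z1, z2, temperature=0.1):
    """Row i of z1 and row i of z2 are positives; every other row of z2 is a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
view_1, view_2 = two_views(x, rng)
loss_matched = info_nce(view_1, view_2)
loss_mismatched = info_nce(view_1, np.roll(view_2, 1, axis=0))
```

The loss is low when each view is most similar to its own partner and high when the pairing is scrambled.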

Curriculum Data Augmentation

Curriculum Data Augmentation is a training strategy that progressively increases the difficulty or diversity of applied data transformations throughout the learning process, analogous to a curriculum, to stabilize and improve model learning.
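
One simple way to realize this is a schedule that ramps augmentation strength with the epoch. The linear ramp and the function name below are assumptions for illustration; step-wise or performance-triggered schedules are also possible.

```python
def curriculum_magnitude(epoch, total_epochs, start=0.0, end=1.0):
    """Linearly ramp augmentation strength from `start` to `end` over training."""
    if total_epochs <= 1:
        return end
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + progress * (end - start)

# Strength per epoch for a 5-epoch run; feed this into the augmentation pipeline.
schedule = [curriculum_magnitude(e, 5) for e in range(5)]
```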

Hard Example Mining

Hard Example Mining is an augmentation-adjacent strategy that identifies data samples on which a model currently performs poorly and prioritizes them, or generates similar challenging samples, during subsequent training iterations.
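
The selection step can be sketched as picking the highest-loss samples from the most recent evaluation pass. The helper name and the loss values are made up for this example.

```python
import numpy as np

def select_hard_examples(losses, k):
    """Indices of the k highest-loss samples, hardest first."""
    order = np.argsort(np.asarray(losses))[::-1]
    return order[:k]

per_sample_loss = [0.1, 2.0, 0.5, 1.5]
hard_indices = select_hard_examples(per_sample_loss, k=2)
```

The returned indices can then be oversampled, or used as seeds for generating similar challenging samples.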