Canonical Correlation Analysis (CCA) is a statistical technique that identifies and quantifies the linear relationships between two sets of variables by finding pairs of linear combinations, called canonical variates, that are maximally correlated. In machine learning, it is used for dimensionality reduction, feature learning, and, critically, for modality alignment—projecting data from different sources into a shared latent space where semantically similar concepts are close together. This makes it a core algorithm for multi-modal memory encoding, allowing agents to relate textual descriptions to visual or auditory inputs.
