Temporal convolution is an operation used in convolutional neural networks (CNNs) in which a learnable filter slides along the time dimension of sequential input to extract local temporal patterns. Unlike the two-dimensional spatial convolutions used for images, it slides along a single axis, time, over sequences such as audio waveforms, sensor readings, or other time series. At each time step it computes the dot product between the filter weights and the local window of the input that the filter currently covers. The result is a feature map that highlights where specific temporal motifs occur within the sequence.
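
The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `temporal_conv1d`, the toy `signal`, and the hand-picked `edge_filter` (standing in for weights a network would learn) are all hypothetical, the input has a single channel, and no padding is applied. As is usual in CNNs, the filter is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def temporal_conv1d(x, w, stride=1):
    """Slide filter w along the time axis of x (no padding, no kernel flip)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    # Each output value is the dot product of the filter with one local window.
    return np.array(
        [np.dot(w, x[t * stride : t * stride + k]) for t in range(n_out)]
    )

# Hypothetical toy signal: a step up at index 2 and a step down at index 5.
signal = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float)
# Hand-picked filter (an assumption, not learned): responds to a rising step.
edge_filter = np.array([-1.0, 1.0])

feature_map = temporal_conv1d(signal, edge_filter)
print(feature_map)
```

For this input the feature map is `[0, 1, 0, 0, -1, 0]`: the positive response marks the window where the signal rises, the negative response marks where it falls, and flat regions produce zero. A trained network would learn many such filters, typically over several input channels, rather than using one fixed by hand.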
