Inferensys

Glossary

Vertical Federated Learning

Vertical Federated Learning (VFL) is a privacy-preserving machine learning paradigm where multiple parties, each holding different feature sets about the same set of entities (e.g., customers), collaboratively train a model without directly sharing their raw data.
FEDERATED LEARNING

What is Vertical Federated Learning?


Vertical Federated Learning (VFL) is a decentralized training paradigm for scenarios where data is partitioned by features (columns) across parties, not by samples (rows). In VFL, different entities—such as a bank and an e-commerce platform—hold distinct feature sets about the same customers. They collaborate to train a model, such as a joint credit risk predictor, by aligning their data via encrypted entity matching and then computing over encrypted or masked intermediate results, ensuring raw feature data never leaves its owner's silo.

The core technical challenge in VFL is performing aligned computation across parties without centralizing data. A common architecture is the split neural network: each party computes the initial layers on its local features, and the intermediate outputs (often called smashed data or embeddings) are aggregated at a coordinator or at the label-holding party, which computes the final layers and the loss. Depending on the threat model, these exchanges may be further protected with cryptographic techniques such as homomorphic encryption or secure multi-party computation. This makes VFL distinct from horizontal federated learning, where parties share the same feature space but hold different samples.

VERTICAL FEDERATED LEARNING

Key Characteristics of VFL

The architecture of VFL is defined by the following core technical characteristics.

01

Feature Partitioning by Entity

In VFL, data is partitioned vertically, that is, by feature. Different parties (e.g., a bank and an e-commerce platform) hold different attributes (features) for the same set of users (entities or sample IDs).

  • Bank: Holds credit score, transaction history.
  • E-commerce: Holds purchase history, browsing behavior.

The goal is to train a model that utilizes this combined feature space without any party exposing its raw feature columns. This contrasts with Horizontal Federated Learning (HFL), where parties have the same feature set but different samples.
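The difference between the two partitioning schemes is easiest to see on a small table. The sketch below builds a hypothetical joint table purely for illustration (in real VFL no party ever holds it) and splits it both ways; the column names `credit_score`, `txn_count`, `purchases`, and `sessions` and the helper functions are assumptions, not part of any VFL library.

```python
# Hypothetical joint view, keyed by user ID. In a real VFL deployment this table
# never exists in one place; it is built here only to show how it is split.
JOINT = {
    "u1": {"credit_score": 710, "txn_count": 42, "purchases": 5, "sessions": 30},
    "u2": {"credit_score": 640, "txn_count": 13, "purchases": 1, "sessions": 4},
    "u3": {"credit_score": 780, "txn_count": 88, "purchases": 12, "sessions": 51},
}

BANK_COLUMNS = ["credit_score", "txn_count"]  # features held by the bank
SHOP_COLUMNS = ["purchases", "sessions"]      # features held by the e-commerce site


def vertical_split(table, columns):
    """VFL: every party sees the same user IDs but keeps only its own columns."""
    return {uid: {c: row[c] for c in columns} for uid, row in table.items()}


def horizontal_split(table, user_ids):
    """HFL, for contrast: every party has all columns but only a subset of users."""
    return {uid: dict(table[uid]) for uid in user_ids}


bank_view = vertical_split(JOINT, BANK_COLUMNS)
shop_view = vertical_split(JOINT, SHOP_COLUMNS)
print(sorted(bank_view) == sorted(shop_view))  # True: identical ID sets
print(bank_view["u1"])  # {'credit_score': 710, 'txn_count': 42}
```

In the vertical split both views share the ID set while their column sets are disjoint, which is exactly why VFL needs sample alignment before training.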

02

Sample Alignment & Cryptography

A prerequisite for VFL is Private Set Intersection (PSI) to securely identify the common set of entities across parties without revealing non-overlapping samples.

  • Process: Parties run a cryptographic PSI protocol (e.g., one based on Diffie-Hellman key agreement or on oblivious transfer) to compute the intersection of their ID lists.
  • Output: Only the aligned, overlapping samples are used for training. The protocol ensures no party learns the full ID list of another.

This step is computationally intensive but critical for privacy and model validity, preventing training on misaligned data.
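The Diffie-Hellman flavor of PSI can be sketched in a few lines: each party raises hashed IDs to a secret exponent, and because (H(x)^a)^b = (H(x)^b)^a, doubly blinded values match for shared IDs. This is a minimal sketch assuming two semi-honest parties, in which only the bank learns the intersection and the shop sees only blinded values. The toy modulus, the `hash_to_group` helper, and the `Party`/`dh_psi` names are illustrative assumptions, not a production protocol or any library's API.

```python
import hashlib
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Production PSI uses vetted groups,
# typically elliptic curves, and a proper hash-to-group; this is illustrative only.
P = 2**127 - 1


def hash_to_group(identifier):
    """Hypothetical helper: hash an ID, then square it into the QR subgroup mod P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)


class Party:
    def __init__(self, ids):
        self.ids = list(ids)
        self._key = secrets.randbelow(P - 3) + 2  # secret exponent, never shared

    def blind_own(self):
        """H(x)^k for each of this party's IDs, in order."""
        return [pow(hash_to_group(i), self._key, P) for i in self.ids]

    def blind_received(self, values):
        """Raise the other party's blinded values to this party's key, keeping order."""
        return [pow(v, self._key, P) for v in values]


def dh_psi(bank, shop):
    # Bank -> Shop: H(x)^a. Shop -> Bank: (H(x)^a)^b, in the same order.
    bank_double = shop.blind_received(bank.blind_own())
    # Shop -> Bank: H(y)^b. Bank computes (H(y)^b)^a.
    shop_double = set(bank.blind_received(shop.blind_own()))
    # Exponentiation commutes, so shared IDs yield equal values (up to negligible
    # collisions). The bank maps matches back to its IDs via the preserved order.
    return [i for i, v in zip(bank.ids, bank_double) if v in shop_double]


bank = Party(["u1", "u2", "u3", "u7"])
shop = Party(["u2", "u3", "u5"])
print(dh_psi(bank, shop))  # ['u2', 'u3']
```

Non-overlapping IDs ("u1", "u7", "u5") only ever cross the wire as blinded values, which is what keeps them private in this setting.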

03

Split Neural Network Architecture

The model architecture is physically split across participants. A typical setup involves:

  • Bottom Models: Each party holds a local model (e.g., a few neural network layers) that processes its private features.
  • Interactive Layer: The outputs (embeddings or smashed data) from all bottom models are sent to a guest party or a neutral coordinator.
  • Top Model: The guest/coordinator aggregates these intermediate outputs and runs the remaining layers of the network to produce the final prediction.

During backward propagation, gradients flow back through the top model to each party's bottom model for local updates, without exposing raw features.
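The forward and backward flow of a split network can be sketched with linear bottom models standing in for the "few neural network layers" and a label-holding guest that runs the top model. This is a minimal plaintext sketch with no encryption; `BottomModel`, `TopModel`, `train_step`, and the feature values are hypothetical.

```python
import math


class BottomModel:
    """Linear bottom model held by one party; only its scalar output leaves the party."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr
        self._x = None

    def forward(self, x):
        self._x = x  # cached locally for the backward pass
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def backward(self, grad_out):
        # grad_out = dL/d(embedding), received from the top model
        self.w = [wi - self.lr * grad_out * xi for wi, xi in zip(self.w, self._x)]


class TopModel:
    """Top model held by the label owner: weighted sum of embeddings, then sigmoid."""

    def __init__(self, n_parties, lr=0.1):
        self.v = [1.0] * n_parties
        self.b = 0.0
        self.lr = lr

    def forward_backward(self, embeddings, y):
        z = sum(vk * ek for vk, ek in zip(self.v, embeddings)) + self.b
        p = 1.0 / (1.0 + math.exp(-z))
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        grad_z = p - y  # dL/dz for sigmoid + binary cross-entropy
        grads_to_parties = [grad_z * vk for vk in self.v]  # computed before updating v
        self.v = [vk - self.lr * grad_z * ek for vk, ek in zip(self.v, embeddings)]
        self.b -= self.lr * grad_z
        return loss, grads_to_parties


def train_step(bottoms, top, feature_slices, y):
    embeddings = [m.forward(x) for m, x in zip(bottoms, feature_slices)]  # each party, locally
    loss, grads = top.forward_backward(embeddings, y)  # label owner only
    for m, g in zip(bottoms, grads):
        m.backward(g)  # each party updates its own weights
    return loss


guest_bottom = BottomModel(n_features=2)  # guest: e.g., scaled credit features
host_bottom = BottomModel(n_features=1)   # host: e.g., a scaled purchase count
top = TopModel(n_parties=2)
slices, label = [[0.7, 0.4], [0.5]], 1
losses = [train_step([guest_bottom, host_bottom], top, slices, label) for _ in range(20)]
print(round(losses[0], 4), losses[-1] < losses[0])  # 0.6931 True
```

Only embeddings travel to the top model and only gradients with respect to those embeddings travel back; each party's weights and raw features stay local.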

04

Asymmetric Roles: Host & Guest

VFL typically involves asymmetric participant roles, unlike the symmetric client-server model of HFL.

  • Guest Party: The party that holds the labels (Y) for the aligned samples and usually hosts the top model. It initiates the training task and computes the final loss.
  • Host Party(ies): Parties that hold only features (X) and host bottom models. They contribute feature representations but do not possess labels.

This role distinction fundamentally shapes the protocol's communication pattern, incentive structure, and privacy considerations.

05

Privacy-Preserving Forward Pass

The forward pass is designed to limit leakage of private features. Depending on the required guarantees, hosts send their intermediate results in plaintext or in protected (encrypted or secret-shared) form.

  • Plaintext Embeddings: In basic setups, hosts send plaintext embeddings (smashed data) to the guest. This does not expose raw features, but the embeddings can still leak information about them, for example through inversion attacks.
  • Enhanced Privacy: For stronger guarantees, hosts encrypt their embeddings with Homomorphic Encryption (HE) or secret-share them under Secure Multi-Party Computation (SMPC). The guest can then compute on these protected values to continue the forward pass without seeing them in the clear.

With HE or SMPC, the guest never sees host embeddings in the clear, so it cannot directly invert them to reconstruct host features.
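Additively homomorphic encryption such as Paillier is one way to realize the encrypted forward pass: a host encrypts its embedding, and the guest combines ciphertexts without decrypting them. This is a toy sketch that assumes a coordinator holds the private key (`LAM`, `MU`), uses fixed-point encoding for real-valued embeddings, and relies on deliberately small, fixed primes; all names and parameters are illustrative and far too weak for real use.

```python
import math
import secrets

# Toy Paillier keypair from two known Mersenne primes. Real deployments use
# randomly generated primes of 1024+ bits; these fixed values are illustrative.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                      # public key (generator g = N + 1)
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)           # private: valid because g = N + 1 (Python 3.8+)
SCALE = 10**6                  # fixed-point scale for real-valued embeddings


def encrypt(x):
    m = round(x * SCALE) % N                  # encode; negatives wrap mod N
    r = secrets.randbelow(N - 1) + 1          # fresh randomness per ciphertext
    return (1 + m * N) * pow(r, N, N2) % N2   # g^m * r^N mod N^2, with g = N + 1


def add_encrypted(c1, c2):
    return c1 * c2 % N2                       # Enc(a) * Enc(b) = Enc(a + b)


def decrypt(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N   # L(c^LAM mod N^2) * MU mod N
    if m > N // 2:                            # decode negative values
        m -= N
    return m / SCALE


host_ct = encrypt(1.25)    # hypothetical host embedding, sent encrypted
guest_ct = encrypt(-0.5)   # hypothetical guest contribution
summed_ct = add_encrypted(host_ct, guest_ct)  # guest computes on ciphertexts only
print(decrypt(summed_ct))  # 0.75, recoverable only by the private-key holder
```

Because encryption is randomized, the same embedding encrypts to different ciphertexts each time, so the guest cannot even tell whether two host values are equal.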

06

Secure Gradient Exchange

The backward pass must also protect sensitive information. Gradients can leak information about the underlying training data.
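To see why gradients can leak labels, consider the simplest split setup: the guest's top model ends in a single sigmoid output trained with binary cross-entropy, and the host receives the per-sample gradient of the loss with respect to that logit. The sign of the gradient alone then reveals the label. A minimal sketch assuming that simplified setup; the helper names are hypothetical.

```python
import math


def logit_gradient(z, y):
    """dL/dz for sigmoid + binary cross-entropy, computed by the guest: sigmoid(z) - y."""
    return 1.0 / (1.0 + math.exp(-z)) - y


def host_guess_label(grad):
    """A curious host's inference: sigmoid(z) lies in (0, 1), so grad < 0 iff y == 1."""
    return 1 if grad < 0 else 0


for z, y in [(-2.0, 0), (-2.0, 1), (0.3, 0), (3.0, 1)]:
    g = logit_gradient(z, y)
    print(f"z={z:+.1f} y={y} grad={g:+.3f} guessed={host_guess_label(g)}")
```

Obfuscating these gradients (e.g., with noise) or computing them under encryption aims to break exactly this inference.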

  • Gradient Protection: Hosts receive gradients for their bottom models from the guest. To prevent label leakage from the guest to the hosts, these gradients may be obfuscated or computed using cryptographic techniques.
  • Aggregation without Disclosure: Protocols like Secure Aggregation can be extended to VFL, allowing the coordinator to compute the necessary aggregated gradient information without learning any party's individual contribution.

This secure exchange is crucial for maintaining the privacy guarantee for all parties throughout the training lifecycle.
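Pairwise masking is one common way to realize secure aggregation: each pair of parties agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum and the coordinator learns only the total (e.g., of partial scores or gradient components). A minimal sketch that assumes no party drops out, that values are already quantized to integers, and that a seeded `random.Random` stands in for the masks real protocols derive from pairwise key agreement; all function names are hypothetical.

```python
import random

MOD = 2**32  # masked values live in Z_MOD (toy choice)


def make_pairwise_masks(n_parties, seed):
    """Stand-in for per-pair shared secrets; NOT cryptographically secure."""
    rng = random.Random(seed)
    return {(i, j): rng.randrange(MOD)
            for i in range(n_parties) for j in range(i + 1, n_parties)}


def mask_value(i, value, masks, n_parties):
    """Party i adds masks shared with higher-indexed parties and subtracts the rest."""
    masked = value
    for j in range(n_parties):
        if i < j:
            masked += masks[(i, j)]
        elif j < i:
            masked -= masks[(j, i)]
    return masked % MOD


def aggregate(masked_values):
    """Coordinator: masks cancel in the sum; decode back to a signed integer."""
    total = sum(masked_values) % MOD
    return total - MOD if total >= MOD // 2 else total


contributions = [37, -12, 5]  # hypothetical quantized per-party values
masks = make_pairwise_masks(len(contributions), seed=7)
masked = [mask_value(i, v, masks, len(contributions)) for i, v in enumerate(contributions)]
print(aggregate(masked))  # 30
```

Each individual masked value looks uniformly random to the coordinator; only the sum is meaningful.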

FEDERATED LEARNING PARADIGMS

Vertical vs. Horizontal Federated Learning

A comparison of the two primary data partitioning schemes in federated learning, highlighting their architectural differences, use cases, and technical challenges.

Data Partitioning Scheme
  • VFL: Features are partitioned across clients (same sample IDs, different features).
  • HFL: Samples are partitioned across clients (different sample IDs, same feature set).

Typical Use Case
  • VFL: Collaboration between organizations with complementary data on the same entities (e.g., a bank and an e-commerce site analyzing shared customers).
  • HFL: Training across many devices or users with similar data schemas (e.g., next-word prediction across millions of smartphones).

Sample Alignment Requirement
  • VFL: Required. Privacy-preserving entity resolution (e.g., via Private Set Intersection) finds the common samples without exposing IDs.
  • HFL: Not required; each client trains on its own samples over the shared feature set.

Model Architecture
  • VFL: Typically a split neural network. Clients hold bottom models for their features; a server (or the label-holding guest) holds the top model.
  • HFL: All clients train an identical, full model architecture locally.

Communication Overhead per Round
  • VFL: High (requires exchanging intermediate activations and gradients for each aligned sample).
  • HFL: Lower (exchanges only model parameters or gradients).

Scalability to Massive Client Numbers
  • VFL: Limited; typically involves a small number of organizations.
  • HFL: High; designed for very large client populations (e.g., millions of devices).

Primary Privacy Risk
  • VFL: Potential leakage from intermediate activations (smashed data).
  • HFL: Potential leakage from shared model gradients or updates.

Common Aggregation Method
  • VFL: Gradient/activation aggregation from split layers.
  • HFL: Parameter averaging (e.g., Federated Averaging).

Handling of Non-IID Data
  • VFL: Inherently addresses feature-wise heterogeneity.
  • HFL: Challenged by sample-wise heterogeneity; often addressed with algorithms such as FedProx.
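The communication comparison can be made concrete with back-of-the-envelope arithmetic. This sketch assumes plaintext 32-bit values, one embedding vector sent forward and one gradient of the same size sent back per aligned sample and host party, and an HFL client that downloads and uploads one full model per round. The function names and numbers are hypothetical, and encryption would inflate the VFL figures considerably.

```python
def vfl_bytes_per_epoch(n_aligned, embedding_dim, bytes_per_value=4):
    """One host party, one pass over aligned samples: embeddings forward + gradients back."""
    return 2 * n_aligned * embedding_dim * bytes_per_value


def hfl_bytes_per_round(n_params, bytes_per_value=4):
    """One client, one round: download the global model + upload its update."""
    return 2 * n_params * bytes_per_value


print(vfl_bytes_per_epoch(n_aligned=1_000_000, embedding_dim=64))  # 512000000
print(hfl_bytes_per_round(n_params=1_000_000))                      # 8000000
```

Under these assumptions VFL traffic grows with the number of aligned samples, while per-client HFL traffic grows with model size and is independent of how much data each client holds.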

VERTICAL DATA PARTITIONING

Common Use Cases for VFL

Vertical Federated Learning (VFL) enables collaborative model training across organizations that hold different attributes (features) about the same entities. Its primary applications are in industries where data is highly sensitive, siloed, and complementary.

VERTICAL FEDERATED LEARNING

Frequently Asked Questions

Vertical Federated Learning (VFL) enables collaborative model training between organizations that hold different data features about the same entities, such as customers or patients, without sharing the raw underlying data. This FAQ addresses its core mechanisms, differences from horizontal FL, and its critical role in privacy-preserving, cross-organizational AI.

Vertical Federated Learning (VFL) is a collaborative machine learning paradigm where two or more parties, each holding a different set of features for the same set of entities (e.g., customers, patients), jointly train a model without directly exchanging their raw feature data. It works by aligning entities via privacy-preserving entity resolution (e.g., Private Set Intersection over hashed identifiers) and then training a model whose computation is split: each party computes on its local features, and only the necessary intermediate results, such as embeddings or gradients, are exchanged, typically under encryption or masking, to complete the forward and backward passes. A common architecture uses a split neural network, where the bottom layers reside with each data party and the top layers are on a coordinating server or a designated party.

Prasad Kumkar

About the author


CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.