Glossary

Federated Mirror Descent

Federated Mirror Descent is a generalization of federated optimization that performs gradient updates in a dual space defined by a Bregman divergence, offering a unified framework for handling constraints and non-Euclidean geometry.

Get in touch Learn more

Finance analyst reviewing cash flow AI optimization on laptop, charts and projections visible, home office work session.

FEDERATED OPTIMIZATION TECHNIQUE

What is Federated Mirror Descent?

Federated Mirror Descent (FMD) is a generalization of federated optimization that performs gradient updates in a dual space defined by a Bregman divergence, offering a unified framework for handling constraints and non-Euclidean geometry.

Federated Mirror Descent is a foundational optimization algorithm for federated learning that generalizes standard gradient descent. It performs updates by mapping gradients from a primal model space to a dual space via a mirror map, defined by a strictly convex Bregman divergence like KL-divergence. This allows FMD to naturally handle constraints and non-Euclidean geometries, such as the probability simplex, which are common in applications like personalized recommendation or multi-class classification.

The algorithm operates in rounds: clients compute gradients on local data and send these updates to a server. The server performs a mirror descent step in the dual space, then maps the result back to the primal model space for aggregation and redistribution. This framework unifies many federated algorithms, as Federated Averaging (FedAvg) is a special case using squared Euclidean distance. FMD provides stronger theoretical convergence guarantees, especially for non-convex problems and heterogeneous (non-IID) client data distributions.

ARCHITECTURAL PRINCIPLES

Key Features of Federated Mirror Descent

Federated Mirror Descent (FMD) generalizes federated optimization by performing updates in a dual space defined by a Bregman divergence. This provides a unified framework for handling constraints, non-Euclidean geometry, and heterogeneous client objectives.

Bregman Divergence as Geometry

The core mechanism of FMD is the use of a Bregman divergence (e.g., KL divergence, squared Euclidean distance) to define the geometry of the optimization space. Instead of a standard gradient step, the update is a mirror map from the dual (gradient) space back to the primal (model) space. This allows the algorithm to naturally adapt to the geometry of the problem, such as the simplex for probability distributions, which is crucial for tasks like personalized recommendation or classification with constraints.

Unified Framework for Constraints

A primary advantage of FMD is its inherent ability to handle constraints. By choosing a Bregman divergence whose domain matches the constraint set (e.g., the negative entropy for the probability simplex), the mirror map automatically projects updates to stay within the feasible region. This eliminates the need for separate, potentially expensive projection steps after each federated round, making constrained federated optimization—common in applications with privacy budgets or resource limits—more efficient and stable.

Generalization of FedAvg and Adaptive Methods

FMD is not a single algorithm but a meta-algorithm. Specific choices recover well-known methods:

With squared Euclidean distance, FMD reduces to standard Federated Averaging (FedAvg).
With adaptive divergences or dual-space learning rates, it generalizes FedOpt frameworks like FedAdam or FedYogi. This demonstrates FMD's role as a theoretical and practical umbrella, showing how different federated optimizers relate through the lens of Bregman geometry.

Mitigation of Client Drift

FMD can be designed to directly combat client drift. By using a Bregman divergence that acts as a strong regularizer, the local objective on each client penalizes deviation from a global reference point more effectively than a simple proximal term (as in FedProx). This regularization is geometrically aware, leading to more consistent convergence across clients with highly non-IID data and reducing the number of communication rounds required to reach a target accuracy.

Dual Averaging for Sparse Communication

A common instantiation is Federated Dual Averaging. Clients compute gradients and send them (or their dual averages) to the server. The server performs a single mirror step on the aggregated dual vector. This structure is particularly amenable to gradient compression techniques like quantization or sparsification, as the compression is applied in the dual space. The mirror map's properties can help mitigate the distortion caused by compression, maintaining convergence guarantees where simple averaging might fail.

Theoretical Convergence Guarantees

FMD enjoys strong theoretical foundations. Under standard assumptions (convexity, smoothness), convergence rates can be derived that explicitly depend on the Bregman divergence's strong convexity parameter and client heterogeneity. These rates often match or generalize those of standard federated SGD, providing formal assurance of performance. The analysis typically involves tracking the growth of the Bregman divergence between the global model and an optimal solution, a more general measure than Euclidean distance.

OPTIMIZATION ALGORITHM COMPARISON

Federated Mirror Descent vs. Other Federated Optimizers

A technical comparison of Federated Mirror Descent (FMD) against prominent federated optimization algorithms, highlighting their core mechanisms, geometric handling, and suitability for different data and system constraints.

Algorithmic Feature / Property	Federated Mirror Descent (FMD)	Federated Averaging (FedAvg)	FedProx	SCAFFOLD
Core Optimization Framework	Generalized gradient descent using Bregman divergences	Simple weighted averaging of client SGD updates	FedAvg with a proximal term for client regularization	Uses control variates to correct client drift
Underlying Geometry	Non-Euclidean (handles simplex, L1-ball, etc.)	Euclidean (L2 geometry)	Euclidean (L2 geometry)	Euclidean (L2 geometry)
Primary Design Goal	Unified framework for constraints & non-Euclidean geometry	Communication efficiency & simplicity	Stability under systems & statistical heterogeneity	Convergence speed under data heterogeneity (non-IID)
Handles Constraints Natively
Client-Side Computation	Mirror descent step (gradient in dual space)	Multiple local SGD steps	Multiple local proximal SGD steps	Multiple local SGD steps with control variate adjustment
Server-Side Aggregation	Mirror averaging or dual averaging in the primal space	Weighted averaging of parameters	Weighted averaging of parameters	Averaging of parameters and control variates
Theoretical Guarantees Under Heterogeneity	Convergence for convex/non-convex in general geometry	Convergence issues under severe non-IID data	Improved convergence with bounded client drift	Linear speedup and convergence under non-IID data
Communication Cost per Round	Same as FedAvg (transmit primal/dual parameters)	Same as FMD (transmit model parameters)	Same as FedAvg (transmit model parameters)	Higher (transmits model parameters + control variates)
Key Hyperparameter(s)	Mirror map (e.g., negative entropy), learning rate	Local epochs, client fraction, learning rate	Proximal term weight (mu), local epochs	Local/global learning rates, control variate update frequency
Typical Use Case	Learning on probability simplices (e.g., recommendation), sparse models	Baseline for image/vision models, IID-like data	Cross-device FL with varying client availability & slow networks	Cross-silo FL with stable clients and highly non-IID data

FEDERATED OPTIMIZATION

Frequently Asked Questions

Federated Mirror Descent (FMD) is a foundational optimization framework for federated learning that generalizes gradient-based updates to handle complex geometries and constraints. This FAQ addresses its core mechanisms, advantages, and practical applications.

Federated Mirror Descent (FMD) is a generalization of federated gradient descent that performs optimization updates in a dual space defined by a Bregman divergence, enabling efficient learning on non-Euclidean geometries and constrained parameter sets. It works by iterating a two-step process: first, each client computes a gradient in the primal model space; second, this gradient is used to perform a mirror map update in a dual space, which is then mapped back to the primal space via the gradient of a strictly convex potential function (like the negative entropy for the probability simplex). The server aggregates these dual updates, ensuring the global model respects the underlying geometry of the problem, such as staying within a simplex for probability distributions or on a manifold.

Enabling Efficiency, Speed & Accuracy

Intelligent Analysis, Decision & Execution

We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.

Talk to Us

Search across company data

Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.

Useful when people spend too long searching or get different answers from different systems.

Enterprise searchRAGPermissions

Automate internal workflows

Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.

Useful when repetitive work moves across multiple tools and teams.

AI agentsWorkflow automationGovernance

Add AI to products and internal tools

Build assistants, guided actions, or decision support into the software your team or customers already use.

Useful when AI needs to be part of the product, not a separate tool.

AI integrationDecision supportModel routing

FEDERATED OPTIMIZATION TECHNIQUES

Related Terms

Federated Mirror Descent is a generalization within the federated optimization landscape. These related terms define the specific algorithms, challenges, and techniques that operate within or alongside its framework.

Federated Averaging (FedAvg)

The foundational algorithm upon which most federated optimization is built. In each round, a server:

Selects a subset of clients
Sends the current global model to each
Clients perform Local SGD on their data
Clients send model updates back
The server computes a weighted average of these updates to form a new global model. FedAvg assumes simple averaging in the model's parameter space, a special case of Federated Mirror Descent using Euclidean distance.

FedOpt Framework

A generalization of the server-side aggregation step in federated learning. Instead of a simple weighted average (as in FedAvg), FedOpt applies adaptive optimizer updates (like Adam, Yogi, or Adagrad) to the aggregated client gradients. FedAdam, FedYogi, and FedAdagrad are specific instantiations. Federated Mirror Descent provides the theoretical underpinning for these adaptive updates by defining the proper dual space for the aggregation.

Client Drift

A central challenge that Federated Mirror Descent helps mitigate. Client drift occurs when clients perform multiple local update steps on statistically heterogeneous (non-IID) data, causing their local models to diverge from the global optimum. This is exacerbated by standard FedAvg. Algorithms like SCAFFOLD (which uses control variates) and FedProx (which adds a proximal term) are direct solutions to client drift, and their updates can often be interpreted through the lens of Mirror Descent with a specific Bregman divergence.

Bregman Divergence

The mathematical core defining the "mirror" in Mirror Descent. A Bregman divergence (D_\psi(x, y)) measures the difference between points (x) and (y) using a convex function (\psi).

For (\psi = \frac{1}{2}|\cdot|^2), the divergence is squared Euclidean distance, recovering standard gradient descent.
For (\psi) as the negative entropy, the divergence is the Kullback–Leibler (KL) divergence, enabling updates on the probability simplex (e.g., for training softmax layers under constraints). Federated Mirror Descent's flexibility comes from choosing an appropriate (\psi) for the problem geometry.

FedProx

A federated optimization algorithm designed for statistical and systems heterogeneity. FedProx modifies the local client objective function by adding a proximal term (\frac{\mu}{2} |\theta - \theta^t|^2), which penalizes the local model (\theta) for straying too far from the global model (\theta^t). This explicitly counteracts client drift. FedProx can be viewed as an instance of Federated Mirror Descent where the Bregman divergence is the squared Euclidean distance, linking it directly to the broader theoretical framework.

Adaptive Federated Optimization

A class of algorithms that incorporate adaptive, per-parameter learning rates into the federated process. This includes server-side adaptivity (FedOpt algorithms) and client-side adaptivity. Federated Mirror Descent provides a unified view: adaptive methods like Adam implicitly use a Bregman divergence derived from the history of squared gradients. Understanding this connection allows for the design of new adaptive methods with better convergence guarantees in the non-Euclidean geometries common in federated learning (e.g., constrained optimization on devices).

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.

Limited slotsGet a Free AI Consultation

How We Work

Custom AI workflows for your Business

One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.