Inferensys


Federated Averaging (FedAvg)

Federated Averaging (FedAvg) is the foundational iterative optimization algorithm for federated learning, where a central server aggregates locally computed model updates from a subset of clients by taking a weighted average to produce a new global model.
FEDERATED OPTIMIZATION TECHNIQUE

What is Federated Averaging (FedAvg)?


Federated Averaging (FedAvg) is the canonical iterative algorithm for decentralized machine learning. It trains a global model across distributed edge devices without centralizing raw data. Each client selected for a round performs Local Stochastic Gradient Descent (Local SGD) on its private dataset for several epochs. The server then aggregates the resulting models with a weighted average, typically weighting each client by its number of local training examples, to form the new global model for the next round.
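The notation below is our own and is not taken from the source. There are K clients. Client k holds a dataset D_k of n_k examples, and n = Σ_k n_k. The per-example loss is ℓ, and S_t is the set of clients sampled in round t. Finally, w^k_{t+1} is client k's model after local SGD starting from w_t. The first line is the sample-weighted objective FedAvg targets. The second line is the server step: a weighted average over the sampled clients.

```latex
\min_{w}\; F(w) = \sum_{k=1}^{K} \frac{n_k}{n}\, F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w; x_i, y_i)

w_{t+1} = \sum_{k \in S_t} \frac{n_k}{n_{S_t}}\, w_{t+1}^{k},
\qquad
n_{S_t} = \sum_{k \in S_t} n_k
```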

The algorithm's core contribution is communication efficiency. Clients transmit only model parameter updates, never raw data, and they perform many local steps between communication rounds. FedAvg is meant to operate despite statistical heterogeneity (non-IID data) and systems heterogeneity across clients, although its tolerance for both is limited. It establishes the pattern extended by later techniques. FedProx improves stability under heterogeneity, and FedOpt adds adaptive server-side optimization. Together they form the basis for privacy-preserving, scalable distributed AI.

ALGORITHMIC FOUNDATIONS

Key Characteristics of FedAvg

Federated Averaging (FedAvg) is the canonical optimization algorithm for federated learning. Its design is defined by several core mechanisms that enable decentralized training across heterogeneous devices while maintaining data privacy.

01

Iterative Averaging of Local Updates

FedAvg operates in synchronized communication rounds. In each round, a subset of clients receives the current global model. Each of these clients runs Local Stochastic Gradient Descent (Local SGD) for multiple epochs on its private data and sends the resulting model update (or the full model) back to the server. The server then computes a weighted average of these updates, typically weighted by the number of training samples on each client, to produce a new global model. Suppose each client takes a single full-batch gradient step: the average then equals one step of gradient descent on the combined data of the participating clients. With multiple local steps, it only approximates centralized training.
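The aggregation step can be sketched with NumPy as below. This is a minimal illustration that assumes each client's model has been flattened into a single parameter vector. The function name `weighted_average` is hypothetical and not part of any federated learning library.

```python
import numpy as np


def weighted_average(client_weights, client_sizes):
    """Return sum_k (n_k / n) * w_k over the participating clients."""
    sizes = np.asarray(client_sizes, dtype=float)
    if sizes.ndim != 1 or len(sizes) != len(client_weights) or sizes.sum() <= 0:
        raise ValueError("need one sample count per client with a positive total")
    # Stack the flattened client models into a (num_clients, num_params) array
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    # Aggregation coefficients n_k / n sum to 1
    coeffs = sizes / sizes.sum()
    # Contract over the client axis: sum_k coeffs[k] * stacked[k]
    return np.tensordot(coeffs, stacked, axes=1)


# Three clients holding 100, 300 and 600 local examples
global_w = weighted_average(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])],
    [100, 300, 600],
)
print(global_w)  # approximately [0.7, 0.9]
```

The client with 600 examples contributes 60% of the result, so the global model is pulled toward its parameters.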

02

Handling of Statistical Heterogeneity (Non-IID Data)

A fundamental challenge for FedAvg is non-IID data: client datasets are not independent and identically distributed samples of a single population. Real-world device data is inherently heterogeneous (e.g., different user typing habits, local photo libraries). FedAvg often tolerates moderate heterogeneity in practice. However, its multiple local update steps let each local model move toward the optimum of its own data before aggregation. The result is client drift: local models diverge from the global objective, which can slow or bias convergence. Advanced variants like FedProx and SCAFFOLD introduce mechanisms to explicitly correct for this drift.
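To make client drift concrete, the sketch below runs local gradient descent on a single hypothetical least-squares client, whose local optimum is far from the current global model. It compares plain FedAvg local training (mu = 0) with the FedProx local objective, which adds a proximal term (mu/2)·||w - w_global||². The synthetic data, step size, step count, and mu value are all illustrative assumptions.

```python
import numpy as np


def local_train(w_global, X, y, lr=0.1, steps=100, mu=0.0):
    """Local gradient descent on (1 / 2n) * ||Xw - y||^2.

    With mu > 0 this minimizes the FedProx local objective
    F_k(w) + (mu / 2) * ||w - w_global||^2, which pulls the local model
    back toward the global model; mu = 0 is plain FedAvg local training.
    """
    w = w_global.copy()
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n      # gradient of the local data loss
        grad += mu * (w - w_global)       # gradient of the proximal term
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0])             # this client's local optimum is [3, -2]
w_global = np.zeros(2)

w_fedavg = local_train(w_global, X, y, mu=0.0)
w_fedprox = local_train(w_global, X, y, mu=1.0)
# Distance each local model has drifted from the global model
print(np.linalg.norm(w_fedavg - w_global), np.linalg.norm(w_fedprox - w_global))
```

Without the proximal term, the client converges to its own optimum. With mu = 1, it stops roughly halfway, which limits how far this client can pull the next global average.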

03

Partial Client Participation per Round

In practical deployments, it is infeasible and inefficient to involve all clients in every training round due to constraints like device availability, network connectivity, and battery life. FedAvg is designed for partial client participation, where the server samples a fraction of the total client population (e.g., 1-10%) in each round. This sampling is often probabilistic, sometimes weighted by client data volume. This characteristic is crucial for scalability and mirrors the real-world intermittency of edge devices.
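A minimal sketch of per-round client selection using Python's standard `random` module follows. The `max(1, ...)` guard ensures at least one client is picked even for tiny fractions. The function names are hypothetical. Sampling with replacement in the size-weighted variant is one illustrative choice, not a prescribed protocol.

```python
import random


def sample_clients(client_ids, fraction, rng):
    """Uniformly sample max(1, round(fraction * K)) distinct clients for one round."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    m = max(1, round(fraction * len(client_ids)))
    return rng.sample(client_ids, m)


def sample_clients_by_size(client_ids, client_sizes, m, rng):
    """Variant: draw m clients with replacement, with probability proportional to data volume."""
    return rng.choices(client_ids, weights=client_sizes, k=m)


rng = random.Random(42)
population = list(range(10_000))
selected = sample_clients(population, fraction=0.01, rng=rng)
print(len(selected))  # 100
```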

04

Communication Efficiency Priority

The primary bottleneck in federated learning is often communication bandwidth, not computation. FedAvg trades extra local computation for fewer communication rounds. Performing many SGD steps between rounds reduces the total number of rounds required for convergence compared with sending a gradient after every batch. Further savings come from compressing the transmitted updates, for example through quantization or top-k sparsification, which can be layered on top of the core FedAvg protocol.
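Top-k sparsification can be sketched as follows. This is a minimal illustration, assuming the client update is a flat NumPy vector. Instead of d dense values, the client sends only k (index, value) pairs, and the server scatters them back into a dense vector. Both function names are hypothetical.

```python
import numpy as np


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a flattened update."""
    flat = np.ravel(update)
    if not 1 <= k <= flat.size:
        raise ValueError("k must be between 1 and the number of parameters")
    # argpartition places the k largest |values| in the last k positions (unordered)
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]


def densify(idx, values, size):
    """Server side: scatter the received (index, value) pairs into a dense vector."""
    out = np.zeros(size)
    out[idx] = values
    return out


update = np.array([0.05, -2.0, 0.3, 1.5, -0.01, 0.0])
idx, vals = top_k_sparsify(update, k=2)
restored = densify(idx, vals, update.size)
print(restored)
```

This sketch silently drops the small entries. Practical systems commonly add error feedback, which accumulates the dropped residual locally and adds it to the next round's update.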

05

Decoupled Server and Client Optimization

FedAvg cleanly separates the optimization processes on the server and clients. The client's role is purely local model training via SGD, and the server's role is purely aggregation via a simple weighted average. This decoupling allows significant flexibility and innovation on both sides. For instance, the FedOpt framework generalizes the server's aggregation step. It treats the averaged client update as a pseudo-gradient for an adaptive optimizer such as Adam or Yogi (yielding FedAdam and FedYogi) instead of applying the average directly. Similarly, clients can employ personalization techniques or different local optimizers.
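The sketch below shows the shape of a FedAdam-style server step, assuming flattened parameter vectors. It follows the general FedOpt recipe: average the client updates, then feed the result to an Adam-style rule in place of a gradient. It omits details such as bias correction. The class name and hyperparameter defaults are assumptions for illustration.

```python
import numpy as np


class FedAdamServer:
    """Server-side Adam applied to the averaged client update (FedOpt-style sketch)."""

    def __init__(self, w, lr=0.01, beta1=0.9, beta2=0.99, tau=1e-3):
        self.w = np.asarray(w, dtype=float).copy()
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m = np.zeros_like(self.w)   # first-moment estimate
        self.v = np.zeros_like(self.w)   # second-moment estimate

    def step(self, client_models, client_sizes):
        sizes = np.asarray(client_sizes, dtype=float)
        coeffs = sizes / sizes.sum()
        # Averaged update Delta = sum_k (n_k / n) * (w_k - w); plain FedAvg
        # would simply set w <- w + Delta
        delta = sum(coef * (np.asarray(wk, dtype=float) - self.w)
                    for coef, wk in zip(coeffs, client_models))
        # Delta points in a descent direction, so the adaptive step adds it
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        self.w = self.w + self.lr * self.m / (np.sqrt(self.v) + self.tau)
        return self.w


server = FedAdamServer(np.zeros(2))
w = server.step([np.array([1.0, -1.0]), np.array([3.0, -1.0])], [1, 1])
print(w)
```

With these defaults, the first step moves each coordinate by roughly `lr` in the direction of the averaged update, regardless of the update's magnitude. This per-coordinate scaling is what distinguishes the adaptive server step from plain averaging.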

06

Privacy by Architecture, Not by Default

FedAvg provides a foundational privacy-by-architecture benefit because raw training data never leaves the client device; only model updates are shared. However, these updates can potentially leak information about the underlying data. Therefore, FedAvg is typically combined with formal privacy-enhancing technologies (PETs) to provide rigorous guarantees. The most common augmentations are:

  • Secure Aggregation: Cryptographic protocols that allow the server to compute the sum/average of client updates without inspecting any individual update.
  • Differential Privacy: Clipping each client update and adding calibrated noise, either on the device before sending or during aggregation. This provides a mathematical bound on how much the trained model can reveal about whether any individual's data was used in training.
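The core idea behind the pairwise-masking style of secure aggregation can be shown with a toy example. Each pair of clients shares a random mask that one client adds and the other subtracts, so every mask cancels in the sum. This is a deliberately insecure illustration, not a real protocol. Real schemes derive masks through pairwise key agreement, work over modular integers, and handle client dropouts, and this sketch does none of that. The function name is hypothetical.

```python
import numpy as np


def masked_updates(updates, seed=0):
    """Toy pairwise masking: for each pair (i, j) with i < j, client i adds a
    shared random mask and client j subtracts it, so all masks cancel in the sum.

    NOT secure as written: masks come from one shared RNG rather than pairwise
    key agreement, floats are used instead of modular integers, and dropouts
    are ignored.
    """
    rng = np.random.default_rng(seed)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            mask = rng.normal(scale=100.0, size=masked[i].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked


updates = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([-0.5, 0.5])]
masked = masked_updates(updates)
# Individual masked vectors look like noise, but the sum matches the true sum
print(sum(masked), sum(updates))
```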
COMPARISON

FedAvg vs. Other Federated Optimization Algorithms

A technical comparison of Federated Averaging (FedAvg) against prominent alternative algorithms, highlighting key design features, convergence properties, and suitability for different federated learning challenges.

| Algorithmic feature / metric | FedAvg | FedProx | SCAFFOLD | FedOpt (e.g., FedAdam) |
|---|---|---|---|---|
| Core innovation | Weighted averaging of client model parameters | Proximal term in the local objective to limit client drift | Control variates (variance reduction) to correct client drift | Adaptive server-side optimizer (e.g., Adam, Adagrad) |
| Primary goal | Foundation: simple, communication-efficient aggregation | Stability under systems and statistical heterogeneity | Fast convergence under data heterogeneity (non-IID) | Improved convergence on non-convex problems |
| Handles non-IID data | Limited (prone to client drift) | Mitigates client drift | Mitigates client drift | Partial (via adaptive server updates) |
| Server update rule | w ← Σ_k (n_k / n) w_k | Weighted average of proximally constrained client models | w ← w + η_g · mean_k(Δ_k); c ← c + (1/N) Σ_k Δc_k | w ← w + η_s · m / (√v + τ), where m and v are Adam moment estimates of the averaged update |
| Client-side computation overhead | Baseline (local SGD) | Low (extra proximal-term gradient) | Low (maintains control-variate state) | Baseline (local SGD) |
| Communication cost per round | Baseline (full model parameters) | Baseline (full model parameters) | ~2× baseline (model + control variates) | Baseline (full model parameters) |
| Convergence speed (typical, vs. FedAvg on non-IID data) | Baseline | Similar or slightly faster | Significantly faster | Faster, especially on complex models |
| Theoretical guarantees | Mainly under convexity plus IID or bounded-heterogeneity assumptions | Convergence with bounded heterogeneity | Strong convergence rates for non-IID data | Convergence with adaptive server methods |

Notation: Δ_k = w_k - w is client k's update relative to the current global model w, and Δc_k is the change in its control variate. n_k is client k's sample count, with n summed over participating clients. N is the total number of clients, η_g and η_s are server learning rates, and τ is a small stability constant.
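The SCAFFOLD column's control-variate mechanism is easiest to see on a toy problem. The sketch below is our own illustration, not a reference implementation. It assumes full participation and hypothetical quadratic client losses F_k(w) = 0.5 · h_k · ||w - a_k||², and it uses the "option II" control-variate update described in the SCAFFOLD paper. The function name, loss setup, and hyperparameters are assumptions. Setting `use_control_variates=False` drops the correction, leaving FedAvg with uniform weights for comparison.

```python
import numpy as np


def scaffold(centers, curvatures, rounds=100, local_steps=10, lr_local=0.1,
             lr_global=1.0, use_control_variates=True):
    """Full-participation SCAFFOLD on toy losses F_k(w) = 0.5 * h_k * ||w - a_k||^2."""
    a = np.asarray(centers, dtype=float)        # (N, d) client optima a_k
    h = np.asarray(curvatures, dtype=float)     # (N,) client curvatures h_k
    n_clients, dim = a.shape
    x = np.zeros(dim)                           # global model
    c = np.zeros(dim)                           # server control variate
    c_k = np.zeros((n_clients, dim))            # client control variates
    for _ in range(rounds):
        delta_y = np.zeros_like(a)
        delta_c = np.zeros_like(a)
        for k in range(n_clients):
            y = x.copy()
            # Drift correction: add the server variate, subtract the client's own
            correction = (c - c_k[k]) if use_control_variates else 0.0
            for _ in range(local_steps):
                grad = h[k] * (y - a[k])
                y -= lr_local * (grad + correction)
            # "Option II" control-variate update
            c_new = c_k[k] - c + (x - y) / (local_steps * lr_local)
            delta_y[k] = y - x
            delta_c[k] = c_new - c_k[k]
            c_k[k] = c_new
        x = x + lr_global * delta_y.mean(axis=0)
        # (|S| / N) * mean of delta_c over S; here S contains all N clients
        c = c + delta_c.mean(axis=0)
    return x


centers, curvatures = [[0.0, 0.0], [4.0, 2.0]], [1.0, 3.0]
x_scaffold = scaffold(centers, curvatures)
x_fedavg = scaffold(centers, curvatures, use_control_variates=False)
# Optimum of the average loss: sum_k h_k a_k / sum_k h_k = [3.0, 1.5]
print(x_scaffold, x_fedavg)
```

With these hypothetical clients, SCAFFOLD settles at the true optimum of the average loss. The uncorrected variant converges to a different, biased point, because each client's multiple local steps over-fit its own optimum.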

APPLICATIONS

Common Use Cases for Federated Averaging

Federated Averaging (FedAvg) is deployed in domains where data privacy is paramount, computational resources are distributed, and regulatory compliance restricts data centralization. These use cases highlight its practical implementation.

FEDERATED AVERAGING (FEDAVG)

Frequently Asked Questions

Federated Averaging (FedAvg) is the foundational algorithm for decentralized machine learning. These questions address its core mechanics, practical challenges, and relationship to other optimization techniques.

Federated Averaging (FedAvg) is the canonical iterative optimization algorithm for federated learning, where a central server coordinates the training of a shared global model across a massive population of decentralized clients (e.g., mobile phones, IoT devices) without ever accessing their raw local data.

It works through repeated communication rounds:

  1. Server Broadcast: The central server selects a subset of available clients and sends the current global model parameters to them.
  2. Local Training: Each selected client performs multiple epochs of Local Stochastic Gradient Descent (Local SGD) on its own private dataset, starting from the global model.
  3. Update Transmission: Clients send their locally updated model parameters (or gradients) back to the server.
  4. Aggregation: The server computes a weighted average of the received client models to produce a new global model. The weight for each client is typically proportional to its local dataset size. This aggregation step is the core 'averaging' operation. It can run under a Secure Aggregation protocol so that the server sees only the aggregate, never an individual client's update.
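The four steps above can be combined into a small end-to-end simulation. This is a minimal sketch under stated assumptions: a linear-regression model, ten hypothetical clients with different dataset sizes and shifted feature distributions, and uniform client sampling. All function names and hyperparameters are illustrative. Because every client here shares the same underlying weights, the example shows the mechanics of a round, not client drift.

```python
import numpy as np


def make_clients(rng, true_w, n_clients=10):
    """Hypothetical clients: varying dataset sizes and shifted feature distributions."""
    clients = []
    for k in range(n_clients):
        n_k = int(rng.integers(20, 200))
        X = rng.normal(loc=0.2 * k, scale=1.0, size=(n_k, true_w.size))
        y = X @ true_w + rng.normal(scale=0.1, size=n_k)
        clients.append((X, y))
    return clients


def local_sgd(w, X, y, lr, epochs, batch_size, rng):
    """Step 2: minibatch SGD on a squared-error loss, starting from the global model."""
    w = w.copy()
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            b = order[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w


def fedavg(clients, dim, rounds=50, fraction=0.3, lr=0.03, epochs=2, batch_size=16, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    m = max(1, round(fraction * len(clients)))
    for _ in range(rounds):
        # Step 1: sample m clients without replacement and broadcast w to them
        selected = rng.choice(len(clients), size=m, replace=False)
        # Steps 2-3: each client trains locally and returns (model, sample count)
        results = [(local_sgd(w, X, y, lr, epochs, batch_size, rng), len(y))
                   for X, y in (clients[k] for k in selected)]
        # Step 4: weighted average with weights n_k / total over the selected clients
        total = sum(n_k for _, n_k in results)
        w = sum((n_k / total) * w_k for w_k, n_k in results)
    return w


rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = make_clients(rng, true_w)
w_global = fedavg(clients, dim=2)
print(w_global)  # should land close to true_w
```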

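The process repeats until the global model converges. This architecture provides a baseline privacy benefit: raw training data never leaves the client device. Formal guarantees, however, require additional techniques such as secure aggregation or differential privacy, because model updates can still leak information.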


About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.