Inferensys

Byzantine-Robust Federated Learning (e.g., Krum) vs FedAvg

A security-focused analysis comparing the standard FedAvg aggregation algorithm against robust alternatives like Krum. Evaluates resilience against malicious clients versus the cost in convergence rate and model utility for enterprise multi-party AI.
THE ANALYSIS

Introduction

A foundational comparison of the standard FedAvg aggregation algorithm and Byzantine-robust alternatives like Krum, focusing on the critical trade-off between resilience and convergence.

FedAvg (Federated Averaging) excels at efficient convergence in trusted environments because it aggregates client model updates with a simple weighted average, typically weighted by each client's local sample count. This keeps server-side computation low and adds no communication beyond the updates themselves, leading to faster training rounds. For example, in benchmark studies with IID (Independent and Identically Distributed) data and honest participants, FedAvg can reach target accuracy in roughly 30-50% fewer communication rounds than more complex robust aggregators, making it the default choice for collaborative research or internal corporate training where client integrity is assumed.
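The weighted average described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any framework's implementation: the function name `fedavg`, the toy updates, and the sample counts are hypothetical, and weighting by local sample count is assumed.

```python
import numpy as np

def fedavg(updates, num_samples):
    """Weighted average of client updates (assumed weights: local sample counts)."""
    updates = np.asarray(updates, dtype=float)       # shape (n_clients, d)
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()                # normalize so weights sum to 1
    return weights @ updates                         # shape (d,)

# Hypothetical toy round: three clients, two-parameter model.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
num_samples = [10, 30, 60]
global_update = fedavg(updates, num_samples)
print(global_update)  # approximately [4.0, 5.0]
```

The server does a single pass over the n client updates, which is why the aggregation step scales linearly with the number of clients.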

Byzantine-robust algorithms like Krum take a different approach: they explicitly defend against malicious clients that may submit poisoned updates. Krum selects the single client update that is closest to its nearest neighbors, scoring each update by the sum of squared distances to its n - f - 2 closest peers, which filters out statistical outliers. The price is a higher computational cost per round and potentially slower convergence. In exchange, Krum offers proven resilience: it tolerates up to f Byzantine clients out of n, provided n ≥ 2f + 3, without compromising the global model's integrity, a critical requirement for open or adversarial cross-silo settings.
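The selection rule described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: it follows the original Krum formulation, in which each update is scored by the sum of squared distances to its n - f - 2 nearest neighbors and the bound n ≥ 2f + 3 must hold. The function name `krum` and the toy updates are hypothetical.

```python
import numpy as np

def krum(updates, f):
    """Return the index of the update Krum selects, assuming at most f Byzantine clients."""
    updates = np.asarray(updates, dtype=float)       # shape (n_clients, d)
    n = len(updates)
    if n < 2 * f + 3:
        raise ValueError("Krum requires n >= 2f + 3")
    # All pairwise squared Euclidean distances: O(n^2 * d) time and memory here.
    diffs = updates[:, None, :] - updates[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)              # shape (n, n)
    k = n - f - 2                                    # neighbors counted per score
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(dists[i], i)              # drop the distance to itself
        scores[i] = np.sort(others)[:k].sum()        # sum over the k closest peers
    return int(np.argmin(scores))

# Hypothetical toy round: five honest clients near [1, 1], one poisoned update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0],
           [100.0, -100.0]]
selected = krum(updates, f=1)
print(selected, updates[selected])
```

The poisoned update sits far from every other update, so its score is large and it is never selected. The same mechanism is also what can discard a legitimate but unusual update.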

The key trade-off: If your priority is maximizing training speed and efficiency in a controlled, vetted network (e.g., internal departmental collaboration), choose FedAvg. If you prioritize security and model integrity in potentially untrusted or regulated multi-party environments (e.g., cross-company healthcare research under HIPAA where data provenance is uncertain), choose a Byzantine-robust aggregator like Krum. For a deeper understanding of aggregation strategies, explore our guide on Secure Aggregation (SecAgg) vs Differential Privacy (DP) for Federated Learning and the analysis of FedProx vs FedAvg for Heterogeneous Clients.

HEAD-TO-HEAD COMPARISON

FedAvg vs Krum: Byzantine-Robust Federated Learning

Direct comparison of standard federated averaging against the Krum algorithm for security and performance.

| Metric | FedAvg (Standard) | Krum (Byzantine-Robust) |
| --- | --- | --- |
| Byzantine Client Resilience | None | Tolerates up to f of n clients |
| Convergence Rate (Typical) | 1.0x (Baseline) | 0.6x - 0.8x |
| Server-Side Aggregation Cost per Round (n clients) | O(n) | O(n²) |
| Primary Use Case | Trusted, Homogeneous Clients | Untrusted, Adversarial Environments |
| Model Utility (IID Data) | High | Moderate to High |
| Model Utility (Non-IID Data) | Moderate | Low to Moderate |
| Algorithm Complexity | Low | High |

Byzantine-Robust FL (e.g., Krum) vs FedAvg

TL;DR Summary

A security-focused comparison of standard aggregation versus robust algorithms, evaluating resilience against malicious clients and the associated cost in convergence and performance.

Choose FedAvg

For cooperative, homogeneous environments prioritizing speed. FedAvg's simple weighted averaging of client updates offers fast convergence and high final accuracy when all participants are honest and data is IID or mildly non-IID. It's the baseline for most production FL systems like TensorFlow Federated or Flower due to its simplicity and low computational overhead. It provides zero defense against malicious or faulty clients.

Key Trade-off: Security vs. Utility

Robust aggregation sacrifices model utility for security. Techniques like Krum, Median, or Trimmed Mean introduce bias by discarding potentially valid but outlier updates. On non-IID data, honest clients legitimately produce dissimilar updates, so they are more likely to be treated as outliers; this can reduce final accuracy by 2-10 percentage points compared to FedAvg. The choice hinges on whether the threat model (malicious clients) poses a greater risk than a slight performance drop.
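As a rough illustration of how Median and Trimmed Mean discard outliers, the sketch below applies the common coordinate-wise variants to a toy round. The function names, the `trim` parameter, and the data are hypothetical, and the coordinate-wise formulation is an assumption about which variant is meant.

```python
import numpy as np

def coordinate_median(updates):
    """Median taken independently for each model coordinate."""
    return np.median(np.asarray(updates, dtype=float), axis=0)

def trimmed_mean(updates, trim):
    """Drop the `trim` smallest and `trim` largest values per coordinate, then average."""
    u = np.sort(np.asarray(updates, dtype=float), axis=0)
    n = u.shape[0]
    if 2 * trim >= n:
        raise ValueError("trim removes every update")
    return u[trim:n - trim].mean(axis=0)

# Hypothetical toy round: four honest clients and one extreme update.
updates = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, -500.0]]
plain = np.mean(updates, axis=0)        # [22.0, -80.0], dominated by the outlier
med = coordinate_median(updates)        # [3.0, 20.0]
trimmed = trimmed_mean(updates, trim=1) # [3.0, 20.0]
print(plain, med, trimmed)
```

Both robust rules ignore the extreme update, but they also ignore the honest extremes (here the clients reporting 10.0 and 40.0 in the second coordinate), which is the source of the bias on non-IID data.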

Key Trade-off: Computational & Communication Cost

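Robust algorithms increase overhead. Krum requires O(n²) pairwise distance calculations per round, where n is the number of clients, and each distance costs time proportional to the model size. For 1000 clients, this is about 500,000 unique pairs (n² ≈ 1M if every ordered pair is counted), adding significant server-side compute vs. FedAvg's O(n) averaging. Combining these methods with Secure Aggregation (SecAgg) is not straightforward: standard SecAgg reveals only the sum of updates, while Krum must compare individual updates, so protocols that reconcile the two add further communication rounds. FedAvg remains the most lightweight option.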
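The arithmetic behind the 1000-client figure can be checked directly. In this sketch the helper name `krum_pairwise_cost` and the model dimension of one million parameters are hypothetical, and "coordinate operations" is only a coarse proxy for the real cost of a distance computation.

```python
from math import comb

def krum_pairwise_cost(n_clients, model_dim):
    """Count distance computations for one Krum round (coarse estimate)."""
    unique_pairs = comb(n_clients, 2)        # n * (n - 1) / 2 unordered pairs
    ordered_pairs = n_clients * n_clients    # the n^2 figure, counting both directions
    coordinate_ops = unique_pairs * model_dim  # one pass over the model per pair
    return unique_pairs, ordered_pairs, coordinate_ops

unique_pairs, ordered_pairs, coordinate_ops = krum_pairwise_cost(1000, 1_000_000)
print(unique_pairs)    # 499500
print(ordered_pairs)   # 1000000
print(coordinate_ops)  # 499500000000
```

FedAvg touches each of the 1000 updates once per round, so the gap between the two aggregators grows with both the client count and the model size.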

CHOOSE YOUR PRIORITY

When to Choose: Decision Guide by Role

Krum for Security Architects

Verdict: Mandatory for high-risk environments. Strengths: Krum and other Byzantine-robust algorithms (e.g., Median, Trimmed Mean) are designed to detect and filter out malicious client updates. They provide provable resilience against data poisoning and model manipulation attacks, as long as the number of malicious clients stays within the tolerated bound, which is critical for cross-silo collaborations with low trust, such as in competitive finance or healthcare consortia. The core trade-off is a higher server-side computation cost per round and potentially slower convergence, but this is justified when the threat model includes adversarial participants.

FedAvg for Security Architects

Verdict: Acceptable only in fully trusted or low-risk settings. Strengths: FedAvg offers simplicity and faster convergence under ideal, non-adversarial conditions. However, it is highly vulnerable; a single malicious client can significantly skew the global model. Its use should be restricted to environments with verified, vetted participants (e.g., internal departmental training) or where other layers such as Secure Aggregation (SecAgg) or Differential Privacy (DP) provide complementary protection. For architects, the choice hinges on threat modeling: if you cannot guarantee client integrity, robust aggregation is non-negotiable.
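To make the single-client vulnerability concrete, the sketch below shows one attacker steering a plain average to a value of its choosing. The setup is hypothetical and deliberately favorable to the attacker: equal client weights are assumed, and the attacker is assumed to know the sum of the honest updates.

```python
import numpy as np

# Hypothetical round: nine honest clients all send the update [1, 1].
honest = np.ones((9, 2))
target = np.array([-5.0, 5.0])          # value the attacker wants the average to equal
n = len(honest) + 1                     # total clients, including the attacker

# The attacker solves (sum(honest) + malicious) / n = target for its own update.
malicious = n * target - honest.sum(axis=0)    # [-59.0, 41.0]

all_updates = np.vstack([honest, malicious])
poisoned_mean = all_updates.mean(axis=0)
print(poisoned_mean)                    # [-5.0, 5.0], exactly the attacker's target
```

Because the mean is linear in every input, one unbounded update is enough to cancel the honest majority, which is the gap robust aggregators are meant to close.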

THE ANALYSIS

Final Verdict and Recommendation

A decisive comparison of standard and robust aggregation algorithms for federated learning, guiding the choice between performance and security.

FedAvg excels at efficient convergence and high model utility in trusted, homogeneous environments because it simply averages client updates. For example, in benchmark studies with IID data and honest clients, FedAvg achieves ~95% of centralized training accuracy with significantly lower computational overhead per round compared to robust methods. Its simplicity makes it the default choice for cross-device FL on millions of benign mobile devices or within a single organization's secure silos.

Byzantine-robust algorithms like Krum take a different approach by statistically filtering or aggregating client updates to tolerate malicious actors. This strategy results in a critical trade-off: enhanced security at the cost of slower convergence and potential utility loss, especially under high attack rates. For instance, Krum keeps only a single client update per round and discards all the others, which protects the global model but can increase the rounds-to-convergence by 20-30% compared to FedAvg in a clean setting.

The key trade-off is between trust assumptions and resilience. If your priority is maximizing model accuracy and training speed in a controlled, low-risk environment (e.g., internal data collaboration), choose FedAvg. It is the foundation of most production FL systems. If you prioritize security and must operate in an adversarial, cross-silo setting with untrusted participants (e.g., multi-competitor consortia), choose a Byzantine-Robust algorithm like Krum or Median. For a deeper understanding of the underlying privacy mechanisms that complement these approaches, see our analysis of Secure Aggregation (SecAgg) vs Differential Privacy (DP) for Federated Learning. Furthermore, the choice of framework significantly impacts your ability to implement these algorithms; compare production-ready options in FedML vs Flower (Flwr).

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.