A foundational comparison of the standard FedAvg aggregation algorithm and Byzantine-robust alternatives like Krum, focusing on the critical trade-off between resilience and convergence.
Comparison

FedAvg (Federated Averaging) excels at efficient convergence in trusted environments because it aggregates client model updates via a simple weighted average. This minimizes communication overhead and computational cost, leading to faster training rounds. For example, in benchmark studies with IID (Independent and Identically Distributed) data and honest participants, FedAvg reaches target accuracy in 30-50% fewer communication rounds than more complex robust aggregators, making it the default choice for collaborative research or internal corporate training where client integrity is assumed.
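The weighted-average step at the heart of FedAvg can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: the function name and the flattened-parameter-vector representation are assumptions for the example.

```python
import numpy as np

def fedavg(updates, weights):
    """FedAvg aggregation: weighted average of client model updates.

    updates: list of 1-D numpy arrays (flattened model parameters).
    weights: per-client weights, typically local dataset sizes.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalize so weights sum to 1
    stacked = np.stack(updates)         # shape: (n_clients, n_params)
    return weights @ stacked            # weighted average across clients

# Usage: three clients holding 10, 30, and 60 samples respectively.
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
global_update = fedavg(updates, [10, 30, 60])  # -> [0.7, 0.9]
```

Note the cost: one pass over n client vectors, i.e. O(n) in the number of clients, which is why FedAvg stays so lightweight.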
Byzantine-Robust algorithms like Krum take a different approach by explicitly defending against malicious clients that may submit poisoned updates. Krum's strategy involves selecting the single client update that is most similar to its neighbors, effectively filtering out statistical outliers. This results in a trade-off of higher computational cost per round and potentially slower convergence in exchange for proven resilience; Krum can tolerate up to a known fraction of Byzantine clients (e.g., f out of n) without compromising the global model's integrity, a critical requirement for open or adversarial cross-silo settings.
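Krum's neighbor-scoring rule can be sketched as follows, assuming flattened update vectors and a known Byzantine bound f; this is an illustrative sketch of the selection rule, not an optimized implementation.

```python
import numpy as np

def krum(updates, f):
    """Krum: return the single update whose summed squared distance to its
    n - f - 2 nearest neighbors is smallest, filtering statistical outliers."""
    n = len(updates)
    stacked = np.stack(updates)
    # Pairwise squared Euclidean distances: O(n^2) in the number of clients.
    diffs = stacked[:, None, :] - stacked[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    k = n - f - 2  # number of neighbors each candidate is scored against
    scores = []
    for i in range(n):
        neighbor_dists = np.sort(np.delete(dists[i], i))  # exclude self
        scores.append(neighbor_dists[:k].sum())
    return stacked[int(np.argmin(scores))]

# Usage: four honest clients near [1, 1] plus one poisoned outlier; f = 1.
honest = [np.array([1.0, 1.0]) + 0.1 * i for i in range(4)]
poisoned = [np.array([100.0, -100.0])]
selected = krum(honest + poisoned, f=1)  # an honest update is chosen
```

The poisoned update scores poorly because its nearest neighbors are all far away, so Krum never selects it, at the price of the O(n²) distance matrix.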
The key trade-off: If your priority is maximizing training speed and efficiency in a controlled, vetted network (e.g., internal departmental collaboration), choose FedAvg. If you prioritize security and model integrity in potentially untrusted or regulated multi-party environments (e.g., cross-company healthcare research under HIPAA where data provenance is uncertain), choose a Byzantine-robust aggregator like Krum. For a deeper understanding of aggregation strategies, explore our guide on Secure Aggregation (SecAgg) vs Differential Privacy (DP) for Federated Learning and the analysis of FedProx vs FedAvg for Heterogeneous Clients.
Direct comparison of standard federated averaging against the Krum algorithm for security and performance.
| Metric | FedAvg (Standard) | Krum (Byzantine-Robust) |
|---|---|---|
| Byzantine Client Resilience | None (a single malicious client can skew the model) | Tolerates a bounded fraction (f of n) of malicious clients |
| Convergence Rate (Typical) | 1.0x (Baseline) | 0.6x - 0.8x |
| Server Computation per Round | O(n) | O(n²) |
| Primary Use Case | Trusted, Homogeneous Clients | Untrusted, Adversarial Environments |
| Model Utility (IID Data) | High | Moderate to High |
| Model Utility (Non-IID Data) | Moderate | Low to Moderate |
| Algorithm Complexity | Low | High |
A security-focused comparison of standard aggregation versus robust algorithms, evaluating resilience against malicious clients and the associated cost in convergence and performance.
For low-trust environments with adversarial risk. Algorithms like Krum filter out malicious updates by comparing client similarity, providing provable security against a bounded fraction of Byzantine attackers. This is critical for cross-silo collaborations in finance or defense where data cannot be inspected. The trade-off is ~15-30% slower convergence and potential bias if benign clients are highly heterogeneous.
For cooperative, homogeneous environments prioritizing speed. FedAvg's simple weighted averaging of client updates offers fast convergence and high final accuracy when all participants are honest and data is IID or mildly non-IID. It's the baseline for most production FL systems like TensorFlow Federated or Flower due to its simplicity and low computational overhead. It provides zero defense against malicious or faulty clients.
Robust Aggregation sacrifices model utility for security. Techniques like Krum, Median, or Trimmed Mean introduce bias by discarding potentially valid but outlier updates. This can reduce final accuracy by 2-10% absolute on non-IID data compared to FedAvg. The choice hinges on whether the threat model (malicious clients) poses a greater risk than a slight performance drop.
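The Median and Trimmed Mean aggregators mentioned above operate coordinate-wise rather than selecting whole updates. A minimal numpy sketch, with illustrative function names and flattened update vectors assumed:

```python
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median across client updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, f):
    """Coordinate-wise trimmed mean: drop the f smallest and f largest
    values in each coordinate, then average the remaining values."""
    s = np.sort(np.stack(updates), axis=0)
    return s[f:len(updates) - f].mean(axis=0)

# Usage: three benign updates plus two opposing poisoned outliers.
updates = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0]),
           np.array([100.0, -100.0]), np.array([-100.0, 100.0])]
robust = trimmed_mean(updates, f=1)  # outliers trimmed per coordinate
```

Both rules neutralize the two outliers here, but the same trimming that removes poisoned values also discards legitimate extreme updates from heterogeneous clients, which is the source of the accuracy loss on non-IID data.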
Robust algorithms increase overhead. Krum requires O(n²) pairwise distance calculations per round, where n is the number of clients. For 1000 clients, this is ~1M comparisons, adding significant server-side compute vs. FedAvg's O(n) averaging. Secure Aggregation (SecAgg) can be combined with these methods, further increasing communication rounds. FedAvg remains the most lightweight option.
Verdict: Mandatory for high-risk environments. Strengths: Krum and other Byzantine-robust algorithms (e.g., Median, Trimmed Mean) are designed to detect and filter out malicious client updates. They provide provable resilience against data poisoning and model manipulation attacks, which is critical for cross-silo collaborations with low trust, such as in competitive finance or healthcare consortia. The core trade-off is a higher computational cost per round and potentially slower convergence, but this is justified when the threat model includes adversarial participants.
Verdict: Acceptable only in fully trusted or low-risk settings. Strengths: FedAvg offers simplicity and faster convergence under ideal, non-adversarial conditions. However, it is highly vulnerable; a single malicious client can significantly skew the global model. Its use should be restricted to environments with verified, vetted participants (e.g., internal departmental training) or where other layers like Secure Aggregation (SecAgg) vs Differential Privacy (DP) for Federated Learning provide complementary protection. For architects, the choice hinges on threat modeling: if you cannot guarantee client integrity, robust aggregation is non-negotiable.
A decisive comparison of standard and robust aggregation algorithms for federated learning, guiding the choice between performance and security.
FedAvg excels at efficient convergence and high model utility in trusted, homogeneous environments because it simply averages client updates. For example, in benchmark studies with IID data and honest clients, FedAvg achieves ~95% of centralized training accuracy with significantly lower computational overhead per round compared to robust methods. Its simplicity makes it the default choice for cross-device FL on millions of benign mobile devices or within a single organization's secure silos.
Byzantine-Robust algorithms like Krum take a different approach by statistically filtering or aggregating client updates to tolerate malicious actors. This strategy results in a critical trade-off: enhanced security at the cost of slower convergence and potential utility loss, especially under high attack rates. For instance, standard Krum keeps only the single most central update each round (the Multi-Krum variant keeps n − f), discarding the rest; this protects the global model from poisoned contributions but can increase the rounds-to-convergence by 20-30% compared to FedAvg in a clean setting.
The key trade-off is between trust assumptions and resilience. If your priority is maximizing model accuracy and training speed in a controlled, low-risk environment (e.g., internal data collaboration), choose FedAvg. It is the foundation of most production FL systems. If you prioritize security and must operate in an adversarial, cross-silo setting with untrusted participants (e.g., multi-competitor consortia), choose a Byzantine-Robust algorithm like Krum or Median. For a deeper understanding of the underlying privacy mechanisms that complement these approaches, see our analysis of Secure Aggregation (SecAgg) vs Differential Privacy (DP) for Federated Learning. Furthermore, the choice of framework significantly impacts your ability to implement these algorithms; compare production-ready options in FedML vs Flower (Flwr).