Federated Learning Basics

Updated: 2026-02-17

Federated Learning (FL) has risen from a niche research topic to a mainstream solution for training machine learning models across devices while respecting local data privacy. In this guide, we’ll walk through the core concepts, architectural components, practical workflows, and real‑world deployments that make Federated Learning a compelling tool for AI practitioners and businesses alike.


1. Why Federated Learning Matters Today

  • Privacy regulations (GDPR, CCPA, HIPAA) compel organizations to keep sensitive data on-device.
  • Data sovereignty: Many jurisdictions prohibit cross‑border data transfer.
  • Bandwidth constraints: Sending raw data to the cloud is costly and may consume valuable network resources.
  • Edge intelligence: Applications such as smartphones, wearables, and industrial IoT devices need local inference without the latency induced by cloud round‑trips.

Federated Learning addresses these challenges by aggregating knowledge, not raw data. It allows a central server to build a global model while each participant keeps its data locally.


2. Core Architecture of Federated Learning

Layer | Component | Responsibility
1 | Client Devices | Store local data, run forward/backward passes on the local model, and submit updated model parameters.
2 | Federated Server (Aggregator) | Coordinates training rounds, aggregates updates, and distributes the refined model back to clients.
3 | Communication Protocol | Secure, efficient transport (e.g., HTTPS, gRPC, or custom lightweight sockets) that minimizes bandwidth.
4 | Security & Privacy Layer | Adds differential privacy, secure multiparty computation, or homomorphic encryption to updates.

Federated training follows a round‑based schedule:

  1. Round Initialization: Server sends the current global model to selected clients.
  2. Local Computation: Each client trains on its data for a few epochs and sends only model updates (gradients or weight deltas), not raw data.
  3. Aggregation: Server applies an aggregation rule (e.g., FedAvg) to produce a new global model.
  4. Model Update: Server distributes the new global model back to the clients, and the cycle repeats until convergence.
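The four steps above can be sketched in framework‑free Python; `local_train` and `aggregate` are hypothetical placeholders for the client and server computations, not part of any FL library:

```python
import numpy as np

def local_train(global_weights, client_data, lr=0.1):
    """Hypothetical client step: returns a small weight delta, never raw data."""
    # Placeholder "gradient": the mean of the client's local examples.
    grad = np.mean(client_data, axis=0)
    return -lr * grad

def aggregate(deltas, sample_counts):
    """FedAvg-style server step: sample-weighted average of client deltas."""
    total = sum(sample_counts)
    return sum(d * (n / total) for d, n in zip(deltas, sample_counts))

def run_round(global_weights, clients):
    """One training round: broadcast, local computation, aggregation, update."""
    deltas = [local_train(global_weights, data) for data in clients]
    counts = [len(data) for data in clients]
    return global_weights + aggregate(deltas, counts)

# Emulate three clients holding private 2-D datasets of different sizes.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(n, 2)) for n in (10, 20, 30)]
weights = np.zeros(2)
for _ in range(5):  # five federated rounds
    weights = run_round(weights, clients)
```

Only the deltas and sample counts ever reach the server in this loop; the arrays in `clients` stay local, which is the whole point of the protocol.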

3. Federated Learning Algorithms

While the vanilla algorithm—Federated Averaging (FedAvg)—has dominated research, several variants adapt to different scenarios:

Algorithm | Key Idea | When to Use
FedAvg | Weighted average of client updates | Baseline; roughly IID data
FedProx | Adds a proximal term to limit local model drift | Non‑IID data, heterogeneous devices
FedAdam | Adam‑style adaptive optimization on the server | When the global model benefits from adaptive learning rates
FedOpt | General framework with pluggable client‑ and server‑side optimizers | Clients with very different capacities
Gradient Aggregation | Server aggregates gradients instead of weight deltas | Frequent synchronization under tight network budgets

Example: FedAvg Update Formula

w_{t+1} = w_t + \eta \sum_{k=1}^{K} \frac{n_k}{N}\, \Delta w_k^{(t)}

where ( \Delta w_k^{(t)} = w_k^{(t)} - w_t ) is the weight delta reported by client (k) after local training, ( n_k ) is the number of local samples on client (k), ( N = \sum_k n_k ) is the total across participating clients, and ( \eta ) is the server learning rate.
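A quick numerical sketch of this update, taking the convention Δw_k = w_k − w_t and η = 1 (values below are made up for illustration); with that choice the update reduces to the sample‑weighted average of the client models:

```python
import numpy as np

# Two hypothetical clients with 30 and 70 local samples.
w_t = np.array([1.0, -2.0])                 # current global model
deltas = [np.array([0.2, 0.1]),             # delta = w_k - w_t
          np.array([-0.4, 0.3])]
n = [30, 70]
N = sum(n)
eta = 1.0                                   # server learning rate

# Server update from the FedAvg formula.
w_next = w_t + eta * sum((nk / N) * dk for nk, dk in zip(n, deltas))

# With eta = 1 this equals the sample-weighted average of client models.
client_models = [w_t + d for d in deltas]
weighted_avg = sum((nk / N) * wk for nk, wk in zip(n, client_models))
assert np.allclose(w_next, weighted_avg)
```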


4. Hands‑On: Building a Simple Federated Learning Workflow

Below is a practical outline you can adapt to your environment, using Python and two popular libraries: TensorFlow Federated (TFF) and PySyft.

4.1. Environment Setup

# Install TensorFlow Federated from PyPI
pip install tensorflow-federated

# Or install PySyft
pip install syft

4.2. Dataset Preparation

Federated data is typically partitioned across clients. For demonstration, we’ll emulate Federated EMNIST (handwritten digits) with 5 clients:

import tensorflow as tf
import tensorflow_federated as tff

# load_data() returns separate train and test ClientData objects
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
client_ids = emnist_train.client_ids[:5]  # use five client subsets

def preprocess(dataset):
    # Map each example dict to a (features, label) pair and batch it
    def format_example(element):
        return (tf.expand_dims(element['pixels'], axis=-1), element['label'])
    return dataset.map(format_example).batch(20)

def train_data(client_id):
    return preprocess(emnist_train.create_tf_dataset_for_client(client_id))

def validation_data(client_id):
    return preprocess(emnist_test.create_tf_dataset_for_client(client_id))

4.3. Model Definition

def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

def model_fn():
    keras_model = create_keras_model()
    return tff.learning.from_keras_model(keras_model,
        input_spec=train_data(client_ids[0]).element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

4.4. Federated Training Loop

iterative_process = tff.learning.build_federated_averaging_process(model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

state = iterative_process.initialize()
for round_num in range(1, 51):
    state, metrics = iterative_process.next(
        state, [train_data(cid) for cid in client_ids])
    print(f'Round {round_num:2d}: {metrics}')  # per-round training loss and accuracy

4.5. Evaluation

def evaluate(state, client_id):
    # Copy the trained global weights into a fresh Keras model for evaluation
    keras_model = create_keras_model()
    keras_model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    state.model.assign_weights_to(keras_model)
    return keras_model.evaluate(validation_data(client_id), verbose=0)

for cid in client_ids:
    loss, acc = evaluate(state, cid)
    print(f'Client {cid} - loss: {loss:.3f}, acc: {acc:.3f}')

5. Practical Considerations

5.1. Client Selection Strategies

Strategy | Advantages | Drawbacks
Random | Simplicity, fairness | Ignores device capabilities
Round‑robin | Guarantees participation | Not scalable to thousands of devices
Weighted (by data size, connectivity) | Reflects real contributions | Requires accurate metadata
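A minimal sketch of the weighted strategy, assuming per‑client weights (e.g., data sizes or connectivity scores) are available as metadata; `select_clients` is an illustrative helper, not part of any FL library:

```python
import random

def select_clients(clients, weights, k):
    """Weighted random selection without replacement: a client's chance of
    being picked is proportional to its weight (e.g., data size or a
    connectivity score -- metadata assumed to be available)."""
    pool = list(zip(clients, weights))
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(w for _, w in pool)
        r = random.uniform(0, total)
        acc = 0.0
        for i, (client, w) in enumerate(pool):
            acc += w
            if r <= acc or i == len(pool) - 1:
                chosen.append(client)
                pool.pop(i)  # without replacement
                break
    return chosen

random.seed(42)
picked = select_clients(['a', 'b', 'c', 'd'], [5.0, 1.0, 1.0, 3.0], k=2)
```

Clients 'a' and 'd' dominate the pool here, but lower‑weight clients still get occasional rounds, which preserves a degree of fairness.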

5.2. Heterogeneity Management

  • Non‑IID Data: Use algorithms like FedProx or FedAvgM to reduce bias.
  • Device Capability: Dynamically adjust local epochs and batch sizes.
  • Stragglers: Employ asynchronous aggregation or set timeouts.

5.3. Security & Trust

Layer | Technique | Effect
Cryptography | Secure aggregation protocols, homomorphic encryption (e.g., Paillier) | The aggregator cannot read individual updates
Differential Privacy | Noise added to clipped updates | Limits inference attacks on sensitive data
Model Watermarking | Embedding fragile or robust watermarks | Supports attribution and model integrity checks
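The differential‑privacy row can be illustrated with the usual clip‑then‑noise recipe; `clip_and_noise` and its `noise_mult` parameter are illustrative only, and a real deployment would derive the noise scale from a target (ε, δ) budget via a privacy accountant:

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, noise_mult=0.5, rng=None):
    """Clip an update to a bounded L2 norm, then add Gaussian noise scaled to
    that bound (the update's sensitivity) -- the Gaussian-mechanism recipe."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
update = np.array([3.0, 4.0])   # L2 norm 5.0, clipped down to norm 1.0
private_update = clip_and_noise(update, rng=rng)
```

Clipping first is what makes the noise meaningful: it bounds how much any single client can move the aggregate, so noise proportional to the clip norm masks each individual contribution.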

5.4. Evaluation Metrics

  • Global Accuracy: Standard benchmark on held‑out set.
  • Per‑Client Performance: Detect performance drift across different devices.
  • Communication Cost: Bytes transferred per round.
  • Training Time: Wall‑clock duration per round.
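Communication cost is easy to estimate up front; here is a back‑of‑the‑envelope sketch assuming uncompressed float32 parameters and a full‑model exchange each round (the parameter count and client count are made‑up examples):

```python
def bytes_per_round(num_params, bytes_per_param=4, clients_per_round=10):
    """Rough per-round communication cost: each selected client downloads and
    uploads one full model (uncompressed float32, no sparsification)."""
    model_bytes = num_params * bytes_per_param
    return 2 * model_bytes * clients_per_round  # download + upload

# A small CNN of ~1.2M float32 parameters, 10 clients per round.
cost_mb = bytes_per_round(1_200_000) / 1e6  # 96.0 MB per round
```

Estimates like this make it obvious why update compression (see Future Directions) matters: cost scales linearly with both model size and cohort size.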

6. Real‑World Use Cases

Industry | Application | Outcome
Healthcare | Federated training of disease‑prediction models on hospital data | Maintained patient privacy, improved model generalization
Mobile OS | Keyboard‑prediction models on phones | Reduced data transfer, personalized suggestions
Finance | Fraud detection across banks | Improved detection rates while adhering to strict data‑sharing rules
IoT | Predictive maintenance for industrial sensors | Real‑time alerts, lower latency, compliance with on‑prem data laws

These deployments often combine FL with edge‑computing frameworks (Kubernetes at the edge, Docker Compose on devices) and edge‑optimized hardware such as Qualcomm’s Snapdragon™ Neural Processing Engine.


7. Benchmark: Communication Efficiency vs. Classic Centralized Training

Method | Data Sent per User | Per‑Round Transfer | Network Load | Raw Data Leaves Device?
Classic Centralized | Raw data (e.g., 1 GB) | N/A (bulk upload) | High | Yes
Federated Averaging | Model updates only | ~50 kB per client | Moderate | No

Practical Insight: Sending model updates instead of raw data can cut per‑user bandwidth by over 90%, which makes training feasible even on constrained links such as low‑bandwidth satellite connections.


8. Future Directions

  1. Cross‑Device FL: Aggregating across millions of highly diverse devices.
  2. Cross‑Silo FL: Orchestrating between institutions with distinct security policies.
  3. Model Compression: Pruning and quantization to reduce update size further.
  4. Federated Transfer Learning: Pre‑trained global models are fine‑tuned locally, balancing speed and privacy.
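The model‑compression direction can be illustrated with simple uniform 8‑bit quantization of an update, which shrinks the payload roughly 4x relative to float32; `quantize_int8` is an illustrative sketch, not a production codec:

```python
import numpy as np

def quantize_int8(update):
    """Uniform 8-bit quantization of a float32 update: returns the int8
    tensor plus the scale factor needed to dequantize on the server."""
    scale = float(np.max(np.abs(update))) / 127.0
    if scale == 0.0:
        return np.zeros(update.shape, dtype=np.int8), 1.0
    q = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

update = np.array([0.5, -1.27, 0.01], dtype=np.float32)
q, s = quantize_int8(update)
restored = dequantize(q, s)
```

The reconstruction error is bounded by half the quantization step (scale / 2) per element, which is usually small relative to the noise already present in stochastic gradients.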

9. Quick Reference Cheat Sheet

Topic Tip
Topic | Tip
Aggregation Rule | FedAvg works best when client data is close to IID; for non‑IID data, use FedProx.
Client Update | Send compact weight deltas rather than raw data to keep transfers small.
Privacy | Add noise scaled to the sensitivity of the update for differential privacy.
Client Strategy | Use weighted random selection to favor devices with higher connectivity.
Evaluation | Track per‑client accuracy to detect performance drift early.


10. Closing Thoughts

Federated Learning bridges the gap between data‑driven AI and the practical realities of privacy, bandwidth, and device heterogeneity. By keeping data on‑device and orchestrating intelligent aggregation, FL turns data silos into a collective intelligence. As regulations tighten and edge ecosystems expand, mastering FL will become a vital competency for any modern AI practitioner.

“Privacy‑preserving intelligence is not a trade‑off; it is an opportunity for innovation.” – Igor Brtko

Happy Federating!