Federated Learning (FL) has risen from a niche research topic to a mainstream solution for training machine learning models across devices while respecting local data privacy. In this guide, we’ll walk through the core concepts, architectural components, practical workflows, and real‑world deployments that make Federated Learning a compelling tool for AI practitioners and businesses alike.
1. Why Federated Learning Matters Today
- Privacy regulations (GDPR, CCPA, HIPAA) compel organizations to keep sensitive data on-device.
- Data sovereignty: Many jurisdictions prohibit cross‑border data transfer.
- Bandwidth constraints: Sending raw data to the cloud is costly and may consume valuable network resources.
- Edge intelligence: Applications such as smart phones, wearables, and industrial IoT devices need local inference without latency induced by cloud round‑trips.
Federated Learning addresses these challenges by aggregating knowledge, not raw data. It allows a central server to build a global model while each participant keeps its data locally.
2. Core Architecture of Federated Learning
| Layer | Component | Responsibility |
|---|---|---|
| 1 | Client Devices | Store local data, run forward/backward passes on the local model, and submit updated model parameters. |
| 2 | Federated Server (Aggregator) | Coordinates training rounds, aggregates updates, and distributes the refined model back to clients. |
| 3 | Communication Protocol | Secure and efficient protocols (e.g., HTTPS, gRPC, or custom lightweight sockets) that minimize bandwidth. |
| 4 | Security & Privacy Layer | Adds differential privacy, secure multiparty computation, or homomorphic encryption to updates. |
Federated training follows a round‑based schedule:
- Round Initialization: Server sends the current global model to selected clients.
- Local Computation: Each client trains on its data for a few epochs and sends only model updates (gradients or weight deltas), not raw data.
- Aggregation: Server applies an aggregation rule (e.g., FedAvg) to produce a new global model.
- Model Update: Server distributes the new global model back to the clients, and the cycle repeats until convergence.
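The round-based schedule above can be sketched as a toy simulation in plain Python. This is purely illustrative: the "models" are scalars, the local objective is a made-up quadratic, and `local_train` / `fedavg_round` are names invented for this sketch, not part of any FL library.

```python
import random

def local_train(global_w, data, lr=0.1, epochs=3):
    """Toy local computation: SGD on 0.5*(w - x)^2, pulling w toward the data."""
    w = global_w
    for _ in range(epochs):
        for x in data:
            w -= lr * (w - x)  # gradient step on the toy quadratic loss
    return w

def fedavg_round(global_w, client_datasets):
    """One round: clients train locally, server averages weighted by n_k."""
    updates = [local_train(global_w, data) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    total = sum(sizes)
    return sum(w * n / total for w, n in zip(updates, sizes))

random.seed(0)
# Three clients whose local data cluster around different means
clients = [[random.gauss(mu, 0.1) for _ in range(20)] for mu in (1.0, 2.0, 3.0)]
w = 0.0
for _ in range(10):
    w = fedavg_round(w, clients)
# w converges toward the overall data mean (about 2.0)
```

Note that no client ever transmits its raw samples; only the locally trained scalar crosses the "network" in each round.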
3. Federated Learning Algorithms
While the vanilla algorithm—Federated Averaging (FedAvg)—has dominated research, several variants adapt to different scenarios:
| Algorithm | Key Idea | When to Use |
|---|---|---|
| FedAvg | Weighted average of client updates | Baseline, many datasets |
| FedProx | Adds proximal term to control local model drift | Non‑IID data, heterogeneous devices |
| FedAdam | Applies the Adam optimizer to the server‑side global update | When adaptive server learning rates aid convergence |
| FedOpt | General framework with pluggable client‑ and server‑side optimizers | When clients have very different capacities |
| FedSGD (gradient aggregation) | Server aggregates per‑step gradients instead of weight deltas | Single local step per round; frequent communication |
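To make the FedProx idea concrete, the proximal term simply penalizes how far the local model drifts from the global one during local training. A minimal sketch with scalar weights (the `mu` coefficient and `fedprox_local_step` name are illustrative, not library APIs):

```python
def fedprox_local_step(w_local, w_global, grad, lr=0.1, mu=0.01):
    """One local SGD step with FedProx's proximal penalty.

    The effective gradient is grad + mu * (w_local - w_global), which
    discourages the client from drifting far from the global model.
    """
    return w_local - lr * (grad + mu * (w_local - w_global))

# With mu = 0 this reduces to plain SGD (i.e., FedAvg's local step)
w = fedprox_local_step(w_local=2.0, w_global=1.0, grad=0.5, lr=0.1, mu=0.0)
print(round(w, 3))  # 1.95
```

Raising `mu` pulls local updates back toward the global model, which is why FedProx helps when client data distributions differ sharply.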
Example: FedAvg Update Formula
\[ w_{t+1} = w_t - \eta \sum_{k=1}^{K} \frac{n_k}{N}\,\Delta w_k^{(t)} \]
where \( \Delta w_k^{(t)} = w_t - w_k^{(t)} \) is client \(k\)'s update (the difference between the global weights and the client's locally trained weights), \( n_k \) is the number of local samples, \( N = \sum_{k=1}^{K} n_k \) the total across all clients, and \( \eta \) the server learning rate. With \( \eta = 1 \), this reduces to the familiar weighted average \( w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{N} w_k^{(t)} \).
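As a quick numeric check of the update rule, here is a two-client example with made-up deltas and sample counts, taking \( \Delta w_k^{(t)} \) as the difference between the global weight and each client's locally trained weight:

```python
# FedAvg server step for two hypothetical clients
eta = 1.0            # server learning rate
w_t = 10.0           # current global weight (a scalar for simplicity)
deltas = [0.4, 1.0]  # Delta w_k = w_t - w_k (local weights were 9.6 and 9.0)
n = [30, 70]         # local sample counts; N = 100

N = sum(n)
w_next = w_t - eta * sum(nk / N * d for nk, d in zip(n, deltas))
print(round(w_next, 2))  # 10 - (0.3*0.4 + 0.7*1.0) = 9.18
```

The same number falls out of the weighted-average view: 0.3 × 9.6 + 0.7 × 9.0 = 9.18, confirming the two formulations agree when \( \eta = 1 \).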
4. Hands‑On: Building a Simple Federated Learning Workflow
Below is a practical outline you can adapt to your environment, using Python and TensorFlow Federated (TFF). PySyft is a popular alternative that offers a similar workflow.
4.1. Environment Setup
# Install TensorFlow Federated from PyPI
pip install tensorflow-federated

# Optionally clone the repository for tutorials and examples
git clone https://github.com/tensorflow/federated.git

# Or install PySyft
pip install syft
4.2. Dataset Preparation
Federated data is typically partitioned across clients. For demonstration, we’ll use the simulated Federated EMNIST dataset (handwritten digits, pre‑partitioned by writer) restricted to 5 clients:
import tensorflow as tf
import tensorflow_federated as tff

# load_data() returns a (train, test) pair of ClientData objects
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
client_ids = emnist_train.client_ids[:5]  # use five client subsets

# Each element is an OrderedDict('pixels': 28x28, 'label'); add a channel dim and batch
def preprocess(ds):
    return ds.map(lambda e: (tf.expand_dims(e['pixels'], -1), e['label'])).batch(32)

train_data = lambda cid: preprocess(emnist_train.create_tf_dataset_for_client(cid))
validation_data = lambda cid: preprocess(emnist_test.create_tf_dataset_for_client(cid))
4.3. Model Definition
def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

def model_fn():
    # TFF calls this inside its own graph context, so build a fresh model here
    keras_model = create_keras_model()
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=train_data(client_ids[0]).element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
4.4. Federated Training Loop
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

state = iterative_process.initialize()
for round_num in range(1, 51):
    state, metrics = iterative_process.next(state, [train_data(cid) for cid in client_ids])
    train_metrics = metrics['train']  # metrics are nested under 'train' in recent TFF releases
    print(f'Round {round_num} - loss: {train_metrics["loss"]:.3f} '
          f'- accuracy: {train_metrics["sparse_categorical_accuracy"]:.3f}')
4.5. Evaluation
def evaluate(state, client_id):
    # state.model holds the trained global weights, not a Keras model,
    # so copy them into a fresh Keras model before evaluating
    keras_model = create_keras_model()
    keras_model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    state.model.assign_weights_to(keras_model)
    return keras_model.evaluate(validation_data(client_id), verbose=0)

for cid in client_ids:
    eval_metrics = evaluate(state, cid)
    print(f'Client {cid} - loss: {eval_metrics[0]:.3f}, acc: {eval_metrics[1]:.3f}')
5. Practical Considerations
5.1. Client Selection Strategies
| Strategy | Advantages | Drawbacks |
|---|---|---|
| Random | Simplicity, fairness | Ignores device capabilities |
| Round‑robin | Guarantees participation | Not scalable for thousands of devices |
| Weighted (by data size, connectivity) | Reflects real contributions | Requires accurate metadata |
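A weighted strategy can be sketched as weighted sampling without replacement. The metadata tuple and `weight_fn` below are hypothetical; real deployments would pull these scores from device telemetry:

```python
import random

def select_clients(clients, k, weight_fn, seed=None):
    """Sample k distinct clients, preferring those with higher weight_fn scores."""
    rng = random.Random(seed)
    pool = list(clients)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [weight_fn(c) for c in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))  # remove so a client is picked at most once
    return chosen

# Hypothetical metadata: (client_id, data_size, connectivity score)
clients = [("a", 100, 0.9), ("b", 500, 0.2), ("c", 50, 0.8), ("d", 300, 0.7)]
picked = select_clients(clients, 2, weight_fn=lambda c: c[1] * c[2], seed=42)
```

Scoring by `data_size * connectivity` favors well-connected clients with substantial data, but any score that reflects expected contribution works in its place.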
5.2. Heterogeneity Management
- Non‑IID Data: Use algorithms like FedProx or FedAvgM to reduce bias.
- Device Capability: Dynamically adjust local epochs and batch sizes.
- Stragglers: Employ asynchronous aggregation or set timeouts.
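As an illustration of capability-aware adjustment, a simple heuristic might scale local work to a device's speed and battery budget. The function and its parameters are invented for this sketch, not drawn from any FL framework:

```python
def local_config(device_speed, battery, base_epochs=5, base_batch=32):
    """Pick local epochs and batch size from device capability (toy heuristic).

    device_speed and battery are both normalized to [0, 1].
    """
    budget = device_speed * battery
    epochs = max(1, round(base_epochs * budget))      # never drop below one epoch
    batch = base_batch if device_speed > 0.5 else base_batch // 2
    return epochs, batch

print(local_config(0.9, 1.0))  # fast, charged device: more epochs, full batch
print(local_config(0.2, 0.5))  # slow, low-battery device: minimal work
```

In practice the server would send these per-client settings along with the global model at round initialization.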
5.3. Security & Trust
| Layer | Technique | Effect |
|---|---|---|
| Cryptography | Secure Aggregation (masking protocols), Homomorphic Encryption (e.g., Paillier) | Ensures the aggregator cannot read individual updates |
| Differential Privacy | Noise addition to updates | Limits inferential attacks on sensitive data |
| Model Watermarking | Embedding fragile or robust watermarks | Adds attribution and model integrity checks |
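In practice, the differential-privacy row above means clipping each client update to bound its sensitivity, then adding calibrated Gaussian noise before it leaves the device. A minimal sketch; `clip_norm` and `noise_multiplier` are illustrative defaults, not values from any specific library:

```python
import math
import random

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, seed=None):
    """Clip an update's L2 norm to bound sensitivity, then add Gaussian noise."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]          # now ||clipped|| <= clip_norm
    sigma = noise_multiplier * clip_norm           # noise scaled to the sensitivity
    return [x + rng.gauss(0.0, sigma) for x in clipped]

update = [3.0, 4.0]                                # L2 norm 5.0, clipped to 1.0
noisy = dp_sanitize(update, clip_norm=1.0, seed=0)
```

The formal privacy guarantee depends on accounting across rounds (e.g., with a moments accountant); this sketch shows only the per-update mechanics.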
5.4. Evaluation Metrics
- Global Accuracy: Standard benchmark on held‑out set.
- Per‑Client Performance: Detect performance drift across different devices.
- Communication Cost: Bytes transferred per round.
- Training Time: Wall‑clock duration per round.
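Communication cost is straightforward to estimate when clients exchange full float32 weight tensors each round. A rough back-of-the-envelope helper (ignoring compression and protocol overhead):

```python
def bytes_per_round(num_params, num_clients, bytes_per_param=4):
    """Estimate bytes moved in one round: model download plus update upload
    for every selected client, assuming uncompressed float32 parameters."""
    one_way = num_params * bytes_per_param
    return 2 * num_clients * one_way

# e.g., a small 100k-parameter model with 10 clients per round
mb = bytes_per_round(100_000, 10) / 1e6
print(mb)  # 8.0 MB per round
```

Tracking this estimate against measured traffic quickly reveals when update compression is worth the engineering effort.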
6. Real‑World Use Cases
| Industry | Application | Outcome |
|---|---|---|
| Healthcare | Federated training of disease prediction on hospital data | Maintained patient privacy, improved model generalization |
| Mobile OS | Keyboard prediction models on phones | Reduced data transfer, personalized suggestions |
| Finance | Fraud detection across banks | Improved detection rates while adhering to strict data sharing rules |
| IoT | Predictive maintenance for industrial sensors | Real‑time alerts, lower latency, compliance with on‑prem data laws |
These deployments often combine FL with edge computing frameworks (Kubernetes on‑edge, Docker‑Compose on devices) and edge‑optimized hardware such as Qualcomm’s Snapdragon™ Neural Processing Engine.
7. Benchmark: Communication Efficiency vs. Classic Centralized Training
| Method | Data Sent | Model Updates | Network Latency | Privacy Guarantee |
|---|---|---|---|---|
| Classic Centralized | Raw data (e.g., 1 GB per user) | None | High, due to round‑trips | No |
| Federated Averaging | Model updates (~50 kB per client) | ~50 kB per round | Moderate | Improved (stronger with DP or secure aggregation) |
Practical Insight: On 5G networks, the federated approach can cut per‑round bandwidth by over 90% compared to raw‑data uploads, making deployment feasible even over constrained links such as low‑cost satellite connections.
8. Future Directions
- Cross‑Device FL: Aggregating across millions of highly diverse devices.
- Cross‑Silo FL: Orchestrating between institutions with distinct security policies.
- Model Compression: Pruning and quantization to reduce update size further.
- Federated Transfer Learning: Pre‑trained global models are fine‑tuned locally, balancing speed and privacy.
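Model compression deserves a concrete example: uniform 8-bit quantization alone shrinks a float32 update roughly 4x. A minimal lossy sketch (production systems typically add error feedback or stochastic rounding on top):

```python
def quantize8(values):
    """Uniformly map floats onto 0..255 integers; returns (codes, lo, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize8(codes, lo, scale):
    """Reconstruct approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]

update = [-0.50, 0.00, 0.25, 1.00]
codes, lo, scale = quantize8(update)
restored = dequantize8(codes, lo, scale)
# Reconstruction error is bounded by about half the quantization step (scale/2)
```

Each value now needs one byte plus a shared `(lo, scale)` header, and the worst-case error shrinks as the value range narrows, which is why quantizing deltas beats quantizing raw weights.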
9. Quick Reference Cheat Sheet
| Topic | Tip |
|---|---|
| Aggregation Rule | FedAvg works best when client data is IID. For Non‑IID use FedProx. |
| Client Update | Send weight deltas, which compress well, rather than full weight tensors. |
| Privacy | Add noise scaled to the sensitivity of the update for differential privacy. |
| Client Strategy | Use weighted random selection to favor devices with higher connectivity. |
| Evaluation | Track per‑client accuracy to detect unfairness and performance drift across devices. |
10. Further Reading & Resources
- TensorFlow Federated Tutorials – https://www.tensorflow.org/federated
- PySyft Documentation – https://pysyft.readthedocs.io
- Federated Learning Handbook (NVIDIA) – practical guidelines for GPU‑enabled edge devices.
- Open‑Source Projects – Flower, FedML, openfl.
11. Closing Thoughts
Federated Learning bridges the gap between data‑driven AI and the practical realities of privacy, bandwidth, and device heterogeneity. By keeping data on‑device and orchestrating intelligent aggregation, FL turns data silos into a collective intelligence. As regulations tighten and edge ecosystems expand, mastering FL will become a vital competency for any modern AI practitioner.
“Privacy‑preserving intelligence is not a trade‑off; it is an opportunity for innovation.” – Igor Brtko
Happy Federating!