Deep learning thrives on the flexibility of defining loss objectives that best match a problem’s nuances. While TensorFlow and Keras provide a rich set of built‑in losses, real‑world tasks—such as medical image segmentation, class imbalance handling, or ranking—often require a bespoke loss expression. This guide walks you through the mechanics of crafting custom loss functions in TensorFlow 2.x, from the simplest function to sophisticated, multi‑output scenarios, all while keeping an eye on performance and best practices.
Why Custom Loss Functions?
A tailored loss can directly encode domain knowledge, penalize undesirable predictions more heavily, or promote desirable structures in the output. By capturing these subtleties, you unlock higher accuracy and better generalization.
| Problem | Built‑in Loss Limitation | Custom Solution |
|---|---|---|
| Class imbalance (rare tumor detection) | BinaryCrossentropy treats all samples equally | Focal Loss down‑weights easy negatives |
| Pixel‑wise segmentation (semantic masks) | SparseCategoricalCrossentropy penalizes each pixel independently | Dice Loss maximizes overlap between predictions and ground truth |
| Ranking (recommendation systems) | MeanSquaredError ignores relative order | Pairwise Ranking Loss focuses on order rather than absolute difference |
| Multi‑task models (joint depth & semantics) | Single loss forces a trade‑off | Weighted sum of task‑specific losses |
Custom loss functions translate these problem‑specific insights into an objective that the optimiser can minimize. They are not just a technical trick; they are a crucial part of a well‑engineered DL pipeline.
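To make the focal‑loss row above concrete, here is a quick back‑of‑the‑envelope check in plain Python (using α = 0.25 and γ = 2, the values popularized by the original focal‑loss paper) showing how confidently classified examples are down‑weighted:

```python
import math

def focal_loss_term(p, alpha=0.25, gamma=2.0):
    # -alpha * (1 - p)^gamma * log(p), where p is the predicted
    # probability assigned to the true class
    return -alpha * (1.0 - p) ** gamma * math.log(p)

easy = focal_loss_term(0.9)   # well-classified example (p = 0.9)
hard = focal_loss_term(0.1)   # badly-classified example (p = 0.1)
print(easy, hard)
```

The easy example contributes a loss orders of magnitude smaller than the hard one, which is exactly the behavior that lets focal loss cope with extreme class imbalance.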
Anatomy of a Loss Function
Mathematically, a loss function converts a model’s predictions y_pred and the true labels y_true into a scalar value that represents training error. In TensorFlow, a loss must be:
- Differentiable – it provides gradients for back‑propagation.
- Tensor‑friendly – it operates on tf.Tensor objects, preferably element‑wise, so it vectorizes.
Two principal patterns exist for defining custom losses:
| Pattern | How it works | Typical Use |
|---|---|---|
| Functional | Pass a function custom_loss(y_true, y_pred) to compile() | Quick experiments, when tf.keras.losses.Loss is overkill |
| Subclass | Derive from tf.keras.losses.Loss, override __init__ & call(y_true, y_pred) | Reusable, configurable losses used across multiple models |
Creating a Custom Loss in TensorFlow 2.x
1. Functional Approach
```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)
    return 1. - score
```
- Works great for one‑time use.
- No class state; all hyper‑parameters are function arguments.
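As a sanity check on the formula, a NumPy mirror of the same computation (no TensorFlow required) confirms the expected boundary behavior: perfect overlap gives a loss near 0, zero overlap gives a loss near 1.

```python
import numpy as np

def dice_loss_np(y_true, y_pred, smooth=1e-6):
    # Same arithmetic as the TF version, on flat NumPy arrays
    y_true_f = y_true.reshape(-1)
    y_pred_f = y_pred.reshape(-1)
    intersection = np.sum(y_true_f * y_pred_f)
    score = (2.0 * intersection + smooth) / (
        y_true_f.sum() + y_pred_f.sum() + smooth)
    return 1.0 - score

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dice_loss_np(mask, mask))        # perfect overlap -> ~0.0
print(dice_loss_np(mask, 1.0 - mask))  # no overlap -> ~1.0
```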
2. Subclassed Loss
```python
class DiceLoss(tf.keras.losses.Loss):
    def __init__(self, smooth=1e-6, name='dice_loss'):
        super().__init__(name=name)
        self.smooth = smooth

    def call(self, y_true, y_pred):
        y_true_f = tf.reshape(y_true, [-1])
        y_pred_f = tf.reshape(y_pred, [-1])
        intersection = tf.reduce_sum(y_true_f * y_pred_f)
        return 1. - (2. * intersection + self.smooth) / (
            tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + self.smooth)
```
- Encapsulates hyper‑parameters.
- Can be instantiated multiple times with different smooth values.
- Useful for registering the loss in model.compile().
3. Custom Gradients with tf.GradientTape
Sometimes the loss itself does not directly provide the gradient you want (e.g., ranking loss relying on pairwise comparisons). You can wrap a plain function with a custom gradient:
```python
@tf.custom_gradient
def pairwise_mse(y_true, y_pred):
    # Forward pass: mean squared pairwise difference between predictions
    diff = y_pred[:, None] - y_pred[None, :]
    mse = tf.reduce_mean(tf.square(diff))

    def grad(dy):
        # Hand-written backward pass: one gradient per input is required.
        # y_true receives no gradient; y_pred gets a custom update rule.
        return tf.zeros_like(y_true), dy * 2. * (y_pred - y_true)

    return mse, grad
```
tf.custom_gradient gives you full control over back‑propagation, allowing you to craft non‑standard update rules.
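The broadcasting trick in the forward pass above is easy to verify in NumPy: subtracting a vector from its column‑expanded form turns an n‑vector of scores into an n×n matrix of all pairwise differences.

```python
import numpy as np

scores = np.array([1.0, 2.0, 4.0])
# diff[i, j] = scores[i] - scores[j], shape (3, 3)
diff = scores[:, None] - scores[None, :]
print(diff)
pairwise_mse_value = np.mean(diff ** 2)
print(pairwise_mse_value)
```

Every pairwise comparison is materialized in one vectorized operation, which is what keeps ranking‑style losses GPU‑friendly.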
Common Use Cases
Below is a cheat‑sheet of popular custom loss functions, their mathematics, and typical application scenarios.
| Loss | Formula | When to Use |
|---|---|---|
| Focal Loss | -α * (1-π)^γ * log(π), where π = predicted probability | Highly imbalanced classification |
| Dice Loss | 1 - (2\|A ∩ B\| + s) / (\|A\| + \|B\| + s) | Overlap‑based segmentation |
| Weighted BCE | -w * y_true * log(y_pred) - (1-w)*(1-y_true)*log(1-y_pred) | Imbalanced binary classification with tunable weight |
| Rank Loss (e.g., RankNet) | log(1 + exp(-(s_i - s_j))) | Recommendation systems, ranking tasks |
| Triplet Loss | max(0, d(a, p) - d(a, n) + margin) | Metric learning (face verification, embedding spaces) |
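As a numeric sanity check on the weighted‑BCE row, a plain‑Python evaluation (with a hypothetical weight w = 0.9 favoring the positive class) shows a missed positive costing far more than an equally confident missed negative:

```python
import math

def weighted_bce(y_true, y_pred, w=0.9, eps=1e-7):
    # Clip predictions for numerical stability before taking logs
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return (-w * y_true * math.log(y_pred)
            - (1.0 - w) * (1.0 - y_true) * math.log(1.0 - y_pred))

missed_positive = weighted_bce(1.0, 0.1)  # true positive predicted as 0.1
missed_negative = weighted_bce(0.0, 0.9)  # true negative predicted as 0.9
print(missed_positive, missed_negative)
```

With w = 0.9, the missed positive is penalized nine times as heavily, which is the lever you tune against class imbalance.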
Practical Example: Dice Loss for Medical Image Segmentation
We’ll walk through a full pipeline: dataset loading, model definition, custom loss integration, and evaluation.
1. Dataset Preparation
```python
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

train_ds = image_dataset_from_directory(
    'data/medical/train',
    seed=42,
    shuffle=False,  # keep file order stable so masks can be paired below
    image_size=(256, 256),
    batch_size=8,
    label_mode=None)

val_ds = image_dataset_from_directory(
    'data/medical/val',
    seed=42,
    shuffle=False,
    image_size=(256, 256),
    batch_size=8,
    label_mode=None)
```
- Label mode is set to None because we’re loading masks separately.
- Use tf.data to pair each image with its mask.
```python
def load_mask(path):
    mask = tf.io.read_file(path)
    mask = tf.image.decode_png(mask, channels=1)
    mask = tf.image.resize(mask, [256, 256]) / 255.0
    return mask

# image_dataset_from_directory yields image tensors, not file paths, so
# build a parallel mask dataset and zip it in (mask directories assumed):
train_mask_ds = tf.data.Dataset.list_files('data/medical/train_masks/*.png',
                                           shuffle=False).map(load_mask).batch(8)
train_ds = tf.data.Dataset.zip((train_ds, train_mask_ds))
val_mask_ds = tf.data.Dataset.list_files('data/medical/val_masks/*.png',
                                         shuffle=False).map(load_mask).batch(8)
val_ds = tf.data.Dataset.zip((val_ds, val_mask_ds))
```
2. Model Definition
```python
from tensorflow.keras import layers, models

def simple_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    # Encoder
    c1 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)
    p1 = layers.MaxPooling2D((2, 2))(c1)
    c2 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(p1)
    p2 = layers.MaxPooling2D((2, 2))(c2)
    # Bottleneck
    bn = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(p2)
    # Decoder
    u1 = layers.UpSampling2D((2, 2))(bn)
    concat1 = layers.concatenate([u1, c2])
    c3 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(concat1)
    u2 = layers.UpSampling2D((2, 2))(c3)
    concat2 = layers.concatenate([u2, c1])
    c4 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(concat2)
    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(c4)
    return models.Model(inputs, outputs)

model = simple_unet()
```
- Keeps the architecture lightweight for demonstration.
- Replace with a more sophisticated U‑Net variant for production.
3. Loss Integration
```python
loss_fn = DiceLoss()
```
- Instantiated once; can be reused if training multiple models.
4. Compilation & Training
```python
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=loss_fn,
              metrics=[tf.keras.metrics.BinaryIoU(threshold=0.5)])

history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=50,
                    callbacks=[
                        tf.keras.callbacks.ModelCheckpoint('checkpoints/best',
                                                           save_best_only=True,
                                                           monitor='val_loss')
                    ])
```
Key Points
- model.compile() accepts any loss instance.
- An IoU‑style metric gives a direct point of comparison with the Dice loss.
5. Evaluation
```python
val_metrics = model.evaluate(val_ds)
print("Validation IoU:", val_metrics[1])  # index 1 corresponds to the IoU metric
```
Integrating with tf.keras API
| API | How Custom Loss Helps |
|---|---|
| model.compile() | Accepts plain functions or Loss objects; you can mix in lambda expressions. |
| model.fit() | Runs each batch through the loss; back‑propagation is handled automatically. |
| Callbacks | tf.keras.callbacks.ReduceLROnPlateau can be triggered when validation loss stalls. |
Tip: To keep the training loop concise, wrap the custom loss into a tf.keras.losses.Loss subclass. It eliminates boilerplate and ensures consistent logging.
Debugging and Tips
1. Inspecting Gradients
Use tf.GradientTape to confirm gradients are flowing:
```python
y_true = tf.cast(tf.random.uniform((8, 256, 256, 1), maxval=2, dtype=tf.int32),
                 tf.float32)
y_pred = tf.random.uniform((8, 256, 256, 1))

with tf.GradientTape() as tape:
    tape.watch(y_pred)  # plain tensors must be watched explicitly
    loss_value = dice_loss(y_true, y_pred)

grads = tape.gradient(loss_value, y_pred)
print("Gradient norm:", tf.norm(grads).numpy())
```
If the norm is close to zero or infinite, investigate the loss expression for numerical issues.
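A framework‑free complement to the tape check is a finite‑difference test: the analytic gradient of the loss should agree with a central difference. A minimal sketch with a toy squared‑error loss:

```python
def loss(p, t):
    return (p - t) ** 2

def analytic_grad(p, t):
    # d/dp (p - t)^2 = 2 (p - t)
    return 2.0 * (p - t)

p, t, h = 0.7, 1.0, 1e-5
# Central finite difference approximation of the gradient
numeric = (loss(p + h, t) - loss(p - h, t)) / (2.0 * h)
print(numeric, analytic_grad(p, t))
```

If the two values disagree for your custom loss, the hand‑written gradient (or a non‑differentiable op inside the loss) is usually the culprit.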
2. Numerical Stability
- Add a smooth term in Dice or Focal loss to avoid division by zero.
- Clamp predictions to [ε, 1-ε] before computing log or division.
- Use tf.math.log instead of math.log so gradients can propagate.

```python
y_pred = tf.clip_by_value(y_pred, 1e-7, 1. - 1e-7)
```
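The effect of clipping is easy to demonstrate in NumPy: without it, log(0) produces an infinity that poisons the loss, while the clipped version stays finite.

```python
import numpy as np

y_pred = np.array([0.0, 0.5, 1.0])
with np.errstate(divide='ignore'):
    raw = -np.log(y_pred)  # contains inf where y_pred == 0
clipped = -np.log(np.clip(y_pred, 1e-7, 1.0 - 1e-7))
print(np.isinf(raw).any(), np.isfinite(clipped).all())
```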
3. Vectorization
Custom losses should be fully vectorized. Avoid explicit Python loops; use tf.reduce_sum, tf.reduce_mean, and broadcasting.
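The payoff is speed without any change in the result: a vectorized reduction computes the same value as an explicit loop, but in one fused operation. A NumPy sketch:

```python
import numpy as np

errors = np.random.default_rng(0).normal(size=10_000)

# Slow: explicit Python loop over elements
loop_mse = sum(e * e for e in errors) / len(errors)

# Fast: single vectorized reduction
vec_mse = np.mean(errors ** 2)

print(abs(loop_mse - vec_mse))  # agreement to floating-point precision
```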
4. GPU & SIMD Utilization
Define the loss as a single graph of tensor operations. When the loss is tensor‑friendly and wrapped in tf.function, TensorFlow traces it into a graph and schedules efficient GPU kernels.

```python
@tf.function
def custom_loss_tuned(y_true, y_pred):
    # Same computation as the functional version, traced into a graph
    return dice_loss(y_true, y_pred)

model.compile(optimizer='adam',
              loss=custom_loss_tuned,
              metrics=['accuracy'])
```
Advanced: Custom Loss with Multi‑Output Models
When a model has several heads (e.g., depth estimation + semantic segmentation), each head often needs a different loss. A weighted sum with task‑specific losses is the norm:
```python
class MultiTaskLoss(tf.keras.losses.Loss):
    def __init__(self, depth_weight=1.0, seg_weight=1.0, name='multi_task_loss'):
        super().__init__(name=name)
        self.depth_weight = depth_weight
        self.seg_weight = seg_weight

    def call(self, y_true, y_pred):
        # y_true and y_pred are lists: [depth, segmentation]
        depth_true, seg_true = y_true
        depth_pred, seg_pred = y_pred
        depth_loss = tf.reduce_mean(tf.square(depth_true - depth_pred))
        seg_loss = dice_loss(seg_true, seg_pred)
        return self.depth_weight * depth_loss + self.seg_weight * seg_loss
```
Note that model.compile() applies a loss per output, so a joint loss like this is most natural inside a custom training loop. With compile(), the equivalent is per‑output losses plus loss_weights (the output names depth and seg are assumed here):

```python
model.compile(optimizer='adam',
              loss={'depth': 'mse', 'seg': DiceLoss()},
              loss_weights={'depth': 0.5, 'seg': 1.5},
              metrics=[...])
```
Performance Considerations: GPU vs CPU
| Operation | GPU Throughput | CPU Throughput |
|---|---|---|
| Element‑wise, vectorized loss | ~10‑20× faster | 1× |
| Custom tf.function + tf.GradientTape | 4‑5× faster | 1× |
| tf.custom_gradient with non‑vectorized loops | May hit kernel‑launch overhead | Prefer CPU to avoid GPU stalls |
Rule of thumb: Keep the loss expression simple and vectorized. Avoid heavy Python loops inside losses when training on GPU.
Common Pitfalls
- Non‑differentiable operations (e.g., tf.argmax) break gradient flow.
  Fix: use differentiable surrogates such as tf.nn.softmax or tf.nn.softmax_cross_entropy_with_logits.
- NaNs or Infs in the loss due to division by zero or log(0).
  Fix: add a small epsilon and clip predictions.
- Gradient mismatch – the custom loss yields zero gradient because of a shape mismatch.
  Fix: ensure y_true and y_pred shapes align; use tf.reshape.
- Under‑utilized hardware when the batch size is 1.
  Fix: batch with tf.data and distribute across GPUs with tf.distribute.MirroredStrategy where possible.
Wrap‑Up
Custom loss functions are a powerful tool for aligning training objectives with domain requirements. By mastering the patterns outlined above—functional, subclassed, and custom gradient—you can craft losses that:
- Encode complex relationships (e.g., ranking, metric learning).
- Handle imbalanced data gracefully.
- Scale safely across GPUs.
Always validate gradients with tf.GradientTape, ensure numerical stability, and encapsulate reusable logic in loss subclasses. With these practices, your models will not only learn faster but also smarter.
Remember:
“In deep learning, the loss is not just an error metric—it’s the lens through which your model learns.”

Motto:
“A well‑crafted loss turns data into insight and predictions into expertise.”