Reproducible runs with fastai

How to replicate results from fastai models.

fastai · Published October 23, 2021

The Problem

Reproducibility becomes important when you are trying to isolate the impact of individual changes as you tweak a model.

from fastai.vision.all import *

Grab the pets dataset.

path = untar_data(URLs.PETS)/'images'
def is_cat(x): return x[0].isupper()

Create the dataloaders, passing in a seed. Then create a learner and fine-tune a resnet34 for one epoch. (fine_tune first trains the new head for one frozen epoch, then unfreezes and runs the requested epoch, which is why two one-row tables are printed.)

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=21,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.129521    0.022127    0.007442    01:10

epoch  train_loss  valid_loss  error_rate  time
0      0.056711    0.023975    0.010149    01:18

We end up with an error rate of \(0.010149\).

Let’s do another round where we recreate the dataloaders and the learner, and fine-tune again for a single epoch. Since we used the same seed, we will get the same final result, right?

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=21,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.140996    0.024327    0.007442    01:07

epoch  train_loss  valid_loss  error_rate  time
0      0.058567    0.012324    0.004736    01:18

Wrong!

The train_loss, valid_loss and error_rate at the end of the two rounds are different. The seed passed to from_name_func only pins the train/validation split; everything else that consumes randomness, such as the weight initialization of the new head, the batch shuffling and the augmentations, still draws from global generators we never seeded.
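To see the mechanism outside fastai, here is a minimal sketch in plain PyTorch: the global generator keeps advancing from run to run unless it is explicitly re-seeded, and it is this generator, not the split seed, that drives weight initialization and shuffling.

import torch

# Without an explicit seed, torch's global generator produces different
# numbers on every run, so e.g. the new model head initializes differently:
print(torch.randn(2))       # varies from run to run

# Seeding the global generator makes the draws repeatable:
torch.manual_seed(21)
print(torch.randn(2))       # identical on every run after this seed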

Solution

Use fastai’s set_seed function.

set_seed(21, reproducible=True)

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.151476    0.018651    0.006766    01:42

epoch  train_loss  valid_loss  error_rate  time
0      0.042918    0.015299    0.006766    02:20

Observe that I did not pass a seed to the ImageDataLoaders.from_name_func call.
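The split stays deterministic anyway. The default RandomSplitter behind valid_pct looked roughly like this in fastai v2 at the time of writing (paraphrased from the fastai source; check your installed version):

def RandomSplitter(valid_pct=0.2, seed=None):
    "Split `items` into train/valid randomly."
    def _inner(o):
        # An explicit seed re-seeds torch right before the permutation...
        if seed is not None: torch.manual_seed(seed)
        # ...otherwise the permutation comes from torch's *global*
        # generator, which our set_seed(21) call has already fixed.
        rand_idx = L(list(torch.randperm(len(o)).numpy()))
        cut = int(valid_pct * len(o))
        return rand_idx[cut:], rand_idx[:cut]
    return _inner

So with the global generators seeded, the seed argument is redundant. Running the exact same code a second time: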

set_seed(21, reproducible=True)

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.151476    0.018651    0.006766    01:42

epoch  train_loss  valid_loss  error_rate  time
0      0.042918    0.015299    0.006766    02:20

Bingo! Both runs end with the same train_loss, valid_loss and error_rate.

Can we omit the call to set_seed in a subsequent run?

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.161395    0.019973    0.006766    01:42

epoch  train_loss  valid_loss  error_rate  time
0      0.070191    0.034742    0.012855    02:20

Nice try, but no: the previous round consumed random numbers and left the generators in a different state, so the new round starts from a different point in the sequence.
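A tiny sketch with Python's standard random module shows the effect: every draw advances the generator, and only re-seeding rewinds it.

import random

random.seed(21)
a = random.random()          # first draw after seeding
b = random.random()          # the state has advanced: b != a

random.seed(21)              # rewind to the same starting state
print(random.random() == a)  # True: the sequence replays from the top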

Can we omit reproducible=True in the call to set_seed?

set_seed(21)

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.151476    0.018651    0.006766    01:43

epoch  train_loss  valid_loss  error_rate  time
0      0.042918    0.015299    0.006766    02:21

set_seed(21)

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.151476    0.018651    0.006766    01:43

epoch  train_loss  valid_loss  error_rate  time
0      0.042918    0.015299    0.006766    02:21

It seems we can 🤷, but I would keep it: reproducible=True is the part of set_seed that configures cuDNN for deterministic behavior, and whether you can get away without it depends on which cuDNN kernels your model happens to hit.
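For reference, set_seed looked roughly like this in fastai v2 at the time of writing (paraphrased from the fastai source; check your installed version):

def set_seed(s, reproducible=False):
    "Set random seed for `random`, `torch`, and `numpy` (where available)."
    try: torch.manual_seed(s)
    except NameError: pass
    try: torch.cuda.manual_seed_all(s)
    except NameError: pass
    try: np.random.seed(s % (2**32 - 1))
    except NameError: pass
    random.seed(s)
    if reproducible:
        # Ask cuDNN for deterministic kernels and disable its auto-tuner,
        # trading some speed for run-to-run repeatability.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

The seeding itself is unconditional; reproducible=True only flips the cuDNN flags, which is presumably why the runs above matched even without it.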

Can we avoid recreating the dataloaders from scratch?

Spoiler alert: No! The existing dls already carries random state from the previous rounds, so re-seeding the global generators is not enough; the dataloaders have to be recreated after the set_seed call.

set_seed(21, reproducible=True)

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.161448    0.013740    0.004060    01:42

epoch  train_loss  valid_loss  error_rate  time
0      0.048693    0.012253    0.003383    02:20

Bottom line

Use the set_seed function (with reproducible=True), recreate the dataloaders and the learner after calling it, and remember that any step that consumes random numbers (such as running the learning rate finder) must be present in every run, otherwise you will end up with a different result.
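Putting it all together, the recipe that worked above, collected in one place:

from fastai.vision.all import *

def is_cat(x): return x[0].isupper()

# Seed random, numpy, torch and CUDA, and flip the cuDNN flags.
set_seed(21, reproducible=True)

# Recreate the dataloaders and the learner *after* seeding; reusing
# objects from an earlier run carries stale random state with it.
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)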