Beginner Training Guide

From Upscale Wiki

Getting Started

This guide assumes you already know how to download/clone repositories from GitHub and that you are familiar with python and pip.

Choosing a Fork

The original BasicSR repository by Xinntao has many issues and lacks features and bug fixes that various community forks have added. Also, if you use the latest Xinntao version you will be training a new-arch model instead of an old-arch model, which means you will not be able to train at scales other than 4.

We highly recommend Victorca25's traiNNer over existing forks of BasicSR. It has many extra features and quality-of-life additions, along with all of the features from past versions. If you are reading this guide as a refresher and currently use BlueAmulet's fork, please do switch to traiNNer; BlueAmulet's fork is now unmaintained.

You can see a list of currently maintained forks here.

Installing Dependencies

traiNNer requires a few dependencies. You can install most of them (including the optional ones) at once by running this in the console:

pip install numpy opencv-python pyyaml lmdb scipy Pillow joblib tensorboardx

Note that pyyaml is what reads the YAML training configs; if you prefer JSON configs instead, no extra package is needed, since Python's built-in json module handles them.

You also need to install pytorch and torchvision, but which build you need depends on your system. Use the install selector on the PyTorch website: pick the current stable version and your OS, choose pip, and, if you have an NVIDIA graphics card, select the latest CUDA version from the list (without an NVIDIA GPU, training is likely not practical). Then run the command it gives you.
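Once PyTorch is installed, it is worth a quick sanity check that it can actually see your GPU before you start a long training run. This is only a diagnostic sketch; it assumes nothing beyond torch being importable (or not):

```python
# Minimal diagnostic: report whether PyTorch is installed and CUDA-capable.
def cuda_status():
    try:
        import torch
    except ImportError:
        return "pytorch is not installed"
    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. an NVIDIA card.
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available; training on CPU is not recommended"

print(cuda_status())
```

If this reports that CUDA is not available, fix your PyTorch install before going any further.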

Once this is done you should have all the required and optional dependencies. Some forks may require others, but these should work for traiNNer.

Creating a Dataset

All BasicSR/traiNNer/ESRGAN models are trained using low-resolution images, often called LR for short or LQ (Low Quality), and high-resolution images, often called HR for short or GT (Ground Truth). For a 4x scale model, this means that your LR images will be 4x smaller in resolution than your HR images.

It is important to create the best dataset you can for your upscale task. Many pre-existing datasets exist, such as DF2K or Manga109, but a dataset can be anything. Your HRs could be high-quality frames of a TV show, for example, with the LRs being the same images scaled down by 4 using a bicubic filter. This would then create a model that is good at upscaling small images that are visually similar to the LRs you created. The dataset is one of the most important parts of training a model; without a good dataset, your model will not work well.
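The downscale step described above can be sketched with Pillow (installed earlier). This is only an illustrative helper, assuming a scale of 4 and per-image paths that you supply yourself:

```python
# Illustrative sketch: generate an LR image from an HR image with a
# bicubic downscale, cropping first so the dimensions divide evenly.
from PIL import Image

def make_lr(hr_path, lr_path, scale=4):
    with Image.open(hr_path) as hr:
        # Crop so both dimensions are exact multiples of the scale.
        w = (hr.width // scale) * scale
        h = (hr.height // scale) * scale
        hr = hr.crop((0, 0, w, h))
        lr = hr.resize((w // scale, h // scale), Image.BICUBIC)
        lr.save(lr_path)
```

Run it over every image in your HR folder, saving each result under the same filename in the LR folder.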

Datasets don't have to be just downscaled images, though. You can use LR images with compression artifacts or noise, which will train your model to remove such artifacts. traiNNer makes this very simple, as discussed later in this guide.
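As a hedged sketch of one such degradation, the helper below re-saves an LR image as a low-quality JPEG so the model also learns to clean up compression artifacts (the quality value is just an example, not a recommendation):

```python
# Illustrative sketch: add JPEG compression artifacts to an LR image.
from PIL import Image

def degrade_jpeg(lr_path, out_path, quality=25):
    with Image.open(lr_path) as lr:
        # JPEG has no alpha channel, so convert to RGB before saving.
        lr.convert("RGB").save(out_path, "JPEG", quality=quality)
```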

Examples of bad datasets in general

  • Random images with no similarity to each other
  • A dataset with only 5 images
  • A dataset where every image is almost exactly the same

Examples of bad HR images

  • Images with lots of JPEG artifacts
  • Low-resolution images

Examples of good datasets

  • 800 high-quality pictures of mountains
  • 3,000 exported frames of a 1080p cartoon
  • 500 images of cropped-out faces

A few things to note:

  • Your HR images and LR images must have exactly matching names
  • Your HR images must be exactly 4x (or whatever scale you are training) the resolution of your LRs. This means each HR dimension (width and height) must be a multiple of your scale; traiNNer's automatic cropping takes care of this.
  • The more images you have, the better the model will become (to an extent)
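The first two points above can be verified with a small script before training starts. This is a minimal sketch, assuming you supply the HR/LR folder paths and the scale yourself:

```python
# Minimal dataset sanity check: matching filenames and exact scale ratio.
import os
from PIL import Image

def check_pairs(hr_dir, lr_dir, scale=4):
    problems = []
    hr_names = set(os.listdir(hr_dir))
    lr_names = set(os.listdir(lr_dir))
    # Names present in one folder but not the other.
    for name in sorted(hr_names ^ lr_names):
        problems.append("unpaired file: " + name)
    # For matched pairs, HR must be exactly scale-times the LR size.
    for name in sorted(hr_names & lr_names):
        with Image.open(os.path.join(hr_dir, name)) as hr, \
             Image.open(os.path.join(lr_dir, name)) as lr:
            if hr.size != (lr.size[0] * scale, lr.size[1] * scale):
                problems.append(
                    f"{name}: HR {hr.size} is not {scale}x LR {lr.size}")
    return problems
```

Running a check like this takes seconds; discovering a mismatched pair thousands of iterations into training is far more expensive.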

Once you have your dataset set up, you may want to create a validation set. This can just be a few images taken from your LR and HR folders and placed into separate HR and LR directories. These images are just used as a reference to see how your model is doing during training. Note: These images will NOT affect training in any way.

Configuring traiNNer

This configuration setup is based on victorca25's traiNNer and explains how to modify the YAML training configs. The options that you need to change are explained below.

First, you should know where the training configs are located. You can find them in /codes/options/sr/. Here, you will find train_sr.yml. If you will be modifying this file, I recommend making a copy of it just in case. There will be only a few changes you need to make:

name
This will be your model name. Typically, these also include the scale. Example: 4xBox, 2xFaithful.
scale
This is the scale of your model. You should already know what this is if you have already created your dataset. Typically this is just 4.
dataroot_HR
This is the path to your dataset's HR folder.
dataroot_LR
This is the path to your dataset's LR folder.
n_workers
This is the number of threads that traiNNer will use. Typically this is just the number of cores your CPU has.
batch_size
This is the number of images that traiNNer will look at in each iteration. Typically this is set as high as it will go before running out of VRAM; larger batches tend to yield more stable training.
crop_size
This is the resolution that traiNNer will automatically crop your training images to. This number may be lowered to reduce VRAM usage.
dataroot_HR (validation)
This is the path to your dataset's validation HR folder (not required)
dataroot_LR (validation)
This is the path to your dataset's validation LR folder (not required)
root
The direct path to the directory of the repository you downloaded.
pretrain_model_G
The model that your model will use as a sort of base to get started with. The ones included with BasicSR originally are RRDB_ESRGAN_x4.pth or RRDB_PSNR_x4.pth, but you can use any old-arch model.
val_freq
The frequency (in iterations) at which traiNNer will run your latest model on the validation dataset. Typically this is set to 5000. (not required)
save_checkpoint_freq
The frequency (in iterations) at which traiNNer will save your model. You may desire a lower save frequency if you test the models yourself.
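Putting these together, an abridged config might look like the sketch below. The key names follow the layout of the stock train_sr.yml, but all paths and values are placeholders; check them against your own copy of the file.

```yaml
name: 4xBox
scale: 4
datasets:
  train:
    dataroot_HR: /path/to/dataset/HR
    dataroot_LR: /path/to/dataset/LR
    n_workers: 4
    batch_size: 8
    crop_size: 128
  val:
    dataroot_HR: /path/to/dataset/val_HR
    dataroot_LR: /path/to/dataset/val_LR
path:
  root: /path/to/traiNNer
  pretrain_model_G: ../experiments/pretrained_models/RRDB_PSNR_x4.pth
train:
  val_freq: 5000
logger:
  save_checkpoint_freq: 5000
```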


To start training, open your command-line interface of choice, navigate to the /codes/ folder, and type python train.py -opt train_sr.yml, replacing train_sr.yml with whatever your training config is named. If all goes well, it should spit out a bunch of info and then start training from iteration 0, epoch 0. If you set everything up correctly, you should have a new folder in your experiments folder that is named after your model.

To pause training, press CTRL+C. It should save the latest state and model. If you spam it, press it at the wrong time, or have PowerShell's Quick Edit mode enabled, there is a possibility it will not work properly. In that case you would fall back to the latest save/resume state, whose timing is controlled by save_checkpoint_freq (e.g. 7200.state). These files are saved in the experiments folder. To resume, edit the training config, remove the # before resume_state, and point it at the .state file. Then start training again with the same command as before.
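For reference, the relevant part of the config after uncommenting might look like this (the model name and the state-file location are placeholders; 7200.state matches the example above):

```yaml
path:
  resume_state: ../experiments/4xBox/training_state/7200.state
```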

Common errors

CUDA out of memory
This means you need to decrease your batch size. If you can't decrease your batch size any more, decrease your crop_size.
  • If you're getting this error during validation, it means your validation images are too large. Try cropping them or splitting them into multiple images.
Module not found
This means you did not install the required libraries through pip. Try again, or check whether your PATH is pointing to a different Python installation.
Could not broadcast shape ____ to shape ____
This could mean a few things, but most likely your LR and HR sizes are mismatched. Make sure each HR image is an exact multiple (your training scale) of its LR counterpart.