Beginner Training Guide

From Upscale Wiki

Getting Started

This guide assumes you already know how to download or clone repositories from GitHub and that you are familiar with Python and pip.

Choosing a Fork

The original BasicSR repository by Xinntao still works fine for training. However, it lacks many of the features and bug fixes found in forks made by community members. Also, the latest version of Xinntao's repository trains new-arch models instead of old-arch models, which means you will not be able to train at scales other than 4.

If you are reading this guide as a refresher and currently use Victorca25's fork, please consider switching to BlueAmulet's fork. It fixes many of the bugs present in Vic's fork (which is no longer being maintained) and has all the same features (and more). If this is your first time reading this guide, I still highly recommend BlueAmulet's fork. However, DinJerr's fork is also a good alternative.

You can see a list of currently maintained forks here.

Installing Dependencies

Every fork of BasicSR will require a few dependencies to be installed. You can install most of them at once by running this in the console:

    pip install numpy opencv-python pyyaml tensorboardx

You also need to install PyTorch and torchvision, but which build you need depends on your system. If you have an NVIDIA graphics card, make sure to select the latest CUDA version from the list. If you don't have one, I don't recommend training in the first place. Select whatever stable version is currently available, pick your OS, choose to install through pip, then run the command it gives you.

Once this is done you should have all the required dependencies. Some forks may require others, but these should work for most of them.
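To confirm the installation worked, a quick check like the one below can help. It is only a sketch: the list of import names is an assumption based on the packages mentioned above (note that opencv-python is imported as cv2 and pyyaml as yaml), and the function name find_missing is made up for this example.

```python
import importlib.util

# Import names for the packages mentioned above. The import name can differ
# from the pip name: opencv-python -> cv2, pyyaml -> yaml.
REQUIRED_MODULES = ["numpy", "cv2", "yaml", "tensorboardX", "torch", "torchvision"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

If a fork needs extra packages, add their import names to the list.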

Creating a Dataset

All BasicSR/ESRGAN models are trained using low-resolution images, often called LR for short or LQ (Low Quality), and high-resolution images, often called HR for short or GT (Ground Truth). For a 4x scale model, this means that your LR images will be 4x smaller in resolution than your HR images.

It is important to create the best dataset you can for your upscale task. Many pre-existing datasets exist, such as DF2K or Manga109, but a dataset can be anything. Your HRs could be high quality frames of a TV show, for example, with the LRs being the same images scaled down by 4 using a bicubic filter. This would then create a model that is good at upscaling small images that are visually similar to the LRs you created. The dataset is arguably the most important part of training a model. Without a good dataset, your model will not work well.
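As a rough illustration of the bicubic approach described above, the sketch below crops each source image so its size is divisible by the scale, then writes the cropped HR and a bicubic-downscaled LR under the same file name. It assumes PNG inputs and uses OpenCV (the opencv-python package); the function names and the three-folder layout are hypothetical choices for this example, not part of BasicSR.

```python
from pathlib import Path


def cropped_size(width, height, scale=4):
    """Largest size not exceeding (width, height) that is divisible by scale."""
    return width - width % scale, height - height % scale


def make_pairs(src_dir, hr_dir, lr_dir, scale=4):
    """Write cropped HR images and bicubic-downscaled LR images for each PNG."""
    import cv2  # provided by the opencv-python package

    src_dir, hr_dir, lr_dir = Path(src_dir), Path(hr_dir), Path(lr_dir)
    hr_dir.mkdir(parents=True, exist_ok=True)
    lr_dir.mkdir(parents=True, exist_ok=True)

    for src_path in sorted(src_dir.glob("*.png")):
        img = cv2.imread(str(src_path), cv2.IMREAD_UNCHANGED)
        if img is None:
            continue  # unreadable file, skip it
        height, width = img.shape[:2]
        new_w, new_h = cropped_size(width, height, scale)
        if new_w == 0 or new_h == 0:
            continue  # image is smaller than the scale factor
        hr = img[:new_h, :new_w]  # crop from the top-left corner
        lr = cv2.resize(hr, (new_w // scale, new_h // scale),
                        interpolation=cv2.INTER_CUBIC)
        # Same file name in both folders so the pair matches
        cv2.imwrite(str(hr_dir / src_path.name), hr)
        cv2.imwrite(str(lr_dir / src_path.name), lr)
```

Bicubic is just the filter used in the example above; if your real inputs are degraded differently, the LRs should be made to resemble them instead.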

Examples of bad datasets

  • Random images with no similarity to each other
  • Images with lots of JPEG artifacts
  • Low-resolution images
  • A dataset with only 5 images
  • A dataset where every image is almost exactly the same

Examples of good datasets

  • 800 high-quality pictures of mountains
  • 3,000 exported frames of a 1080p cartoon
  • 500 images of cropped-out faces

A few things to note:

  • Your HR images and LR images must have exactly matching names
  • Your HR images must be exactly 4x (or whatever scale you are training) the resolution of your LRs. This means you must crop your HR images so that each dimension (width and height) is a multiple of 4 (or of whatever scale you are training).
  • The more images you have, the better the model will become (generally)
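The first note is easy to verify automatically. The sketch below compares the file names in two folders and reports anything that has no partner; the function name is hypothetical, and it checks names only (checking the resolution ratio would additionally need an image library).

```python
from pathlib import Path


def unmatched_names(hr_dir, lr_dir):
    """Return (only_in_hr, only_in_lr) as sorted lists of file names."""
    hr_names = {p.name for p in Path(hr_dir).iterdir() if p.is_file()}
    lr_names = {p.name for p in Path(lr_dir).iterdir() if p.is_file()}
    return sorted(hr_names - lr_names), sorted(lr_names - hr_names)
```

Both returned lists should be empty before you start training.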

Once you have your dataset set up, you need to create a validation set. This can just be a few images taken from your LR and HR folders and placed into separate HR and LR directories. These images are just used as a reference to see how your model is doing during training. Note: These images will NOT affect training in any way.
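One possible way to build the validation set is to move a handful of randomly chosen pairs out of the training folders. This is a sketch under assumptions: the function name, the default of 10 images, and the fixed seed are all arbitrary choices for this example, and copying instead of moving would work just as well.

```python
import random
import shutil
from pathlib import Path


def move_validation_pairs(hr_dir, lr_dir, val_hr_dir, val_lr_dir, count=10, seed=0):
    """Move `count` randomly chosen HR/LR pairs into the validation folders."""
    hr_dir, lr_dir = Path(hr_dir), Path(lr_dir)
    val_hr_dir, val_lr_dir = Path(val_hr_dir), Path(val_lr_dir)
    val_hr_dir.mkdir(parents=True, exist_ok=True)
    val_lr_dir.mkdir(parents=True, exist_ok=True)

    # Only consider file names that exist in both folders
    names = sorted(
        p.name for p in hr_dir.iterdir()
        if p.is_file() and (lr_dir / p.name).is_file()
    )
    chosen = random.Random(seed).sample(names, min(count, len(names)))
    for name in chosen:
        shutil.move(str(hr_dir / name), str(val_hr_dir / name))
        shutil.move(str(lr_dir / name), str(val_lr_dir / name))
    return sorted(chosen)
```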

Configuring BasicSR

This configuration setup is based on BlueAmulet's fork and explains how to modify the YAML training configs. A full explanation of every option is available here, but I don't recommend looking at it if you're a beginner. The options that you need to change are explained below.

First, you should know where the training configs are located. You can find them in /codes/options/train/. Here, you will find train_template.yml. If you will be modifying this file, I recommend making a copy of it just in case. However, if you are just starting out I recommend using this stripped-down config file that will be much easier to customize. Just copy and paste that text into a new .yml file and there will be only a few changes you need to make:

name
This will be your model name. Typically, these also include the scale. Example: 4xBox, 2xFaithful.
scale
This is the scale of your model. You should already know what this is if you have already created your dataset. Typically this is just 4.
dataroot_HR
This is the path to your dataset's HR folder
dataroot_LR
This is the path to your dataset's LR folder
n_workers
This is the number of threads that BasicSR will use. Typically this is just the number of cores your CPU has.
batch_size
This is the number of images that BasicSR will look at in each iteration. Typically this is set to the highest it will go before running out of VRAM.
dataroot_HR (validation)
This is the path to your dataset's validation HR folder
dataroot_LR (validation)
This is the path to your dataset's validation LR folder
root
The direct path to the directory of the repository you downloaded
pretrained_model_G
The model that your model will use as a sort of base to get started with. The ones included with BasicSR originally are RRDB_ESRGAN_x4.pth or RRDB_PSNR_x4.pth, but you can use any old-arch model.
val_freq
How often (in iterations) BasicSR will run ESRGAN on your validation LRs using the latest version of your model. Typically this is set to 5000.
save_checkpoint_freq
How often (in iterations) BasicSR will save your model.
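Before pasting values into the config, it can help to sanity-check them. The sketch below does that for a flat Python dict keyed by the option names above. This flat layout is purely an assumption for illustration and does not mirror how the real YAML file is nested; the val_dataroot_HR and val_dataroot_LR keys are hypothetical names for the two validation paths.

```python
import os
from pathlib import Path


def check_settings(settings):
    """Return a list of human-readable problems found in a flat settings dict."""
    problems = []

    # These options must all be positive whole numbers
    for key in ("scale", "n_workers", "batch_size", "val_freq", "save_checkpoint_freq"):
        value = settings.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")

    # These options must point at folders that already exist
    for key in ("dataroot_HR", "dataroot_LR", "val_dataroot_HR", "val_dataroot_LR", "root"):
        path = settings.get(key)
        if not path or not Path(path).is_dir():
            problems.append(f"{key} is not an existing directory: {path!r}")

    # The pretrained model must be an existing file
    model = settings.get("pretrained_model_G")
    if not model or not Path(model).is_file():
        problems.append(f"pretrained_model_G is not an existing file: {model!r}")

    return problems


# A starting value for n_workers; os.cpu_count() reports logical cores
# and can return None on some systems.
suggested_workers = os.cpu_count() or 1
```

An empty list means nothing obvious is wrong; it does not guarantee that the batch size fits in VRAM, which you can only find out by running training.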