Yolo: a way to image recognition

Welcome to my blog! In this post, I will try to build a full-lifecycle Yolo machine learning project to identify chess pieces. A caveat: though I have used Yolo for fun personal projects, I have never trained it on new classes. So this may end up as a research-only project, but it should be fun nevertheless.

I intend to split this topic across more than one post, each covering a specific aspect. In this first one, we will prepare the images and set up a local instance of the darknet project, which can be used to train the model. In the next post, we will move the training to Google Colaboratory.

Setting Up

For this project, we will try to build a chess piece recognizer. We need some preparation before we can jump into the crux of development. We will use Python 3, including its virtual environment module. We will also use LabelImg to create annotations for the chess pieces, darknet to train Yolo, and OpenCV to test the model we create. Last but not least, we will need images to train on. For this example, I downloaded about 29 images from Google – way below the recommended minimum, but then again, we are not building a production system :).

Setting up LabelImg

LabelImg is a graphical image annotation tool and is probably the de facto standard for labeling images for Yolo. It is written in Python and uses the Qt library for its user interface. We could get it from PyPI, but for this project I simply cloned tzutalin’s GitHub repository into a local directory. Next, I created a virtual environment just to install its dependencies.

$ git clone https://github.com/tzutalin/labelImg.git
$ cd labelImg
$ python3 -m venv venv
$ . ./venv/bin/activate
(venv) $ pip install pyqt5==5.13.2 lxml
(venv) $ make qt5py3
(venv) $ python3 labelImg.py
Setting up darknet

Darknet is an open source neural network framework written in C. It uses CUDA for GPU-accelerated computation. We will use this framework to train Yolo. However, at a later point, when we move our real training to Google Colab, we will use a customized version of darknet. By default, the darknet build is CPU-only, with GPU and CUDA support disabled. That is fine for now, as my Mac does not have an Nvidia CUDA-enabled card.

$ git clone https://github.com/pjreddie/darknet
$ cd darknet
$ make

This will build darknet on your local machine. We will also need a pre-trained model, which can be downloaded from here. We will use this weights file later, when we start training on our images.

Collect Images

This is largely a manual task unless you use a screen-scraping tool to gather images. I simply searched Google and downloaded about 30 images. Most of them show all the chess pieces in a single image, but a few show individual pieces. I put all of the images into one directory so I could easily process them later. So, at the end of it all, I had one directory full of chess piece images.

All images are the copyright of their respective owners. I do not own any of these images; I am using them in this blog for demonstration purposes only.

Chess Pieces

Image Labeling

Image labeling is the most labor-intensive task in this workflow, but LabelImg makes it easier. Fire up LabelImg (as described in the setup section above) and point it at the directory containing the downloaded images. Then draw bounding rectangles around every object of interest in each image and tag them with their class names. Make sure to select Yolo as the output format in the tool.

This is how an image looks after we finish labeling it. In this case, we have two of each chess piece. I have named the classes king, queen, bishop, knight, rook and pawn. I agree, those are very original names, thank you!

After we finish labeling all the images, we will end up with two kinds of files in this directory. The first is classes.txt, which holds all the class names defined during labeling.

pawn
rook
knight
bishop
queen
king

The tool also generates an image_name.txt file for each image. Each line in it holds an object's class index followed by its bounding box (center x, center y, width and height, all normalized to the image dimensions); the short sketch after the listing below shows how to map these back to pixel coordinates.

5 0.121101 0.275000 0.157798 0.418000
5 0.127523 0.737000 0.159633 0.406000
4 0.288073 0.296000 0.157798 0.368000
4 0.302752 0.759000 0.157798 0.370000
3 0.440367 0.317000 0.139450 0.334000
3 0.453211 0.780000 0.143119 0.336000
2 0.596330 0.331000 0.150459 0.306000
2 0.608257 0.794000 0.144954 0.308000
1 0.755046 0.336000 0.148624 0.316000
1 0.760550 0.796000 0.144954 0.308000
0 0.900917 0.368000 0.135780 0.248000
0 0.910092 0.831000 0.139450 0.250000
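
As a quick illustration of the format, here is a minimal Python sketch that converts one of these label files back to pixel coordinates. The label file name and the image size used here are just assumptions for demonstration; plug in your own values.

# Convert normalized Yolo labels back to pixel coordinates.
# The image size and label file name below are assumptions for illustration.
img_w, img_h = 1090, 500

with open("scaleimg/cp001.txt") as f:
    for line in f:
        class_id, cx, cy, w, h = line.split()
        cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
        left = int((cx - w / 2) * img_w)   # top-left corner x in pixels
        top = int((cy - h / 2) * img_h)    # top-left corner y in pixels
        print(class_id, left, top, int(w * img_w), int(h * img_h))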

Now that the hard part is complete, let’s get the rest of the necessary files.

Creating configurations

Next, we create a directory to keep all of our configuration files. I named it cfg (so very original). We will need to bring in the following files from the Yolo website.

Configuration file

For my configuration, I copied yolov3.cfg from the Yolo website and made the following modifications; a small sketch that double-checks this arithmetic follows the list.

  • batch=64
  • max_batches=12000 (number_of_classes * 2000)
  • steps=9600,10800 (80% and 90% of max_batches)
  • filters=33 ((number_of_classes + 5) * 3) :: replace every filters=255, i.e. the convolutional layer just before each [yolo] layer
  • classes=6 (number_of_classes) :: replace every classes entry in the [yolo] layers
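
As a sanity check, here is a tiny Python sketch of the arithmetic above for an arbitrary class count. There is nothing darknet-specific in it; it only evaluates the formulas from the list.

# Compute the yolov3.cfg values for a given number of classes.
num_classes = 6  # pawn, rook, knight, bishop, queen, king

max_batches = num_classes * 2000                          # 12000
steps = (max_batches * 80 // 100, max_batches * 90 // 100)  # (9600, 10800)
filters = (num_classes + 5) * 3                           # 33

print(max_batches, steps, filters)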
Pre-trained weight file

We download darknet53.conv.74 from the Yolo website. It contains the pre-trained convolutional layers without the fully connected layers, and we use it as the starting point for training our model.

Training file

We have to create a training list file that contains the paths of all the images we want to train on. A sample is provided below (train.txt).

./scaleimg/cp001.jpg
./scaleimg/cp002.jpg
./scaleimg/cp003.jpg
./scaleimg/cp004.jpg
./scaleimg/cp005.jpg
./scaleimg/cp009.jpg

If you want to add every image in a directory to the training file, here is an easy way.

$ find ./scaleimg -maxdepth 1 -name '*.jpg' > train.txt
Test File

This is similar to the file above, but it lists the images we hold back to validate detection (test.txt).

./scaleimg/cp007.jpg
./scaleimg/cp008.jpg
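
If you would rather generate both lists in one go, here is a small Python sketch that shuffles the image paths and writes a roughly 90/10 train/test split. The directory layout and the split ratio are assumptions; adjust them to match your setup.

import random
from pathlib import Path

# Collect every jpg under ./scaleimg and shuffle deterministically.
images = sorted(str(p) for p in Path("./scaleimg").glob("*.jpg"))
random.seed(42)
random.shuffle(images)

# Write ~90% of the paths to train.txt and the rest to test.txt (inside cfg/).
split = int(len(images) * 0.9)
Path("cfg/train.txt").write_text("\n".join(images[:split]) + "\n")
Path("cfg/test.txt").write_text("\n".join(images[split:]) + "\n")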
Training configuration

These files are then referenced from a training configuration file used by darknet (trainyolo.data). The backup entry points to the directory where darknet writes intermediate weights, so make sure that directory exists before you start training.

classes=6
train=cfg/train.txt
valid=cfg/test.txt
names=scaleimg/classes.txt
backup=backup/

Running Training

Once all the configuration files are in place, it is time to start training. We will use darknet for this.

$ ./darknet detector train ./cfg/trainyolo.data ./cfg/myyolov3.cfg \
	./cfg/darknet53.conv.74
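
Once darknet has written a checkpoint into backup/, we can sanity-check it with OpenCV's DNN module, as promised in the setup section. The sketch below assumes the file names used in this post; the exact checkpoint name darknet writes depends on the version, so adjust the weights path accordingly.

import cv2
import numpy as np

# Assumed paths: the modified cfg from this post and a checkpoint from backup/.
net = cv2.dnn.readNetFromDarknet("cfg/myyolov3.cfg", "backup/myyolov3.backup")
classes = open("scaleimg/classes.txt").read().splitlines()

img = cv2.imread("scaleimg/cp007.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Each detection row is: center x, center y, width, height, objectness, class scores.
h, w = img.shape[:2]
for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            bw, bh = det[2] * w, det[3] * h
            left = int(det[0] * w - bw / 2)
            top = int(det[1] * h - bh / 2)
            print(classes[class_id], round(confidence, 2), left, top, int(bw), int(bh))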

Conclusion

Training with the method above on your local machine will take weeks, unless of course you have a machine with a very nice GPU. Since most of us use regular desktops or laptops, the training portion of the ML lifecycle has to be delegated to Google Colab. We will take up training on Colaboratory in the next post. What we are ultimately aiming for is image recognition like the example below – and that is after completing only a third of the training, with the loss still above 0.2.

The next part of this blog will cover how to train this model on Google Colaboratory. You can find the post here.

Recognized Image

Ciao for now!