Transfer Learning with ml5.js

The Journey continues…


On my journey trying to unravel the features of ml5.js, I thought I would try out feature extraction and transfer learning from an existing model. Since we are just trying out the algorithm, we will target a simple use case: take three different objects and retrain an existing MobileNet model to identify them. You can read my previous blog on Object Detection here.

Transfer Learning

One way to teach machines about image classes is by giving them a lot of samples and pointing out the class of each image. This is like teaching a kid what each object in the world around them is. Eventually the kid understands and memorizes the shapes and colors of these objects so as to identify them. Deep learning algorithms try to extract patterns from all of these images and generalize them into complex patterns; identification is done by matching an object against these pre-calculated patterns.

To teach something new, it is more convenient to teach a kid who already has knowledge of the world around them. They can leverage that existing knowledge to remember the new object by relating it to a baseline and identifying just a few distinct features. So it is for deep learning. You can make it start from already acquired 'knowledge'; it is like saying: you already know something about how to identify images of objects, start from that point. So instead of starting from a blank sheet, you train it starting from already known patterns. This methodology is known as transfer learning. The advantage of transfer learning is that training is much faster and needs a lot fewer images.

Dimensionality Reduction

One big challenge with any machine learning algorithm is what is known as the curse of dimensionality. As the dimensionality of the feature set increases, it becomes exponentially harder for an algorithm to fit a classification model to it. Feature selection and feature extraction are two different approaches to reducing the dimensionality of the feature set.

Feature selection can be a manual or programmatic way to identify less relevant features and drop them from the feature set. For example, let us assume that you collected samples of a large variety of animals and would like to tell them apart. One of the identified features may be that an animal has two eyes. However, this is true of every object in the animal class, so even though it is a feature, it can be removed from the set.
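Just to make that concrete, here is a tiny sketch (purely illustrative, not part of ml5.js; the function name and data layout are my own) of dropping a feature that has the same value for every sample:

// Hypothetical sketch: drop any feature (column) that carries no information
// because it has the same value for every sample.
// featureMatrix is assumed to be an array of feature vectors, one per sample.
function dropConstantFeatures(featureMatrix) {
  const numFeatures = featureMatrix[0].length;
  const keep = [];
  for (let i = 0; i < numFeatures; i++) {
    const firstValue = featureMatrix[0][i];
    // Keep the feature only if at least one sample differs from the first.
    if (featureMatrix.some(row => row[i] !== firstValue)) {
      keep.push(i);
    }
  }
  return featureMatrix.map(row => keep.map(i => row[i]));
}

// Example: the second feature ("has two eyes") is 1 for every animal, so it is dropped.
// dropConstantFeatures([[0.9, 1], [0.2, 1], [0.5, 1]]) -> [[0.9], [0.2], [0.5]]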

Feature extraction, on the other hand, is a technique used programmatically to bring a high-dimensional model down to fewer dimensions. In this case the result may not be a subset of the features we started with; rather, it is derived by applying a function to the features in that matrix. What that means is that we apply a specific function to the elements of the input matrix and arrive at a new, more concise matrix.
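Again, a toy sketch of the idea (my own illustration, not what MobileNet actually does internally): each derived feature is a function of all the original features, so the output is a new, smaller matrix rather than a subset of the original columns.

// Toy sketch of feature extraction: every output feature is a function
// (here a weighted sum) of all the original features.
function extractFeatures(featureMatrix, projection) {
  // projection is an array of weight vectors, one per derived feature.
  return featureMatrix.map(row =>
    projection.map(weights =>
      weights.reduce((sum, w, i) => sum + w * row[i], 0)
    )
  );
}

// Example: reduce 4-dimensional vectors to 2 derived features.
const reduced = extractFeatures(
  [[1, 2, 3, 4], [4, 3, 2, 1]],
  [[0.25, 0.25, 0.25, 0.25], [1, 0, 0, -1]]
);
console.log(reduced); // [[2.5, -3], [2.5, 3]]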

Retraining an existing model

So here is the inspiration. I have two classic Indian musical instruments. The first one is called a Sitar (which MobileNet thought might be a corkscrew or an electric guitar), and the second one is called a Tabla. I want to retrain a model so that it can identify these instruments with some confidence.

Before I started the project, I downloaded 15 images each of sitar and tabla from Google. I am thinking of using 10 for my training set and the other 5 for the test set. Even though this is far fewer than the recommended number of images, I thought I would start small and add more if the predictions are way off. Given below are my training sets for the objects of interest. I also have harp and flute images. All images are downloaded, and copyright stays with the original owners.

So we start from the very beginning and get an instance of ml5.js from CDN.

<script src="https://unpkg.com/ml5@0.5.0/dist/ml5.min.js" type="text/javascript"></script>
<script type="text/javascript">
  // Load the MobileNet feature extractor; modelLoaded fires once it is ready.
  const fex = ml5.featureExtractor('MobileNet', modelLoaded);
  let cf; // the custom classifier we will retrain

  function modelLoaded() {
    // Tell ml5 we want a classifier (rather than a regressor) on top of MobileNet.
    cf = fex.classification();
  }

Next we get a featureExtractor from the ml5 namespace and load the MobileNet model. This is one of the models already discussed in a previous blog. Instead of using a plain classifier, in this case we want to start from some existing training, so we start with featureExtractor. The next thing we do is let the model know whether we want to do classification or regression. This is a critical piece of information, as the algorithm ml5 subsequently selects is based on this choice.

Both classification and regression are part of supervised machine learning. The key differentiator between the two is that classification will always return a class, whereas regression will return continuous numerical values.
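For comparison, if I wanted a continuous value out of the model instead of a class, the only change at this stage would be to ask the feature extractor for a regressor. A rough sketch, assuming the ml5 featureExtractor regression API, with variable names of my own:

// Sketch of the regression flavour (variable names are mine).
const fex2 = ml5.featureExtractor('MobileNet', () => {
  const regressor = fex2.regression();
  // Each training image gets a numeric target instead of a class label:
  // regressor.addImage(imageElement, 0.75);
  // ...and prediction returns a continuous number:
  // regressor.predict(imageElement, (err, result) => console.log(result));
});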

// Training Starts Here
function addTrainingImage() {
  // cimg is assumed to be the <img> element currently shown on the page,
  // and 'imgclass' is the text box holding the class name to train with.
  const txt = document.getElementById('imgclass').value;
  if (txt) {
    cf.addImage(cimg, txt, () => {
      console.log('Image for ' + txt + ' added');
    });
  }
}

Next we add the images we want to train with, along with their corresponding class, to this model. I created a text box where I can supply the class name I want to assign. Below is an example of the screen I created for loading images.

Screen to play around with transfer learning
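For context, the markup behind that screen can be as simple as the sketch below. The imgclass text box is the one the training code reads, and I am assuming cimg points at the <img> element showing the currently loaded picture; the rest of the layout is my reconstruction, not the exact page I built.

<!-- Minimal sketch of the page (my reconstruction, not the exact original markup) -->
<img id="cimg" width="224" height="224" alt="image to train on">
<input id="imgclass" type="text" placeholder="Class name, e.g. sitar">
<button onclick="addTrainingImage()">Add to training set</button>
<button onclick="startTraining()">Train</button>
<script type="text/javascript">
  // Assumption: cimg is the preview image element fed to addImage() above.
  const cimg = document.getElementById('cimg');
</script>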

My intent is to add about 30 images to the training set, 10 each for sitar, flute and tabla, and then train this model. Training, too, is just a single call, as shown below.

function startTraining() {
  if (fex.hasAnyTrainedClass) {
    // train() calls back with the current loss while training,
    // and with a null loss value once training is done.
    cf.train((lossValue) => {
      if (lossValue) {
        console.log('Loss: ' + lossValue);
      } else {
        console.log('Trained');
      }
    });
  } else {
    console.log('Nothing to train...');
  }
}

I finally uploaded 10 images each of sitar, tabla and flute, as seen in the image next to this block.

Trying out predictions next, I was amazed at the accuracy.
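The prediction itself is just one more call on the same classifier. The snippet below is roughly what sits behind my predict button (the function name is mine):

// Sketch of the prediction step (function name is mine).
function predictImage() {
  // cimg is the same image element used for training above.
  cf.classify(cimg, (err, results) => {
    if (err) {
      console.error(err);
      return;
    }
    // results is an array of { label, confidence } objects, best match first.
    console.log('Prediction: ' + results[0].label +
                ' (' + results[0].confidence.toFixed(2) + ')');
  });
}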

This seemed too good to be true with just 10 images per set. And I was right; here is the result for a flute (I realize flutes do look a bit like sitars if you consider just the stem of the sitar):

This was identified as a sitar. So a 10-image sample may be good enough for telling two different objects apart, but definitely not more than that.

Even though this project was not entirely a success, I have all the building blocks set up. So someday, when I can write a program to scrape lots of images off Google and auto-feed them to this program, I may come back to this and get better accuracy.

Time to go to sleep. Ciao for now 🙂