Image Classification with ml5.js

Convolutional model

Background

Image recognition and classification is a key area of deep learning, and it is made possible by neural networks. Convolutional neural networks (CNNs) were a breakthrough in image recognition, making it possible to identify and classify objects in 2D space.

My interest in image recognition started when I saw software recognizing and identifying faces in group photos uploaded to servers. Computers also started to ‘communicate’ with you and could help you stay more organized. It suddenly looked as if the computer had become a lot more personal. Looking for more information, I came across neural networks, especially CNNs and RNNs (recurrent neural networks). CNNs deal with sparser data and hence are a good fit for image processing, whereas RNNs deal with sequential data and can be used for natural language processing.

About CNN

A CNN is used for supervised learning, primarily for image classification, where it separates images by specific ‘features’. These features come from filters that are run over the input tensors (the image data); a filter can be anything from a simple edge detector to a very complicated one that identifies an object. A CNN aims to reduce the sparse image data to simpler representations and eventually arrive at a fully connected multilayer perceptron (MLP). It performs the operations described below.

It starts by running convolutional functions (filters) on the input tensor. Normally, this process slides over the image one square group of pixels at a time, applies a filter (kernel) of the same dimensions to each group, and produces a new matrix. An activation function (typically the rectified linear unit, or ReLU) is then applied to that matrix. Pooling (usually max pooling), also called sub-sampling, comes next and reduces the tensor to a smaller set of related features. This process can repeat for several iterations before it finally arrives at a fully connected layer in which each output node corresponds to a unique label.
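To make the convolve, activate and pool sequence concrete, here is a minimal sketch in plain JavaScript. It is not how ml5.js or TensorFlow.js implement these layers; the function names, the 6x6 sample image and the vertical edge kernel are all made up for illustration, and the convolution uses no padding and a stride of 1.

```javascript
// Slide a square kernel over the image (no padding, stride 1) and
// sum the element-wise products at each position.
function convolve2d(image, kernel) {
  const k = kernel.length;
  const out = [];
  for (let r = 0; r + k <= image.length; r++) {
    const row = [];
    for (let c = 0; c + k <= image[0].length; c++) {
      let sum = 0;
      for (let i = 0; i < k; i++) {
        for (let j = 0; j < k; j++) {
          sum += image[r + i][c + j] * kernel[i][j];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// ReLU: negative values become 0, positive values pass through.
function relu(matrix) {
  return matrix.map(row => row.map(v => Math.max(0, v)));
}

// Max pooling: keep the largest value in each 2x2 block.
function maxPool2x2(matrix) {
  const out = [];
  for (let r = 0; r + 1 < matrix.length; r += 2) {
    const row = [];
    for (let c = 0; c + 1 < matrix[0].length; c += 2) {
      row.push(Math.max(
        matrix[r][c], matrix[r][c + 1],
        matrix[r + 1][c], matrix[r + 1][c + 1]
      ));
    }
    out.push(row);
  }
  return out;
}

// Hypothetical 6x6 grayscale image: dark left half, bright right half.
const image = Array.from({ length: 6 }, () => [0, 0, 0, 1, 1, 1]);

// A simple vertical edge detector (illustrative values).
const kernel = [
  [-1, 0, 1],
  [-1, 0, 1],
  [-1, 0, 1],
];

const features = maxPool2x2(relu(convolve2d(image, kernel)));
console.log(features); // [ [ 3, 3 ], [ 3, 3 ] ]
```

The 6x6 input shrinks to a 4x4 matrix after the convolution and to 2x2 after pooling, and the surviving values mark where the dark-to-bright edge was found. A real CNN learns its kernel values during training instead of having them written by hand.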

ml5.js and Machine Learning

This post does not, of course, intend to describe CNNs fully; there are numerous papers on them already. My first attempts at implementing one used TensorFlow, Keras and Python. When TensorFlow.js was introduced, it was wonderful news, because I could now run image recognition directly in my browser without any server-side call. However, you still need to understand all the layers when working with TensorFlow, so when I saw ml5.js, I thought I would give it a try. And of course, it made machine learning development even more fun, truly bringing machine learning to the masses.

I have been dabbling with ml5.js for some time, so I thought I would start a series of blog posts on its various features. I will work through an image classification problem in this post, and eventually I also intend to post about object detection, retraining an existing model, creating your own model and other available features.

MobileNet and ImageNet

Before we start on the journey, it is worthwhile to dedicate at least a paragraph to MobileNet. ml5.js uses MobileNet as the base model for image classification. According to its project page:

MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices with TensorFlow Mobile.

MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature.

MobileNet is trained on images from the ImageNet database, which contains over 14 million images in more than 20,000 classes. However, modeling all of those classes would produce a large TensorFlow model file, so MobileNet for the web covers only a very small subset of them.

Classifier Test

For this sample test, I created a very basic HTML page with embedded JavaScript. I have some images of animals to see how they are classified.

Tiger classification

Label: tiger, Panthera tigris with confidence: 0.89

This classification was done with a single call to classify, as shown in the function below. An HTML image object was passed to this code for analysis.

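// Load the pre-trained MobileNet classifier; modelLoaded runs once it is ready.
const cl = ml5.imageClassifier('MobileNet', modelLoaded);

function modelLoaded() {
  console.log('Model loaded');
}

// img is assumed to be the HTML image element that should be classified.
function startClassify() {
  if (img) {
    cl.classify(img, (err, results) => {
      if (err) {
        console.log(err);
      } else {
        // results is ordered with the most likely prediction first.
        console.log(results);
        console.log('Label: ' + results[0].label +
              ' with confidence: ' + (results[0].confidence).toFixed(2));
      }
    });
  }
}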

That definitely looks very promising. MobileNet identified the animal as a tiger with 89% confidence. So I sent across one more sample image, shown below.

Label: Arabian camel, dromedary, Camelus dromedarius with confidence: 0.23

Ah, the beautiful Arabian camel grazing in the valley! This is one of the gotchas of pre-defined models. Because the model covers a limited subset of classes and predicts based on the images provided during training, anything outside that training set yields low-confidence predictions. In this case, the model was not very confident that it was a camel (23%), but that is the class the image most resembles. The other predictions were sorrel (3%) and Saluki, gazelle hound (3%). For a complete picture, I took an image of a camel from Wikimedia and passed it through the classifier. Note the increase in confidence in the next image.
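Since the classifier returns several predictions, each with a label and a confidence, it can help to look past the first one and flag results the model is unsure about. The sketch below is only one way to do that: the summarize helper and the 0.5 cut-off are hypothetical, not part of ml5.js, and the sample values are the ones reported above for the first camel image.

```javascript
// Assumed cut-off for illustration only; it is not an ml5.js setting.
const LOW_CONFIDENCE = 0.5;

// Hypothetical helper: return the top predictions as percentages
// and mark the ones that fall below the cut-off.
function summarize(results, topN = 3) {
  return results
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, topN)
    .map(r => ({
      label: r.label,
      percent: Math.round(r.confidence * 100),
      uncertain: r.confidence < LOW_CONFIDENCE,
    }));
}

// Same shape as the results array passed to the classify callback.
const sample = [
  { label: 'Arabian camel, dromedary, Camelus dromedarius', confidence: 0.23 },
  { label: 'sorrel', confidence: 0.03 },
  { label: 'Saluki, gazelle hound', confidence: 0.03 },
];

const summary = summarize(sample);
summary.forEach(s => {
  const note = s.uncertain ? ' (low confidence)' : '';
  console.log(s.label + ': ' + s.percent + '%' + note);
});
```

Inside the classify callback this could be called as summarize(results). The right cut-off depends on the application, so treat 0.5 as a placeholder.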

Label: Arabian camel, dromedary, Camelus dromedarius with confidence: 1.00

A perfect 100%! That’s what I call confidence. This is possibly because the image was part of the ImageNet training set. (The value is rounded to two decimal places, so the raw confidence may be slightly below 1.)

Wow, that became a long post; I intended to keep it short. The takeaway is that pre-defined models, however good they are, have limited use when we need them for a specific purpose. In one of the later articles, I will talk about retraining models using ml5.js and creating custom models for use in our projects.

Have a blessed day!