Foreword

As humans, we find it easy to identify objects in an image. If you look at the image to the right, you can pick out a girl reading a book with headphones on, a boy with a backpack riding a bicycle, picnic baskets, a lamppost, a park bench, and so on.
Although this task looks trivial to us, it is very challenging for a computer model, as it involves going through the image multiple times and classifying the objects it finds. At this juncture, it is worthwhile to take a brief look at some of the operations that are possible from a computer vision perspective.
Image Operations in Computer Vision
Image Classification deals with recognizing what is in the image being processed. You can view the previous blog on Image Classification here. There will be a key element in the image that the model is trying to classify. Consider the following image:

This is a single image of a pizza, and if we run it through a classification model, the model should classify it as a pizza. Of course, we are assuming here that the model has been trained to recognize pizza.
Object Detection works on images that contain more than one identifiable object. The model can also locate where each identified object is in the frame. This is known as Object Localization, wherein the model also provides a bounding box for each object.

Consider the image provided. It has tomatoes and mushrooms that we can add to the pizza above. From a computer vision perspective, if we want to identify these vegetables, we will run the image through an object detection model.
We can also add a bounding box for each identified object in the image. That would be object localization.
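To make the idea of a bounding box concrete, here is a minimal sketch of what a single detection might look like as plain data. The field names mirror the ones used in the ml5 code later in this post; the values and the two helper functions (boxCenter, boxContains) are made up purely for illustration.

```javascript
// Hypothetical detection: a label, a confidence score, and a bounding box
// given as its top-left corner plus width and height, all in pixels.
const detection = { label: 'tomato', confidence: 0.91, x: 40, y: 60, width: 120, height: 100 };

// Center point of a bounding box.
function boxCenter(box) {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

// True when the point (px, py) lies inside the box, edges included.
function boxContains(box, px, py) {
  return px >= box.x && px <= box.x + box.width &&
         py >= box.y && py <= box.y + box.height;
}

console.log(boxCenter(detection));             // center is at x = 100, y = 110
console.log(boxContains(detection, 100, 110)); // true
```

Classification alone would give us only the label and confidence; the four box fields are what localization adds.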
Object Detection in ml5.js
ml5.js provides two different models for object detection in images: YOLO and CocoSsd. You can use YOLO if you need more speed. For most use cases, the recommendation is to use CocoSsd. In the example below, we will use CocoSsd as the model.
We will start by loading the objectDetector provided by ml5.
    // Create the detector; modelLoaded runs once the CocoSsd model is ready.
    const dmdl = ml5.objectDetector('cocossd', modelLoaded);

    function modelLoaded() {
      console.log('Model loaded');
    }
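The model loads asynchronously, and so does the image we are about to process, so detection should start only when both are ready. Here is a minimal sketch of one way to coordinate that. The createReadyGate helper is hypothetical (it is not part of ml5), and the browser wiring in the trailing comments is an assumption about how the pieces in this post could be connected.

```javascript
// Hypothetical helper: calls onReady exactly once, after every named step
// has been marked as done.
function createReadyGate(names, onReady) {
  const pending = new Set(names);
  let fired = false;
  return function markDone(name) {
    pending.delete(name);
    if (pending.size === 0 && !fired) {
      fired = true;
      onReady();
    }
  };
}

// Assumed browser wiring, left as comments because it needs ml5 and a page:
//   const markDone = createReadyGate(['model', 'image'], startAnalysis);
//   const dmdl = ml5.objectDetector('cocossd', () => markDone('model'));
//   img.onload = () => markDone('image');
```

With a gate like this, it does not matter whether the model or the image finishes loading first.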
Next, we load the image to be processed into an HTML image object and pass that object to the detection model. ml5 returns an array of all the recognizable objects it found, each with a corresponding bounding box. We are going to draw these bounding boxes so that we know where the objects lie.
    function startAnalysis() {
      // img is the loaded HTML image element
      dmdl.detect(img, gotResult);
    }

    function gotResult(err, objects) {
      if (err) {
        console.log(err);
        return; // objects is not usable when detection fails
      }
      const cnv = document.getElementById('cnv');
      const ctx = cnv.getContext('2d');
      ctx.font = 'bold 12px Arial';
      for (let i = 0; i < objects.length; i++) {
        const obj = objects[i];
        // Draw a bounding box
        const clr = getRandomColor();
        ctx.lineWidth = 1;
        ctx.strokeStyle = clr;
        ctx.strokeRect(obj.x * fact, obj.y * fact, obj.width * fact, obj.height * fact);
        // Fill a strip along the top edge of the box to hold the label
        ctx.fillStyle = clr;
        ctx.fillRect(obj.x * fact, obj.y * fact, obj.width * fact, 16);
        // Write the label and confidence in black on the strip
        ctx.fillStyle = 'black';
        ctx.fillText(obj.label + ' [' + obj.confidence.toFixed(2) + ']', obj.x * fact, (obj.y * fact) + 10);
      }
    }
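The drawing code calls getRandomColor, which is not shown in this post. Here is a minimal sketch of one possible version, assuming any random '#rrggbb' color is acceptable; the actual helper used for the screenshots may differ.

```javascript
// Hypothetical helper: returns a random color as a '#rrggbb' hex string.
function getRandomColor() {
  // Pick an integer from 0 to 0xffffff and left-pad its hex form to six digits.
  const value = Math.floor(Math.random() * 0x1000000);
  return '#' + value.toString(16).padStart(6, '0');
}
```

Since the label text is drawn in black, very dark colors can make it hard to read; limiting the helper to lighter colors would be one way around that.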
In the drawing code, fact is just the factor by which I had scaled down the images so that they fit in smaller boundaries. You can ignore it in the code. Provided below is a sample response.

Here is a nice family enjoying a cosy time at the breakfast table. CocoSsd did a fairly good job, identifying three ‘person’ objects in this image and also finding a ‘dining table’.
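If you want a summary like the one above rather than reading labels off the picture, you can tally the result array by label. Here is a minimal sketch, assuming each result carries the label and confidence fields used in the drawing code. The countByLabel helper, the 0.5 threshold, and the sample values are all made up for illustration and are not real model output; the low-confidence ‘cup’ entry is invented only to show the filtering.

```javascript
// Counts detections per label, skipping any below minConfidence.
function countByLabel(objects, minConfidence = 0.5) {
  const counts = {};
  for (const obj of objects) {
    if (obj.confidence < minConfidence) continue;
    counts[obj.label] = (counts[obj.label] || 0) + 1;
  }
  return counts;
}

// Made-up sample shaped like the result described above (illustrative values only).
const sample = [
  { label: 'person', confidence: 0.97 },
  { label: 'person', confidence: 0.93 },
  { label: 'person', confidence: 0.88 },
  { label: 'dining table', confidence: 0.71 },
  { label: 'cup', confidence: 0.32 },
];

const counts = countByLabel(sample);
console.log(counts.person, counts['dining table'], counts.cup); // 3 1 undefined
```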
In a later video, I will work on creating one of these models. Ciao for now!
All photos are attributed to pexels.com, and the vector image on top is attributed to stockunlimited.com.