TensorFlow.js models for text search & video background replacement

Swap Background

A while back, I did some research on ml5.js, a library built on top of TensorFlow.js, so I thought I would go through TensorFlow.js as well. Looking at the library, there are a lot of predefined models already created and ready for use.

TensorFlow.js lets you use existing models, train new models in the browser, and use models created with TensorFlow itself. You can find all the predefined TensorFlow.js models here.

TensorFlow.js models

You can find models that support image classification, object detection, a generic KNN classifier, body segmentation, pose estimation, and natural language question answering, to name a few. The first program I tried searches for answers in the text of a Wikipedia page. The second replaces the background of a video stream. Both use predefined models, and I will explain both applications here.

Video Background Replacer

This is the first project I did with the existing models. I wanted to see if I could mask the background of a video using one of them. I checked the models and found two that could be used for this requirement: body-pix, which I used here, and deeplab, which would also have worked.

I created a page with a video element capturing the camera feed. I then process the video and remove the background based on the mask provided by body-pix.

About Body-pix model

The body-pix model is used for image segmentation: it identifies which pixels belong to a person and which do not. By running a video stream through body-pix, you can build a mask containing only the person. When you run an image through body-pix, every pixel belonging to a person is marked as 1 and every other pixel as 0.
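To make that concrete, here is a minimal sketch of inspecting the segmentation output; it assumes net is a loaded body-pix model and video is a playing video element.

// Minimal sketch: inspect the person mask produced by body-pix
// (assumes `net` is a loaded body-pix model and `video` is a playing video element)
net.segmentPerson(video).then(function(segmentation) {
  	// segmentation.data is a flat array of width * height entries:
  	// 1 where the pixel belongs to a person, 0 everywhere else
  	console.log(segmentation.width, segmentation.height);
  	console.log(segmentation.data.slice(0, 10));
});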

We will use this feature to eliminate the background from the image. Let me first start with the important parts of the HTML. I have preloaded all the background images in the HTML so that I can refer to them whenever needed.

<img hidden id='none_img' src='../shared/images/none.jpg' />
<img hidden id='forest_img' src='../shared/images/forest.jpg' />
<img hidden id='mountain_img' src='../shared/images/mountain.jpg' />
<img hidden id='office_img' src='../shared/images/office.jpg' />
<img hidden id='sunset_img' src='../shared/images/sunset.jpg' />
<img hidden id='village_img' src='../shared/images/village.jpg' />

I defined some variables that are used globally across the project.

var streaming = false;
var bodyNet = null;
var curr_image = null;
var width = 450;
var height = 0;

const video = $('#avideo')[0];
const cnvas = $('#acanvas')[0];
const strtb = $('#startb')[0];
const endb  = $('#endb')[0];
const check = $('#backcheck');

const img_none = $('#none_img')[0];
const img_forest = $('#forest_img')[0];
const img_mountain = $('#mountain_img')[0];
const img_office = $('#office_img')[0];
const img_sunset = $('#sunset_img')[0];
const img_village = $('#village_img')[0];

We start by loading the body-pix model. This is an asynchronous call, and there are multiple ways of handling async calls in JavaScript. We can mark the function as async and use await on the call; this ensures the function does not continue past that point before the call has finished, so the code reads as if it were synchronous.

The other way is to use callback functions, which is what I have done in this program.
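For comparison, a rough sketch of the await style for the same model load would look like this; note that it has to run inside an async function.

// Sketch of the await style (must be inside an async function)
async function loadBodyPix() {
  	try {
    	bodyNet = await bodyPix.load();  // execution pauses here until the model is ready
    	console.log('Body pix loaded...');
  	} catch (error) {
    	console.log('Error: ' + error.message + ' ' + error.name);
  	}
}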

Start coding

The first thing is to load up the body-pix model, which the code below does. I am setting a few parameters; you can go to the model site to see everything it supports. The model can run on either a MobileNet or a ResNet backbone, and that choice is passed in as a parameter.

 pixSuccess = function(value) {
   	console.log('Body pix loaded...');
   	bodyNet = value;
 }

pixFail = function(error) {
  	const err = 'Error: ' + error.message + ' ' + error.name;
  	console.log(err);
}

// Let's load bodypix
bodyPix.load({
  	architecture: 'MobileNetV1',
  	outputStride: 16,
  	multiplier: 0.75,
  	quantBytes: 2
}).then(pixSuccess).catch(pixFail);
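If you are willing to trade speed for accuracy, the same call can load the ResNet backbone instead. A sketch of that configuration (the multiplier option only applies to MobileNetV1, so it is dropped here) would be:

// Heavier but more accurate alternative: ResNet50 backbone
bodyPix.load({
  	architecture: 'ResNet50',
  	outputStride: 32,
  	quantBytes: 2
}).then(pixSuccess).catch(pixFail);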

Next I start the video, using the HTML5 getUserMedia API.

rcdSuccess = function(stream) {
  	video.srcObject = stream;
}

rcdFail = function(error) {
  	const err = 'Error: ' + error.message + ' ' + error.name;
  	console.log(err);
}

strtb.addEventListener('click', function(e) {
  	let options = { audio: false, video: { width: { min: width, ideal: width, max: width } } };
  	navigator.mediaDevices.getUserMedia(options)
    	.then(rcdSuccess).catch(rcdFail);
}, false);
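For stopping the feed, a minimal sketch (assuming the endb button simply releases the camera tracks) could look like this:

endb.addEventListener('click', function(e) {
  	const stream = video.srcObject;
  	if (stream) {
    	// Stop every track on the stream to release the camera
    	stream.getTracks().forEach(function(track) { track.stop(); });
    	video.srcObject = null;
  	}
}, false);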

That covers starting and stopping the video feed from the camera. After we get the video feed, I move it onto a canvas, remove the background, and substitute the preferred background.

check.change(function() {
  	var valc = $('input[type="radio"][name="bground"]:checked').val();
  	console.log('Checked value: ' + valc);
  	var frame = null;
  	var ctx_tmp = cnvas.getContext('2d');
  	if (valc == 'none') {
    	curr_image = null;
    	ctx_tmp.drawImage(img_none, 0, 0, width, height);
  	} else {
    	var tmp_image;
    	if (valc == 'forest') {
      		img_forest.height = height;
      		img_forest.width = width;
      		tmp_image = img_forest;
    	} else if (valc == 'mountain') {
      		img_mountain.height = height;
      		img_mountain.width = width;
      		tmp_image = img_mountain;
    	}
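    	// The remaining backgrounds (office, sunset, village) would follow the same pattern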
    	ctx_tmp.drawImage(tmp_image, 0, 0, width, height);
    	curr_image = ctx_tmp.getImageData(0, 0, width, height);
  	}
});

videoFrames = async function(tmer, metadata) {
  	if (bodyNet != null) {
    	var segdata = await bodyNet.segmentPerson(video);
    	var ctx_out = cnvas.getContext('2d');
    	ctx_out.drawImage(video, 0, 0, width, height);
    	var out_image = ctx_out.getImageData(0, 0, width, height);

    	if (curr_image != null) {
      		for (let x = 0; x < width; x++) {
        		for (let y = 0; y < height; y++) {
          			var n = x + (y * width);
          			if (segdata.data[n] == 0) {
            			// Replace RGB
            			out_image.data[n * 4] = curr_image.data[n * 4];         // R
            			out_image.data[n * 4 + 1] = curr_image.data[n * 4 + 1]; // G
            			out_image.data[n * 4 + 2] = curr_image.data[n * 4 + 2]; // B
            			out_image.data[n * 4 + 3] = curr_image.data[n * 4 + 3]; // A
          			}
        		}
     		 }
    	}

    	ctx_out.putImageData(out_image, 0, 0);
    	video.requestVideoFrameCallback(videoFrames);
  	}
}
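One more piece is needed to start this loop: videoFrames has to be registered for the first frame. A sketch of how that could be wired up, assuming we kick it off once the video starts playing and derive the height from the stream's aspect ratio, is:

// Start per-frame processing once the camera stream begins playing
video.addEventListener('play', function() {
  	// Derive the output height from the fixed width, keeping the aspect ratio
  	height = video.videoHeight / (video.videoWidth / width);
  	video.requestVideoFrameCallback(videoFrames);
});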

The interesting part here is how we replace the background. We keep the preferred background image in a variable called curr_image. Even though an image looks like a 2D shape, its pixels are stored in a 1D array. For every pixel where the segmentation says there is no person, we overwrite the camera pixel's RGBA values with the corresponding values from the background image; where there is a person, the camera pixels are kept. For example, with a width of 450, the pixel at (x = 10, y = 5) has index n = 10 + 5 * 450 = 2260, and its RGBA bytes sit at positions 9040 through 9043. The result is an overlay of the segmented camera image on top of the preferred background. As you can see from the video above, the masking is not perfect and leaves a lot to be desired. I tried the best configuration, but even with that I do not get good masking. For a basic and fast implementation, though, this model definitely works well.

Next, we move on to the NLP project.

Question Answer finder

The idea for this program is to read the COVID-19 page from Wikipedia and then get answers to questions from that passage. There are two learnings here. The first is using the MediaWiki APIs to load pages. The second is using the qna model for natural language processing to answer the questions we ask.

Before we get into the activity, let me give you a brief overview of how the MediaWiki API works. MediaWiki provides a set of APIs to fetch data as raw text or JSON from Wikipedia or Wikimedia. The endpoints are given below (taken from the MediaWiki page).

API Endpoint                               Wiki
https://www.mediawiki.org/w/api.php        MediaWiki API
https://meta.wikimedia.org/w/api.php       Meta-Wiki API
https://en.wikipedia.org/w/api.php         English Wikipedia API
https://nl.wikipedia.org/w/api.php         Dutch Wikipedia API
https://commons.wikimedia.org/w/api.php    Wikimedia Commons API
https://test.wikipedia.org/w/api.php       Test Wiki API

Examples of Wikimedia Wiki Endpoints

We will be using the English Wikipedia endpoint here to extract data about COVID-19. The API call we are going to use is as follows.

https://en.wikipedia.org/w/api.php?origin=*&format=json&action=query&prop=revisions&titles=COVID-19&rvslots=*&rvprop=content&formatversion=2

This will return me a JSON object that looks like the following.

Mediawiki response
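Abbreviated to just the fields the success handler below reads, the structure looks roughly like this:

{
  "query": {
    "pages": [
      {
        "title": "COVID-19",
        "revisions": [
          {
            "slots": {
              "main": {
                "contentmodel": "wikitext",
                "content": "... full wikitext of the COVID-19 article ..."
              }
            }
          }
        ]
      }
    ]
  }
}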
Start coding

I created a very simple screen for this project. Messages are printed to a text area as things happen, which gives visual feedback without having to log to the console. The screen I built is shown below.

As you can see in the screen above, it printed messages when qna was loaded and when the wiki page was retrieved. Unfortunately I could not make this work for a large volume of text, only for smaller texts. When asking a question against a large volume of text, I got the error below on both Windows and a MacBook.

tf.min.js:17 High memory usage in GPU: 1017.71 MB, most likely due to a memory leak
tf.min.js:17 Uncaught (in promise) Error: Failed to compile fragment shader.

However, since it did work on smaller texts, I would assume that the program itself is correct.

// Success routine for WIKI load
wikiLoadSuccess = function(data) {
  	addMessage('wiki loaded successfully....');
  	wikiLoaded = true;
  	wikiText = data.query.pages[0].revisions[0].slots.main.content.replace( /\s|\[|\]|\{|\}/g, ' ');
  	console.log(wikiText);
}

// Failure routine for WIKI load
wikiLoadFail = function(xhr, err) {
	addMessage('wiki load failed...');
}

// Success routine for QNA load
qnaLoadSuccess = function(value) {
	addMessage('qna loaded successfully....');
	model = value;

	// Load Wiki into the model
	addMessage('Started WIKIpedia page load...');
	$.ajax({
		url: "https://en.wikipedia.org/w/api.php?origin=*&format=json&action=query&prop=revisions&titles=COVID-19&rvslots=*&rvprop=content&formatversion=2",
		success: wikiLoadSuccess,
		error: wikiLoadFail
	});
}

// Failure routine for QNA load
qnaLoadFail = function(err) {
	addMessage('qna load failed...');
}

The functions defined above are used as the success and failure routines when loading the qna model and the text from Wikipedia. The call used to initiate the qna load is given below.

qna.load()
  	.then(qnaLoadSuccess)
  	.catch(qnaLoadFail);

We initiate the qna model load on application start, which ensures the model is available when needed. After the model is loaded, we start loading the Wikipedia page content. One caveat I noticed is that qna takes a long time to load.

We are ready to ask questions once both qna and the wiki page have loaded. To facilitate this, we have a text box that takes the question from the user.

getAnswer = function(data) {
  	console.log(data);
  	if (data.length > 0) {
    	addAnswer('A: ' + data[0].text + '\n', false);
  	} else {
    	addAnswer('A: ' + 'NOT FOUND' + '\n', false);
  	}
}

getAnswerError = function(err) {
  	console.log(err);
}

askIt.addEventListener('click', function(e) {
  	var qst = tqust.val();
  	addAnswer('Q: ' + qst + '?', true);
  	model.findAnswers(qst, wikiText).then(getAnswer).catch(getAnswerError);
});

As shown in the code above, we call model.findAnswers to get the answer to the asked question. This returns an array of candidate answers, from which we can select the one with the highest score.
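If you want to be explicit about picking the highest-scoring answer rather than simply taking the first element, a small variation of getAnswer could look like this (a sketch; each answer returned by findAnswers carries a numeric score field):

getAnswer = function(data) {
  	if (data.length > 0) {
    	// Pick the candidate with the highest score
    	const best = data.reduce(function(a, b) { return a.score >= b.score ? a : b; });
    	addAnswer('A: ' + best.text + '\n', false);
  	} else {
    	addAnswer('A: NOT FOUND\n', false);
  	}
}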

Conclusion

This blog has become a lot longer than I wanted, since we covered two different models in it. I hope you find them useful. As always, the code can be found on my GitHub page. Ciao for now!