TL;DR Learn how to use TensorFlow’s Object Detection model (COCO-SSD) to detect intruders from images and webcam feeds
Automated surveillance has always been a goal for a variety of good/bad actors around the globe. With the advance of Machine Learning, this might’ve become a lot easier.
You’re not interested in all that. You have one simple goal: you just want to sleep well at night when you’re far away from home. A simple burglar detection system that notifies you if something is off will do. Your webcam is lying around, waiting patiently. How can you use it?
Here’s what you’ll learn:
Run the complete source code for this tutorial right in your browser:
Live demo of the Burglar Alarm app
The ultimate “Life Hack” when designing and training a Machine Learning model is to NOT DO IT. Turns out, you really can skip it! Companies with lots of resources (think data and compute power) share some of their trained models, and you can use them for free!
You skip some important and hard steps - figuring out the architecture of your model, finding appropriate hyperparameters (learning rates, momentums, regularizers) and waiting for the training to complete. This can be helpful when prototyping a solution and you want some quick and good results.
Our first task is to find people in images/videos. The general problem is known as object detection and deals with detecting different types of objects in images and videos.
One of the largest datasets that includes data for our task is Common Objects in Context (COCO). TensorFlow.js offers a pre-trained COCO-SSD model. SSD stands for Single Shot MultiBox Detection. The model is capable of detecting 80 classes of objects.
Let’s take the model for a quick spin. Start by installing the dependencies:
yarn add @tensorflow/tfjs @tensorflow-models/coco-ssd
Let’s load the model:
// Importing tfjs registers the backends (WebGL, CPU) the model runs on
import "@tensorflow/tfjs"
import * as cocoSsd from "@tensorflow-models/coco-ssd"

const model = await cocoSsd.load({ base: "mobilenet_v2" })
The base option controls the CNN that the detector is built on. We’re using MobileNet v2. It is heavier and slower than the default lite_mobilenet_v2 but provides the best accuracy of the available options.
Let’s detect some objects in this image:
const predictions = await model.detect(image)
The detected result looks like this:
[
  {
    bbox: [
      72.00384736061096,
      90.03258740901947,
      158.9742362499237,
      138.44870698451996,
    ],
    class: "tv",
    score: 0.9097887277603149,
  },
  {
    bbox: [
      -3.011488914489746,
      70.23924046754837,
      440.5377149581909,
      376.6170576810837,
    ],
    class: "person",
    score: 0.8925227522850037,
  },
]
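Each detected object has a class, a score (how certain our model is, between 0 and 1) and a bounding box. The bounding box is given as [x, y, width, height] in pixels and describes the smallest rectangle that we can draw around the object. Coordinates can fall slightly outside the image, which is why the second box above starts at a negative x.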
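COCO-SSD reports each bbox as [x, y, width, height], so if you need corner coordinates you have to compute them yourself. Here is a minimal sketch of one way to do that. The helper name toCorners and the 640x480 image size are assumptions made for illustration; they are not part of the model’s API.

```javascript
// Convert a COCO-SSD bounding box ([x, y, width, height]) into corner
// coordinates, clamped so that the box never extends past the image edges.
function toCorners(bbox, imageWidth, imageHeight) {
  const [x, y, width, height] = bbox
  const clamp = (value, max) => Math.min(Math.max(value, 0), max)
  return {
    left: clamp(x, imageWidth),
    top: clamp(y, imageHeight),
    right: clamp(x + width, imageWidth),
    bottom: clamp(y + height, imageHeight),
  }
}

// The "person" box from the sample output; the image size is assumed.
const corners = toCorners(
  [-3.011488914489746, 70.23924046754837, 440.5377149581909, 376.6170576810837],
  640,
  480
)
console.log(corners)
```

Clamping is optional, but it keeps the negative x of the sample box from leaking into any drawing or cropping code.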
That looks cool. Except that one of the classes should be “Evil Snail”.
We’ll build a simple React app that can detect intruders from two sources: a predefined image and a webcam feed. We’ll display a simple notification when an intruder is detected. You can replace that with an email or any other notification method.
The pre-trained model will do most of the work for us. We’ll have to look for persons in the detected classes (when the confidence is high enough, of course) and send a notification.
We already know that the COCO-SSD model works when there’s something to detect in an image. But imagine you’re lying comfortably on the beach in Hawaii and suddenly receive a burglar alarm notification. How frustrating would it be if it were a false alarm, and you received those 2-3 times a day?
Let’s start with an image of something strange:
You left your house. All of a sudden, this balloon started floating around. There are no intruders in this image. Luckily, our model thinks the same way. Feel free to try it out with your own photos.
Remember that you must filter out classes that are not “person” like so:
const personDetections = predictions.filter(p => p.class === "person")
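The class filter can be combined with a check on the score, so that low-confidence “person” detections don’t trigger the alarm. The sketch below is one possible way to do it. The function name findIntruders, the 0.6 threshold and the sample data are all assumptions chosen for illustration; tune the threshold for your own camera and lighting.

```javascript
// Hypothetical cut-off: detections scoring below this are ignored.
const MIN_SCORE = 0.6

// Keep only detections that are labeled "person" and are confident enough.
function findIntruders(predictions, minScore = MIN_SCORE) {
  return predictions.filter(p => p.class === "person" && p.score >= minScore)
}

// Made-up predictions in the same shape that model.detect() returns.
const samplePredictions = [
  { bbox: [72, 90, 159, 138], class: "tv", score: 0.91 },
  { bbox: [-3, 70, 440, 377], class: "person", score: 0.89 },
  { bbox: [10, 10, 50, 80], class: "person", score: 0.35 },
]

const intruders = findIntruders(samplePredictions)
console.log(intruders.length) // 1
```

A higher threshold means fewer false alarms but a greater chance of missing a real intruder, so it is worth testing a few values on footage from your own room.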
Most surveillance systems receive their input from cameras. We’ll do the same thing - use your webcam for surveillance (scared yet?).
Luckily, HTML5 and TensorFlow.js make using your webcam stream easy. Let’s load the webcam stream:
const loadCamera = async () => {
  // mediaDevices is only defined in secure contexts (HTTPS or localhost)
  if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: true,
      audio: false,
    })
    // Keep a global reference so the stream can be inspected or stopped later
    window.stream = stream
    videoRef.current.srcObject = stream
  }
}
This will request permission to use the webcam. Once you have the permission, our HTML5 video element will show the webcam feed.
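The camera stays on until its tracks are stopped, so it is a good idea to release it when the surveillance view goes away (for example, when the React component unmounts). A minimal sketch, assuming a hypothetical helper named stopCamera that receives the stream returned by getUserMedia and the video element showing it:

```javascript
// Stop every track of the stream (this turns the camera light off)
// and detach the stream from the video element.
function stopCamera(stream, videoElement) {
  if (stream) {
    stream.getTracks().forEach(track => track.stop())
  }
  if (videoElement) {
    videoElement.srcObject = null
  }
}

// Possible usage in the browser, with the names from the snippet above:
// stopCamera(window.stream, videoRef.current)
```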
Once everything is loaded, the function detectFromVideoFrame() is called. Let’s have a look at its implementation:
const detectFromVideoFrame = async video => {
  try {
    const predictions = await objectDetector.detect(video)

    const personDetections = predictions.filter(p => p.class === "person")

    showDetections(video, personDetections)
    requestAnimationFrame(() => {
      detectFromVideoFrame(video)
    })
  } catch (error) {
    console.log("Couldn't start the webcam")
    console.error(error)
  }
}
Essentially, we’re chopping the video into images (frames) and detecting persons on each one.
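Because detection runs on every frame, a single spurious “person” in one frame could be enough to raise a false alarm. One common way to reduce that is to require a person in several consecutive frames before notifying. The sketch below illustrates that idea under stated assumptions; it is not part of the app’s source code, and the name createAlarmGate and the frame counts are hypothetical.

```javascript
// Returns a function that is called once per frame with the number of
// persons detected. It answers true only at the moment the alarm should
// fire: after `framesRequired` consecutive frames with at least one person.
// The gate re-arms itself as soon as a frame without persons is seen.
function createAlarmGate(framesRequired = 5) {
  let streak = 0
  let alarmed = false

  return function update(personCount) {
    if (personCount === 0) {
      streak = 0
      alarmed = false
      return false
    }
    streak += 1
    if (streak >= framesRequired && !alarmed) {
      alarmed = true
      return true
    }
    return false
  }
}

// Simulated person counts for six consecutive frames.
const shouldAlarm = createAlarmGate(3)
const personCountsPerFrame = [1, 0, 1, 1, 1, 1]
const decisions = personCountsPerFrame.map(count => shouldAlarm(count))
console.log(decisions) // [false, false, false, false, true, false]
```

In the detection loop you would call the gate with personDetections.length on each frame and send the notification only when it returns true.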
Let’s have a look at the final result:
Of course, you might replace the on-screen notification with something more sophisticated. Have a look at the complete source code for the bounding box drawing logic.
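If you just want a rough idea of what drawing the boxes involves, here is a minimal sketch using the Canvas 2D API. It is not the drawing code from the complete source. The function name drawDetections and the styling are assumptions, and it expects a 2D context from a canvas that overlays the video at the same size.

```javascript
// Draw a rectangle and a "class score%" label for each detection.
// `ctx` is a CanvasRenderingContext2D, e.g. canvas.getContext("2d").
function drawDetections(ctx, detections) {
  ctx.strokeStyle = "red"
  ctx.fillStyle = "red"
  ctx.lineWidth = 2
  ctx.font = "16px sans-serif"

  detections.forEach(({ bbox, class: label, score }) => {
    const [x, y, width, height] = bbox
    ctx.strokeRect(x, y, width, height)

    // Put the label above the box, or inside it when the box touches the top.
    const labelY = y > 16 ? y - 4 : y + 16
    ctx.fillText(`${label} ${(score * 100).toFixed(0)}%`, x, labelY)
  })
}

// Possible usage in the browser (canvas is assumed to overlay the video):
// const ctx = canvas.getContext("2d")
// ctx.clearRect(0, 0, canvas.width, canvas.height)
// drawDetections(ctx, personDetections)
```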
Congratulations, you’ve just used a pre-trained model to detect intruders from images and video. Here’s what you’ve learned:
Imagine that your sweet grandma passes in front of the screen. You wouldn’t want to consider her an intruder, right? We’ll handle this problem in the next part.