# Curiousily

## Face Detection on Custom Dataset with Detectron2 and PyTorch using Python

Deep Learning, PyTorch, Machine Learning, Computer Vision, Object Detection, Face Detection, Python · 5 min read


TL;DR Learn how to prepare a custom Face Detection dataset for Detectron2 and PyTorch. Fine-tune a pre-trained model to find face boundaries in images.

Face detection is the task of finding (boundaries of) faces in images. This is useful for

• security systems (the first step in recognizing a person)
• autofocus and smile detection for making great photos
• detecting age, race, and emotional state for marketing (yep, we already live in that world)

Historically, this was a really tough problem to solve. It took tons of manual feature engineering, and many novel algorithms and methods were developed to improve the state of the art.

These days, face detection models are included in almost every computer vision package/framework. Some of the best-performing ones use Deep Learning methods. OpenCV, for example, provides a variety of tools like the Cascade Classifier.

In this guide, you’ll learn how to:

• prepare a custom dataset for face detection with Detectron2
• use (close to) state-of-the-art models for object detection to find faces in images

You can extend this work for face recognition.

Here’s an example of what you’ll get at the end of this guide:

## Detectron 2

Detectron2 is a framework for building state-of-the-art object detection and image segmentation models. It is developed by the Facebook Research team. Detectron2 is a complete rewrite of the first version.

Under the hood, Detectron2 uses PyTorch (compatible with the latest version(s)) and allows for blazing fast training. You can learn more in the introductory blog post by Facebook Research.

The real power of Detectron2 lies in the HUGE amount of pre-trained models available at the Model Zoo. But what good would that be if you can’t fine-tune those on your own datasets? Fortunately, that’s super easy! We’ll see how it is done in this guide.

### Installing Detectron2

At the time of this writing, Detectron2 is still in an alpha stage. While there is an official release, we’ll clone and compile from the master branch. This should equal version 0.1.

Let’s start by installing some requirements:

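```python
!pip install -q cython pyyaml==5.1
!pip install -q -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Clone step assumed from the text above (the target folder matches the install command below)
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -q -e detectron2_repo
```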

At this point, you’ll need to restart the notebook runtime to continue!

```python
!pip install -q -U watermark
%watermark -v -p numpy,pandas,pycocotools,torch,torchvision,detectron2
```

```
CPython 3.6.9
IPython 5.5.0

numpy 1.17.5
pandas 0.25.3
pycocotools 2.0
torch 1.4.0
torchvision 0.5.0
detectron2 0.1
```

```python
import torch, torchvision
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

import glob

import os
import ntpath
import numpy as np
import cv2
import random
import itertools
import pandas as pd
from tqdm import tqdm
import urllib.request  # urllib.request.urlopen is used when downloading the images
import json
import PIL.Image as Image

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import DatasetCatalog, MetadataCatalog  # needed to register the datasets later
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.structures import BoxMode

import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#ADFF02", "#8F00FF"]

sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

rcParams['figure.figsize'] = 12, 8

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
```

## Face Detection Data

Our dataset is provided by Dataturks, and it is hosted on Kaggle. Here’s an excerpt from the description:

Faces in images marked with bounding boxes. Have around 500 images with around 1100 faces manually tagged via bounding box.

```python
!gdown --id 1K79wJgmPTWamqb04Op2GxW0SW9oxw8KS
```

Let’s load the file into a Pandas dataframe:

Each line contains a single face annotation. Note that multiple lines might point to a single image (e.g. multiple faces per image).
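Here’s a minimal sketch of that loading step, assuming the downloaded file is in JSON Lines format (one JSON object per line). The sample records, the example URLs, and the `sample_faces.json` file name are made up for illustration. Only the field names (`content`, `annotation`, `points`, `imageWidth`, `imageHeight`) come from the preprocessing code in this guide:

```python
import json
import os
import tempfile

import pandas as pd

# Two hypothetical records that mimic the fields read by the preprocessing code.
sample_records = [
    {
        "content": "http://example.com/a.jpg",
        "annotation": [
            {"points": [{"x": 0.1, "y": 0.2}, {"x": 0.5, "y": 0.6}],
             "imageWidth": 640, "imageHeight": 480},
        ],
    },
    {
        "content": "http://example.com/b.jpg",
        "annotation": [
            {"points": [{"x": 0.0, "y": 0.0}, {"x": 0.3, "y": 0.4}],
             "imageWidth": 100, "imageHeight": 200},
            {"points": [{"x": 0.5, "y": 0.5}, {"x": 0.9, "y": 1.0}],
             "imageWidth": 100, "imageHeight": 200},
        ],
    },
]

# Write one JSON object per line, then read the file back with lines=True.
json_path = os.path.join(tempfile.mkdtemp(), "sample_faces.json")
with open(json_path, "w") as f:
    for record in sample_records:
        f.write(json.dumps(record) + "\n")

sample_df = pd.read_json(json_path, lines=True)
print(sample_df.shape)
```

For the real data, you would point `pd.read_json(..., lines=True)` at the file downloaded above and store the result in `faces_df`.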

## Data Preprocessing

The dataset contains only image URLs and annotations. We’ll have to download the images. We’ll also normalize the annotations, so it’s easier to use them with Detectron2 later on:

```python
os.makedirs("faces", exist_ok=True)

dataset = []

for index, row in tqdm(faces_df.iterrows(), total=faces_df.shape[0]):
    # Download the image and store it locally as an RGB JPEG
    img = urllib.request.urlopen(row["content"])
    img = Image.open(img)
    img = img.convert('RGB')

    image_name = f'face_{index}.jpeg'

    img.save(f'faces/{image_name}', "JPEG")

    annotations = row['annotation']
    for an in annotations:

        data = {}

        width = an['imageWidth']
        height = an['imageHeight']
        points = an['points']

        data['file_name'] = image_name
        data['width'] = width
        data['height'] = height

        # Points are relative (0..1), so scale them to pixel coordinates
        data["x_min"] = int(round(points[0]["x"] * width))
        data["y_min"] = int(round(points[0]["y"] * height))
        data["x_max"] = int(round(points[1]["x"] * width))
        data["y_max"] = int(round(points[1]["y"] * height))

        data['class_name'] = 'face'

        dataset.append(data)
```
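To make the normalization step concrete, here’s a small standalone sketch of the coordinate conversion. The helper name `to_pixel_box` is hypothetical. The `min`/`max` ordering is an extra safeguard that is not part of the loop above, added on the assumption that the two corner points could arrive in either order:

```python
def to_pixel_box(points, width, height):
    """Convert two relative corner points (values in 0..1) to an absolute pixel box."""
    xs = [int(round(p["x"] * width)) for p in points[:2]]
    ys = [int(round(p["y"] * height)) for p in points[:2]]
    # Sort the corners so the result is always (x_min, y_min, x_max, y_max)
    return min(xs), min(ys), max(xs), max(ys)


box = to_pixel_box(
    [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 1.0}],
    width=640,
    height=480,
)
print(box)  # (160, 240, 480, 480)
```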

Let’s put the data into a dataframe so we can have a better look:

```python
df = pd.DataFrame(dataset)
print(df.file_name.unique().shape[0], df.shape[0])
```

```
409 1132
```

We have a total of 409 images (a lot less than the promised 500) and 1132 annotations. Let’s save them to the disk (so you might reuse them):

### Data Exploration

Let’s see some sample annotated data. We’ll use OpenCV to load an image, add the bounding boxes, and resize it. We’ll define a helper function to do it all:

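```python
def annotate_image(annotations, resize=True):
    file_name = annotations.file_name.to_numpy()[0]
    # Load the image from the faces folder; OpenCV reads BGR, matplotlib expects RGB
    img = cv2.cvtColor(cv2.imread(f'faces/{file_name}'), cv2.COLOR_BGR2RGB)

    for i, a in annotations.iterrows():
        cv2.rectangle(img, (int(a.x_min), int(a.y_min)), (int(a.x_max), int(a.y_max)), (0, 255, 0), 2)

    if not resize:
        return img

    return cv2.resize(img, (384, 384), interpolation=cv2.INTER_AREA)
```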

Let’s start by showing some annotated images:

```python
img_df = df[df.file_name == df.file_name.unique()[0]]
img = annotate_image(img_df, resize=False)

plt.imshow(img)
plt.axis('off');
```

```python
img_df = df[df.file_name == df.file_name.unique()[1]]
img = annotate_image(img_df, resize=False)

plt.imshow(img)
plt.axis('off');
```

Those are good ones, the annotations are clearly visible. We can use torchvision to create a grid of images. Note that the images are in various sizes, so we’ll resize them:

```python
sample_images = [annotate_image(df[df.file_name == f]) for f in df.file_name.unique()[:10]]
# Stack into a single array first; converting a list of arrays directly is slow
sample_images = torch.as_tensor(np.array(sample_images))
sample_images.shape
```

```
torch.Size([10, 384, 384, 3])
```

```python
# Move the channel axis forward: (N, H, W, C) -> (N, C, H, W)
sample_images = sample_images.permute(0, 3, 1, 2)
sample_images.shape
```

```
torch.Size([10, 3, 384, 384])
```

```python
plt.figure(figsize=(24, 12))
grid_img = torchvision.utils.make_grid(sample_images, nrow=5)

plt.imshow(grid_img.permute(1, 2, 0))
plt.axis('off');
```

You can clearly see that some annotations are missing (column 4). That’s real-life data for you. Sometimes you have to deal with it in some way.

## Face Detection with Detectron 2

It is time to go through the steps of fine-tuning a model using a custom dataset. But first, let’s save 5% of the data for testing:

```python
IMAGES_PATH = 'faces'

unique_files = df.file_name.unique()

train_files = set(np.random.choice(unique_files, int(len(unique_files) * 0.95), replace=False))
train_df = df[df.file_name.isin(train_files)]
test_df = df[~df.file_name.isin(train_files)]
```

The classical train_test_split won’t work here because we want to split by file name, so that all annotations of an image end up on the same side of the split.
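Here’s a toy sketch of why splitting by file name matters. The tiny dataframe and the 75% ratio are made up for illustration. The point is that no image ends up with annotations in both the train and the test set:

```python
import numpy as np
import pandas as pd

# Hypothetical annotations: images "a" and "c" have two faces each
toy_df = pd.DataFrame({
    "file_name": ["a.jpeg", "a.jpeg", "b.jpeg", "c.jpeg", "c.jpeg", "d.jpeg"],
    "x_min": [1, 2, 3, 4, 5, 6],
})

rng = np.random.default_rng(42)
files = toy_df.file_name.unique()

# Sample file names (not rows), then select rows by membership
train_names = set(rng.choice(files, size=int(len(files) * 0.75), replace=False))
toy_train = toy_df[toy_df.file_name.isin(train_names)]
toy_test = toy_df[~toy_df.file_name.isin(train_names)]

# No file name appears on both sides of the split
shared = set(toy_train.file_name) & set(toy_test.file_name)
print(len(shared))  # 0
```

A row-level split could put one face of an image in the training set and another face of the same image in the test set, which would leak test images into training.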

The next parts are written in a bit more generic way. Obviously, we have a single class - face. But adding more should be as simple as adding more annotations to the dataframe:

```python
classes = df.class_name.unique().tolist()
```

Next, we’ll write a function that converts our dataset into a format that is used by Detectron2:

```python
def create_dataset_dicts(df, classes):
    dataset_dicts = []
    for image_id, img_name in enumerate(df.file_name.unique()):

        record = {}

        image_df = df[df.file_name == img_name]

        file_path = f'{IMAGES_PATH}/{img_name}'
        record["file_name"] = file_path
        record["image_id"] = image_id
        record["height"] = int(image_df.iloc[0].height)
        record["width"] = int(image_df.iloc[0].width)

        objs = []
        for _, row in image_df.iterrows():

            xmin = int(row.x_min)
            ymin = int(row.y_min)
            xmax = int(row.x_max)
            ymax = int(row.y_max)

            # Rectangle polygon with the same corners as the bounding box
            poly = [
                (xmin, ymin), (xmax, ymin),
                (xmax, ymax), (xmin, ymax)
            ]
            poly = list(itertools.chain.from_iterable(poly))

            obj = {
                "bbox": [xmin, ymin, xmax, ymax],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": classes.index(row.class_name),
                "iscrowd": 0
            }
            objs.append(obj)

        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts
```

We convert every annotation row to a single record with a list of annotations. You might also notice that we’re building a polygon that is of the exact same shape as the bounding box. This is required for the image segmentation models in Detectron2.
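The polygon construction can be looked at in isolation. Here’s a minimal sketch (the helper names `box_to_polygon` and `polygon_area` are hypothetical) that flattens the four box corners into the `[x1, y1, x2, y2, ...]` list used for the `segmentation` field and checks with the shoelace formula that the polygon covers the same area as the box:

```python
import itertools


def box_to_polygon(xmin, ymin, xmax, ymax):
    """Return the box corners as a flat [x1, y1, x2, y2, ...] polygon."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return list(itertools.chain.from_iterable(corners))


def polygon_area(flat_polygon):
    """Area of a simple polygon via the shoelace formula."""
    xs, ys = flat_polygon[0::2], flat_polygon[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


polygon = box_to_polygon(10, 20, 110, 220)
print(polygon)                # [10, 20, 110, 20, 110, 220, 10, 220]
print(polygon_area(polygon))  # 20000.0
```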

You’ll have to register your dataset into the dataset and metadata catalogues:

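```python
from detectron2.data import DatasetCatalog, MetadataCatalog

for d in ["train", "val"]:
    DatasetCatalog.register("faces_" + d, lambda d=d: create_dataset_dicts(train_df if d == "train" else test_df, classes))
    MetadataCatalog.get("faces_" + d).set(thing_classes=classes)
```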

Unfortunately, an evaluator for the test set is not included by default. We can easily fix that by writing our own trainer:

```python
class CocoTrainer(DefaultTrainer):

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):

        if output_folder is None:
            os.makedirs("coco_eval", exist_ok=True)
            output_folder = "coco_eval"

        return COCOEvaluator(dataset_name, cfg, False, output_folder)
```

The evaluation results will be stored in the coco_eval folder if no folder is provided.

Fine-tuning a Detectron2 model is nothing like writing PyTorch code. We’ll load a configuration file, change a few values, and start the training process. But hey, it really helps if you know what you’re doing 😂

For this tutorial, we’ll use the Mask R-CNN X101-FPN model. It is pre-trained on the COCO dataset and achieves very good performance. The downside is that it is slow to train.

Let’s load the config file and the pre-trained model weights:

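```python
cfg = get_cfg()

# Model Zoo config for Mask R-CNN X101-FPN (config name assumed from the Model Zoo listing)
config_name = "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"

cfg.merge_from_file(model_zoo.get_config_file(config_name))

cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
```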

Specify the datasets (we registered those) we’ll use for training and evaluation:

```python
cfg.DATASETS.TRAIN = ("faces_train",)
cfg.DATASETS.TEST = ("faces_val",)
```

And for the optimizer, we’ll do a bit of magic to converge to something nice:

```python
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 1500
cfg.SOLVER.STEPS = (1000, 1500)
cfg.SOLVER.GAMMA = 0.05
```

Except for the standard stuff (batch size, max number of iterations, and learning rate) we have a couple of interesting params:

• WARMUP_ITERS - the number of iterations over which the learning rate ramps up from (almost) 0 to the preset one
• STEPS - the checkpoints (number of iterations) at which the learning rate is multiplied by GAMMA
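To get a feel for how these parameters interact, here’s a rough sketch of the resulting learning rate schedule in plain Python. It is an approximation of a linear warmup followed by step decay, not Detectron2’s own scheduler code. The function name `lr_at_iteration` is hypothetical, and the `warmup_factor` of 0.001 is an assumption about the default starting fraction:

```python
from bisect import bisect_right


def lr_at_iteration(iteration, base_lr=0.001, warmup_iters=1000,
                    steps=(1000, 1500), gamma=0.05, warmup_factor=0.001):
    """Approximate learning rate at a given iteration (linear warmup + step decay)."""
    if iteration < warmup_iters:
        # Ramp linearly from warmup_factor * base_lr up to base_lr
        alpha = iteration / warmup_iters
        warmup = warmup_factor * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Multiply by gamma once for every step that has already been reached
    decay = gamma ** bisect_right(list(steps), iteration)
    return base_lr * warmup * decay


for it in (0, 500, 999, 1000, 1499):
    print(it, lr_at_iteration(it))
```

Under these assumptions, the learning rate climbs for the first 1000 iterations and then drops by a factor of 20 for the remaining 500. Training stops at MAX_ITER, so the second step at 1500 would never take effect.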

Finally, we’ll specify the number of classes and the period at which we’ll evaluate on the test set:

```python
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(classes)

cfg.TEST.EVAL_PERIOD = 500
```

Time to train, using our custom trainer:

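```python
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)  # start from the pre-trained weights in cfg.MODEL.WEIGHTS
trainer.train()
```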

## Evaluating Object Detection Models

Evaluating object detection models is a bit different when compared to evaluating standard classification or regression models.

The main metric you need to know about is IoU (intersection over union). It measures the overlap between two boundaries - the predicted and ground truth one. It can get values between 0 and 1.

$\text{IoU}=\frac{\text{area of overlap}}{\text{area of union}}$

Using IoU, one can define a threshold (e.g. >0.5) to classify whether a prediction is a true positive (TP) or a false positive (FP).
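Here’s a minimal sketch of IoU for two axis-aligned boxes, assuming the same `(x_min, y_min, x_max, y_max)` pixel format used for the annotations in this guide. The function name `iou` is just for illustration:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at 0 when the boxes don't touch
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
score = iou(predicted, ground_truth)
print(score, "TP" if score > 0.5 else "FP")
```

In this example the overlap is 25 pixels out of a union of 175, so the prediction would count as a false positive at a 0.5 threshold.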

Now you can calculate average precision (AP) by taking the area under the precision-recall curve.

Now AP@X (e.g. AP50) is just AP at some IoU threshold. This should give you a working understanding of how object detection models are evaluated.
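To make this concrete, here’s a simplified sketch of AP for a single class at a single IoU threshold. It assumes each prediction has already been marked as TP or FP (e.g. by checking IoU > 0.5 against the ground truth), and it integrates the precision-recall curve over all points. The COCO evaluation used later samples the curve at fixed recall levels and averages over several IoU thresholds, so treat this as an illustration rather than a replacement. The function name `average_precision` is hypothetical:

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for predictions ranked by score."""
    if num_ground_truth == 0:
        return 0.0

    # Rank predictions from most to least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Make precision non-increasing (the usual interpolation step)
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangles under the curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three predictions, two ground truth faces: TP, FP, TP (ranked by score)
ap50 = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
print(ap50)
```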

I suggest you read the mAP (mean Average Precision) for Object Detection tutorial by Jonathan Hui if you want to learn more on the topic.

I’ve prepared a pre-trained model for you, so you don’t have to wait for the training to complete. Let’s download it:

```python
!mv face_detector.pth output/model_final.pth
```

We can start making predictions by loading the model and setting a minimum threshold of 85% certainty at which we’ll consider the predictions as correct:

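```python
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.85  # keep only predictions with at least 85% certainty
predictor = DefaultPredictor(cfg)
```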

Let’s run the evaluator with the trained model:

```python
evaluator = COCOEvaluator("faces_val", cfg, False, output_dir="./output/")
```

## Finding Faces in Images

Next, let’s create a folder and save all images with predicted annotations in the test set:

```python
os.makedirs("annotated_results", exist_ok=True)

test_image_paths = test_df.file_name.unique()
```

```python
from detectron2.data import MetadataCatalog

for image_name in test_image_paths:
    file_path = f'{IMAGES_PATH}/{image_name}'
    im = cv2.imread(file_path)  # OpenCV loads the image in BGR order
    outputs = predictor(im)
    v = Visualizer(
        im[:, :, ::-1],  # the visualizer expects RGB
        metadata=MetadataCatalog.get("faces_val"),
        scale=1.,
        instance_mode=ColorMode.IMAGE
    )
    instances = outputs["instances"].to("cpu")
    v = v.draw_instance_predictions(instances)
    result = v.get_image()[:, :, ::-1]  # back to BGR for cv2.imwrite
    file_name = ntpath.basename(image_name)
    write_res = cv2.imwrite(f'annotated_results/{file_name}', result)
```

Let’s have a look:

```python
annotated_images = [f'annotated_results/{f}' for f in test_df.file_name.unique()]

# Show the first four annotated test images (the choice of four is arbitrary)
for path in annotated_images[:4]:
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    plt.figure()
    plt.imshow(img)
    plt.axis('off');
```

Note that some faces have multiple bounding boxes (on the second image) with different degrees of certainty. Maybe training the model longer will help? How about adding more or augmenting the existing data?

## Conclusion

Congratulations! You now know the basics of Detectron2 for object detection! You might be surprised by the results, given the small dataset we have. That’s the power of large pre-trained models for you 😍

You learned how to:

• prepare a custom dataset for face detection with Detectron2
• use (close to) state-of-the-art models for object detection to find faces in images

You can extend this work for face recognition.



© 2020 Curiousily by Venelin Valkov