Curiousily

Deploy BERT for Sentiment Analysis as REST API using PyTorch, Transformers by Hugging Face and FastAPI

01.05.2020 — Deep Learning, NLP, REST, Machine Learning, Deployment, Sentiment Analysis, Python — 3 min read

TL;DR Learn how to create a REST API for Sentiment Analysis using a pre-trained BERT model

In this tutorial, you’ll learn how to deploy a pre-trained BERT model as a REST API using FastAPI. Here are the steps:

Initialize a project using Pipenv
Create a project skeleton
Add the pre-trained model and create an interface to abstract the inference logic
Update the request handler function to return predictions using the model
Start the server and send a test request

Project setup

We’ll manage our dependencies using Pipenv. Here’s the complete Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
black = "==19.10b0"
isort = "*"
flake8 = "*"
gdown = "*"

[packages]
fastapi = "*"
uvicorn = "*"
pydantic = "*"
torch = "*"
transformers = "*"

[requires]
python_version = "3.8"

[pipenv]
allow_prereleases = true

The backbone of our REST API will be:

FastAPI - lets you easily set up a REST API (some say it might be fast, too)
Uvicorn - an ASGI server that runs our FastAPI app and lets you do async programming with Python (pretty cool)
Pydantic - data validation by introducing types for our request and response data.

A few tools will help us write better code (thanks to Momchil Hardalov for the configs):

Black - code formatting
isort - imports sorting
flake8 - check for code style (PEP 8) compliance

Building a skeleton REST API

Let’s start by creating a skeleton structure for our project. Your directory should look like this:

.
├── Pipfile
├── Pipfile.lock
└── sentiment_analyzer
    ├── api.py

We’ll start by creating a dummy/stubbed response to test that everything is working end-to-end. Here are the contents of api.py:

from typing import Dict

from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()


class SentimentRequest(BaseModel):
    text: str


class SentimentResponse(BaseModel):
    probabilities: Dict[str, float]
    sentiment: str
    confidence: float


@app.post("/predict", response_model=SentimentResponse)
def predict(request: SentimentRequest):
    return SentimentResponse(
        sentiment="positive",
        confidence=0.98,
        probabilities=dict(negative=0.005, neutral=0.015, positive=0.98)
    )

Our API expects a text field: the review to analyze. The response contains the predicted sentiment, the confidence (the softmax output for that sentiment), and the probabilities for every sentiment class.
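To make the relationship between the three response fields concrete, here is a minimal pure-Python sketch. The logits are made up for illustration, and the helper names softmax and to_response are hypothetical (they are not part of the project); the real model does this with PyTorch later on.

```python
import math
from typing import Dict, List

CLASS_NAMES = ["negative", "neutral", "positive"]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable; the result is unchanged.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_response(logits: List[float]) -> Dict[str, object]:
    """Build a dict shaped like SentimentResponse from raw logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "sentiment": CLASS_NAMES[best],
        "confidence": probs[best],
        "probabilities": dict(zip(CLASS_NAMES, probs)),
    }


# Hypothetical logits, made up for illustration only.
response = to_response([-2.0, 0.5, 3.0])
print(response["sentiment"], round(response["confidence"], 3))
```

The confidence is simply the largest entry in probabilities, so the two fields can never disagree.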

Adding our model

Here’s the file structure of the complete project:

.
├── assets
│   └── model_state_dict.bin
├── bin
│   └── download_model
├── config.json
├── Pipfile
├── Pipfile.lock
└── sentiment_analyzer
    ├── api.py
    ├── classifier
    │   ├── model.py
    │   └── sentiment_classifier.py

We’ll need the pre-trained model. We’ll write the download_model script for that:

#!/usr/bin/env python
import gdown

gdown.download(
    "https://drive.google.com/uc?id=1V8itWtowCYnb2Bc9KlK9SxGff9WwmogA",
    "assets/model_state_dict.bin",
)

The model can be downloaded from my Google Drive. Let’s get it:

python bin/download_model

Our pre-trained model is stored as a PyTorch state dict. We need to load it and use it to predict the text sentiment.

Let’s start with the config file config.json:

{
    "BERT_MODEL": "bert-base-cased",
    "PRE_TRAINED_MODEL": "assets/model_state_dict.bin",
    "CLASS_NAMES": [
        "negative",
        "neutral",
        "positive"
    ],
    "MAX_SEQUENCE_LEN": 160
}
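Since both Python modules read this file at import time, a typo in a key only surfaces as a KeyError somewhere deep in the model code. One option is to check the config up front. The sketch below is an optional addition, not part of the original project; validate_config and REQUIRED_KEYS are hypothetical names.

```python
import json

REQUIRED_KEYS = {"BERT_MODEL", "PRE_TRAINED_MODEL", "CLASS_NAMES", "MAX_SEQUENCE_LEN"}


def validate_config(config: dict) -> dict:
    """Raise ValueError early if the config is incomplete or malformed."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    if not config["CLASS_NAMES"]:
        raise ValueError("CLASS_NAMES must list at least one class")
    max_len = config["MAX_SEQUENCE_LEN"]
    if not isinstance(max_len, int) or max_len <= 0:
        raise ValueError("MAX_SEQUENCE_LEN must be a positive integer")
    return config


# Same content as config.json, inlined so the snippet runs on its own.
raw = """
{
    "BERT_MODEL": "bert-base-cased",
    "PRE_TRAINED_MODEL": "assets/model_state_dict.bin",
    "CLASS_NAMES": ["negative", "neutral", "positive"],
    "MAX_SEQUENCE_LEN": 160
}
"""
config = validate_config(json.loads(raw))
```

In the project itself you would pass the result of json.load(json_file) through the same function.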

Next, we’ll define the sentiment_classifier.py:

import json

from torch import nn
from transformers import BertModel

with open("config.json") as json_file:
    config = json.load(json_file)


class SentimentClassifier(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained(config["BERT_MODEL"])
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Index 1 is the pooled [CLS] output. Indexing works whether the
        # installed transformers version returns a tuple or a ModelOutput,
        # whereas tuple unpacking breaks on the latter.
        pooled_output = outputs[1]
        output = self.drop(pooled_output)
        return self.out(output)

This is the same model we used for training. The only difference is that it reads the BERT model name from the config file.

Recall that BERT requires some special text preprocessing. We need a place to use the tokenizer from Hugging Face. We also need to do some massaging of the model outputs to convert them to our API response format.
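If the preprocessing step feels abstract, the toy sketch below mimics the shape of what the tokenizer produces: special tokens around the text, padding up to a fixed length, and an attention mask that flags the real tokens. It is deliberately not the Hugging Face tokenizer. The vocabulary, the token IDs, and the toy_encode function are all made up, and real BERT splits text into WordPiece sub-words rather than on whitespace.

```python
from typing import Dict, List

# Toy vocabulary with made-up IDs; the real tokenizer uses different ones.
PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"
VOCAB = {PAD: 0, UNK: 1, CLS: 2, SEP: 3, "this": 4, "app": 5, "is": 6, "great": 7}


def toy_encode(text: str, max_length: int) -> Dict[str, List[int]]:
    """Mimic the shape of the tokenizer output with a whitespace tokenizer."""
    # Reserve two slots for [CLS] and [SEP], truncating longer inputs.
    words = text.lower().split()[: max_length - 2]
    tokens = [CLS] + words + [SEP]
    input_ids = [VOCAB.get(token, VOCAB[UNK]) for token in tokens]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [VOCAB[PAD]] * padding
    attention_mask += [0] * padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = toy_encode("This app is great", max_length=8)
print(encoded["input_ids"])       # [2, 4, 5, 6, 7, 3, 0, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 1, 1, 0, 0]
```

Every input ends up with the same length (MAX_SEQUENCE_LEN in our config), which is what lets the model treat requests uniformly.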

The Model provides a nice abstraction (a Facade) to our classifier. It exposes a single predict() method and should be pretty generalizable if you want to use the same project structure as a template for your next deployment. The model.py file:

import json

import torch
import torch.nn.functional as F
from transformers import BertTokenizer

from .sentiment_classifier import SentimentClassifier

with open("config.json") as json_file:
    config = json.load(json_file)


class Model:
    def __init__(self):
        # Use the GPU when one is available, otherwise fall back to the CPU.
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

        self.tokenizer = BertTokenizer.from_pretrained(config["BERT_MODEL"])

        classifier = SentimentClassifier(len(config["CLASS_NAMES"]))
        classifier.load_state_dict(
            torch.load(config["PRE_TRAINED_MODEL"], map_location=self.device)
        )
        # eval() switches off dropout so predictions are deterministic.
        classifier = classifier.eval()
        self.classifier = classifier.to(self.device)

    def predict(self, text):
        encoded_text = self.tokenizer.encode_plus(
            text,
            max_length=config["MAX_SEQUENCE_LEN"],
            add_special_tokens=True,
            return_token_type_ids=False,
            # transformers 3.0+ spelling; older releases used pad_to_max_length=True.
            padding="max_length",
            truncation=True,
            return_attention_mask=True,
            return_tensors="pt",
        )
        input_ids = encoded_text["input_ids"].to(self.device)
        attention_mask = encoded_text["attention_mask"].to(self.device)

        with torch.no_grad():
            logits = self.classifier(input_ids, attention_mask)
            probabilities = F.softmax(logits, dim=1)
        confidence, predicted_class = torch.max(probabilities, dim=1)
        predicted_class = predicted_class.cpu().item()
        probabilities = probabilities.flatten().cpu().numpy().tolist()
        return (
            config["CLASS_NAMES"][predicted_class],
            # Convert the one-element tensor to a plain float for the response.
            confidence.cpu().item(),
            dict(zip(config["CLASS_NAMES"], probabilities)),
        )


model = Model()


def get_model():
    return model

We’ll do the inference on the GPU, if one is available. We return the name of the predicted sentiment, the confidence, and the probabilities for each sentiment.

But why don’t we define all that logic in our request handler function? For this tutorial, the extra layer is arguably overengineering. But in the real world, when you start testing your implementation, it will be a very nice bonus.

You see, mixing everything into the request handler will result in countless sleepless nights. When shit hits the fan (and it will), you’ll wonder whether your REST code or your model code is wrong. Keeping them apart lets you test each one separately.

model.py creates a single Model instance when it is imported, and get_model() always returns that same instance (a Singleton), so the weights are loaded only once. We’ll use it in our API handler.
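The module-level instance means the model loads as soon as model.py is imported. If you would rather defer that cost until the first request, a lazy variant is possible with functools.lru_cache. This is an alternative sketch, not what the project above does, and the Model class here is a lightweight placeholder so the snippet runs without PyTorch.

```python
from functools import lru_cache


class Model:
    """Placeholder for the real Model; counts how often it is constructed."""

    instances = 0

    def __init__(self):
        # The real constructor loads the tokenizer and weights, which is slow.
        Model.instances += 1


@lru_cache(maxsize=1)
def get_model() -> Model:
    # The first call builds the model; later calls return the cached instance.
    return Model()


first = get_model()
second = get_model()
print(first is second, Model.instances)  # True 1
```

The trade-off is that the first request pays the loading time instead of server startup.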

Putting everything together

Our request handler needs access to the model to return a prediction. We’ll use the Dependency Injection framework provided by FastAPI to inject our model. Here’s the new predict function:

# Goes with the other imports at the top of api.py
from .classifier.model import Model, get_model


@app.post("/predict", response_model=SentimentResponse)
def predict(request: SentimentRequest, model: Model = Depends(get_model)):
    sentiment, confidence, probabilities = model.predict(request.text)
    return SentimentResponse(
        sentiment=sentiment, confidence=confidence, probabilities=probabilities
    )

The model gets injected by Depends and our Singleton function get_model. You can really appreciate the power of abstraction by looking at this!
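Because the handler only needs an object with a predict() method, its logic can be exercised without loading BERT at all. Here is a minimal sketch under that assumption; StubModel and handle_predict are hypothetical names, and the handler body is copied into a plain function so the snippet runs without FastAPI installed.

```python
from typing import Dict, Tuple


class StubModel:
    """Hypothetical stand-in exposing the same predict() signature as Model."""

    def predict(self, text: str) -> Tuple[str, float, Dict[str, float]]:
        # Canned output: no tokenizer, no weights, no GPU needed.
        return "positive", 0.98, {"negative": 0.005, "neutral": 0.015, "positive": 0.98}


def handle_predict(text: str, model) -> dict:
    """Same body as the request handler, minus the FastAPI decorator."""
    sentiment, confidence, probabilities = model.predict(text)
    return {
        "sentiment": sentiment,
        "confidence": confidence,
        "probabilities": probabilities,
    }


result = handle_predict("Works like a charm", StubModel())
print(result["sentiment"], result["confidence"])  # positive 0.98
```

In a real test against the FastAPI app, the same swap can be made by assigning to app.dependency_overrides[get_model], so the route itself stays untouched.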

But does it work?

Testing the API

Let’s fire up the server:

uvicorn sentiment_analyzer.api:app

It should take a couple of seconds to load everything and start the HTTP server. Once it is up, send a test request (the command below uses HTTPie):

http POST http://localhost:8000/predict text="This app is a total waste of time!"

Here’s the response:

{
    "confidence": 0.999885082244873,
    "probabilities": {
        "negative": 0.999885082244873,
        "neutral": 8.876612992025912e-05,
        "positive": 2.614063305372838e-05
    },
    "sentiment": "negative"
}

Let’s try with a positive one:

http POST http://localhost:8000/predict text="OMG. I love how easy it is to stick to my schedule. Would recommend to everyone!"

{
    "confidence": 0.999932050704956,
    "probabilities": {
        "negative": 1.834999602579046e-05,
        "neutral": 4.956663542543538e-05,
        "positive": 0.999932050704956
    },
    "sentiment": "positive"
}

Both results are on point. Feel free to try it out with some real reviews from the Play Store.
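If you want to script a batch of reviews rather than typing them one by one, a small client using only the standard library might look like the sketch below. The function names build_request and predict are hypothetical, the review text is made up, and the URL assumes the default Uvicorn host and port used above.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # default Uvicorn host and port


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Create a JSON POST request matching the SentimentRequest schema."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str) -> dict:
    """Send the request and decode the JSON reply; the server must be running."""
    with urllib.request.urlopen(build_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so this line is safe to run.
request = build_request("Best to-do app I have used so far")
```

Calling predict("...") with the server running should return a dict with the same three keys as the responses shown above.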

Summary

You should now be the proud owner of a ready-to-deploy (kind of) Sentiment Analysis REST API using BERT. Of course, you’re missing lots of stuff to be production-ready: logging, monitoring, alerting, containerization, and much more. But hey, you did good!

You learned how to:

Initialize a project using Pipenv
Create a project skeleton
Add the pre-trained model and create an interface to abstract the inference logic
Update the request handler function to return predictions using the model
Start the server and send a test request

Go on then, deploy and make your users happy!
