Simple Machine Learning Model Deployment using FastAPI

Deploy your machine learning model as a REST API using FastAPI

Faisal Malik Widya Prasetya
8 min read · Jul 16, 2020

Most people who want to start learning data science or machine learning look for data on Kaggle and run some experiments inside a Kaggle notebook. But sometimes, after they finish the experiment and produce a great model, they don’t know what to do with it. There are many reasons people do machine learning; it can be just for generating insights, in which case they don’t have to deploy the model, but most likely the model is meant to be deployed to production so that it can be useful. There are several approaches to deploying machine learning models. Here are some of them:

  1. REST API
  2. Shared DB
  3. Streaming

Those approaches are compared in a blog post on how to deploy machine learning models by ChristopherGS. But I’m here to share a more practical take on the first approach (REST API). The limitation of this REST API machine learning deployment is that the model cannot be trained via the REST API; that is actually possible as a proof of concept, and I’ll try to cover it in the future.

Get the Model First

I will not focus on model creation here, so let’s pick an existing model on Kaggle. You can pick any kernel on Kaggle as long as it creates a machine learning model; I’ll pick this kernel because it performs well. Even though we just want to deploy the model, we still have to understand the topic, so I’ll explain a little bit about the model.

The idea of this model is to detect whether a piece of news is fake or real using supervised machine learning classification. This kernel uses an LSTM built with the Keras library as the model; other kernels might use other libraries, which might have different syntax or methods to execute the process.

Here’s a to-do list after you pick your kernel.

  1. Copy and Edit the Notebook
  2. Run all the cells and wait until the process finishes
  3. Add a code cell and export the model to the “/kaggle/working/” directory. Below is an example of how to export a Keras model.
model.save('/kaggle/working/model.keras')

Kaggle only allows you to access your temporary output files inside the “/kaggle/working/” directory, so you don’t have many choices. You can simply google how to export a machine learning model; each library has its own approach. Make sure you also learn how to import/load the model, because you’ll have to do that during deployment.
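For example, if your kernel used scikit-learn instead of Keras, joblib is the usual choice. This is just a minimal sketch, where clf stands for a hypothetical fitted estimator:

import joblib

# Persist a hypothetical fitted scikit-learn estimator to Kaggle's output directory
joblib.dump(clf, '/kaggle/working/model.joblib')

# Later, during deployment, load it back the same way
clf = joblib.load('models/model.joblib')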

Sometimes you also have to export the tokenizer for NLP problems, because the tokenizer also learns from the data, just like in the kernel I’m working with.

import pickle

with open('/kaggle/working/tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f, pickle.HIGHEST_PROTOCOL)

Now, after you have exported those necessary files, you can download them and put them inside your project folder.

Note: If you directly export the model and the tokenizer like above, they might not be optimal, because they only learn from the training data, which is not all the data. You can modify the code so that they learn from both the training and test data, but you have to do it yourself, along the lines of the sketch below.
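Here is a minimal sketch of that idea, assuming hypothetical DataFrame names train_df and test_df and that refitting on all the data makes sense for your problem:

import pandas as pd

# Hypothetical: combine the training and test splits before the final fit
full_df = pd.concat([train_df, test_df], ignore_index=True)

# Refit the tokenizer (and retrain the model) on all available text
tokenizer.fit_on_texts(full_df['text'])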

FastAPI

FastAPI is a web framework for creating APIs. It’s called “Fast” because it’s fast to code, fast to deploy, fast to learn, and also high-performance. If you already have Python and pip, you can install it easily. You’ll also need an ASGI server for production; I recommend using Uvicorn.

pip install fastapi
pip install uvicorn

But if you prefer a well-managed environment, I recommend Conda. You can learn how to install Conda (along with a JupyterLab configuration) from my previous story. The FastAPI and Uvicorn installation is the same; just change the package names.

conda install fastapi uvicorn -y
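Either way, once everything is installed, you’ll eventually launch the app with Uvicorn. Assuming the final script lives in a file named main.py with the FastAPI instance named app (as in the complete script at the end), the command is:

uvicorn main:app --reload

The --reload flag restarts the server on code changes, which is handy during development; drop it in production.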

You can check the FastAPI documentation to learn more about it. But for the sake of this tutorial, you just have to learn some basics of REST APIs, which include requests and responses. The typical characteristic of a REST API is that the client sends requests to the server, the server handles those requests, and then it sends responses according to the requests. The client then retrieves those responses. The requests are sent using HTTP methods; there are several, but the most common are:

  1. GET, to read the data from the server
  2. POST, to add new data to the server
  3. PUT, to modify the data on the server
  4. DELETE, to remove the data on the server

A detailed yet simple explanation of those methods can be found on the Wikipedia page about HTTP. You are actually able to create an API that receives a POST request but doesn’t add any data to the server, or to misuse the other methods likewise, but of course that’s bad practice because it will confuse clients.
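To make the mapping concrete, here’s a standalone sketch of how the four methods translate to FastAPI decorators; the item endpoints and their parameters are hypothetical, not part of this project:

from fastapi import FastAPI

app = FastAPI()

@app.get('/items/{item_id}')      # GET: read data
async def read_item(item_id: int):
    return {'item_id': item_id}

@app.post('/items')               # POST: add new data
async def create_item(name: str):
    return {'created': name}

@app.put('/items/{item_id}')      # PUT: modify existing data
async def update_item(item_id: int, name: str):
    return {'updated': item_id, 'name': name}

@app.delete('/items/{item_id}')   # DELETE: remove data
async def delete_item(item_id: int):
    return {'deleted': item_id}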

This is not really the best way to structure a FastAPI project, especially a huge one, but since this is not a big project, I’ll not use any standard project structure or skeleton, just to make sure the project actually runs well. Here’s the code to initialize FastAPI; you can play with the title, description, and version. The version can be anything, but I suggest you learn about semantic versioning for better versioning practice.

from fastapi import FastAPI

description = """
A REST API to predict if the news is fake or not using LSTM with GloVe Embedded Text Data
"""
app_config = {
    'title': 'FakeNewsAPI',
    'description': description,
    'version': '0.0.1'
}
app = FastAPI(**app_config)

The Process Flow

It’s not a well-formatted diagram, but at least it helps us understand what we want to develop. Here’s the diagram that describes the process flow of this API.

The Process Flow of Machine Learning Deployment using REST API

Here’s some explanation of the diagram.

  1. The client sends a request to the server
  2. The server retrieves the request as a payload
  3. The server loads the machine learning model
  4. The server preprocesses the request into acceptable input for the model
  5. The server predicts on the input using the model
  6. The server formats the output into JSON as the response
  7. The server sends the response to the client
  8. The client retrieves the response from the server

That’s quite a process, so let’s walk through each step.

The client sends a request to the server

This part is handled by the client, so what we need to define is what the client expects the server to receive. We keep it as simple as the dataset from the original kernel, which only contains the title, text, subject, and date; the kernel, however, only uses the title and the text, which are later merged. So the input from the client will just be the title and the text. We also have to be aware that the client only wants to retrieve information, so the method it uses is GET. FastAPI provides it; we just have to be aware of that and use it in the next step.
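For illustration, once the server is running (say, locally on Uvicorn’s default port 8000), a client request to the /predict endpoint we define below could look like this; it uses the third-party requests library, and the title, content, and output values are placeholders:

import requests

params = {
    'title': 'Some news headline',
    'content': 'The full text of the news article...'
}
response = requests.get('http://127.0.0.1:8000/predict', params=params)
print(response.json())  # e.g. {'fake_probability': 0.42}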

The server retrieves the request

As explained before, the client is expected to send a request via the GET method that includes the title and the content of the news. So, we have to retrieve it properly.

@app.get('/predict')
async def predict(title: str, content: str):
    pass

As you can see, the app accepts requests via the GET method on the predict endpoint, and the predict function is defined with the async keyword. I could explain this, but I think you’ll get a better explanation from the official Python documentation, plus the explanation in the FastAPI documentation. The predict function accepts two parameters, the title and the content; with the GET method, the function’s parameters become query parameters, so they will match what the client sends. For other methods like POST, the request body will be in JSON format and the input should be handled using a pydantic BaseModel, though I won’t use that here. The pass statement is where the main program will go.
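For completeness, here’s a minimal sketch of what the POST variant could look like with a pydantic BaseModel on the same app instance; this is an alternative, not what the rest of the article uses:

from pydantic import BaseModel

class NewsItem(BaseModel):
    title: str
    content: str

@app.post('/predict')
async def predict_post(item: NewsItem):
    # FastAPI parses item.title and item.content from the JSON request body
    pass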

The server loads the machine learning model

The machine learning model needs to be loaded into the system for prediction. As the model is written in Keras, we can use a built-in Keras function to load it. Other libraries have their own ways to load a model; it can be pickle, joblib, or another approach.

from tensorflow.keras.models import load_model
import pickle
import os

model_path = os.path.abspath('models/model.keras')
model = load_model(model_path)

tokenizer_path = os.path.abspath('models/tokenizer.pkl')
with open(tokenizer_path, 'rb') as f:
    tokenizer = pickle.load(f)

The server preprocesses the request into acceptable input for the model

Data preprocessing is one of the most time-consuming parts of any machine learning development. This preprocessing script is gathered from the source kernel notebook; we just have to wrap the preprocessing steps in a single function to simplify the main function.

from nltk.corpus import stopwords
from bs4 import BeautifulSoup
from tensorflow.keras.preprocessing import sequence
import numpy as np
import string, re

stop = set(stopwords.words('english'))
punctuation = list(string.punctuation)
stop.update(punctuation)

def strip_html(text):
    soup = BeautifulSoup(text, 'html.parser')
    return soup.get_text()

def remove_between_square_brackets(text):
    return re.sub(r'\[[^]]*\]', '', text)

def remove_url(text):
    return re.sub(r'http\S+', '', text)

def remove_stopwords(text):
    final_text = []
    for i in text.split():
        if i.strip().lower() not in stop:
            final_text.append(i.strip())
    return " ".join(final_text)

def denoise_text(text):
    text = strip_html(text)
    text = remove_between_square_brackets(text)
    text = remove_url(text)
    text = remove_stopwords(text)
    return text

maxlen = 300

def fake_news_preprocess(title, content):
    text = content + ' ' + title
    text = denoise_text(text)
    tokenized_text = tokenizer.texts_to_sequences(np.array([text]))
    vector = sequence.pad_sequences(tokenized_text, maxlen=maxlen)
    return vector

vector = fake_news_preprocess(title, content)
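To get a feel for what the denoising does, here’s a hypothetical input and, roughly, what comes out:

sample = '<b>BREAKING:</b> Scientists say [Reuters] the earth is flat http://fake.url'
print(denoise_text(sample))
# Roughly: 'BREAKING: Scientists say earth flat'
# (HTML tags, the bracketed source, the URL, and stopwords are removed)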

The server predicts on the input using the model

To predict on the data, we can use the model object’s built-in method. As the preprocessed data is already in the shape the model expects, we don’t have to perform any other action before prediction.

prediction = model.predict(vector)

The server formats the output into JSON as the response

The prediction output from the Keras model’s predict method is a nested NumPy array, because the predict method is designed to predict on multiple samples, while in this REST API we only predict one. So, we take the first element of the first element of the prediction. To make the value JSON-serializable, we convert it to a Python float. This REST API is intended to check whether the news is fake, and in this case a prediction of 1 means 100% real news and 0 means 100% fake, so we invert the prediction to get the fake probability. After that, we put the fake probability into a Python dictionary as the response.

real_probability = float(prediction[0][0])
fake_probability = 1 - real_probability
response = {
    'fake_probability': fake_probability
}

The server sends the response to the client

After we build the response, we have to send it to the client. FastAPI already handles this: whatever the endpoint function returns is serialized as the JSON response, so I won’t explain any further.

return response

The client retrieves the response from the server

This part is handled by the client, and we don’t need to do anything because we’re focusing on the REST API. If we wanted to build a full-stack application, we would need to consider the front end, but that is outside the scope of this article.

Complete Script

Finally, here’s the complete script of the FakeNewsAPI. Of course, it is not best practice to bundle all the code inside one file, but as this is not a big project, I don’t think it really needs a proper project structure or skeleton.
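Assembled from the snippets above into a single main.py, the script would look roughly like this:

from fastapi import FastAPI
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import sequence
from nltk.corpus import stopwords
from bs4 import BeautifulSoup
import numpy as np
import pickle
import os
import string, re

description = """
A REST API to predict if the news is fake or not using LSTM with GloVe Embedded Text Data
"""
app_config = {
    'title': 'FakeNewsAPI',
    'description': description,
    'version': '0.0.1'
}
app = FastAPI(**app_config)

# Load the model and tokenizer exported from the Kaggle kernel
model_path = os.path.abspath('models/model.keras')
model = load_model(model_path)
tokenizer_path = os.path.abspath('models/tokenizer.pkl')
with open(tokenizer_path, 'rb') as f:
    tokenizer = pickle.load(f)

# Preprocessing, gathered from the source kernel
stop = set(stopwords.words('english'))
stop.update(list(string.punctuation))

def strip_html(text):
    return BeautifulSoup(text, 'html.parser').get_text()

def remove_between_square_brackets(text):
    return re.sub(r'\[[^]]*\]', '', text)

def remove_url(text):
    return re.sub(r'http\S+', '', text)

def remove_stopwords(text):
    return " ".join(i.strip() for i in text.split() if i.strip().lower() not in stop)

def denoise_text(text):
    text = strip_html(text)
    text = remove_between_square_brackets(text)
    text = remove_url(text)
    return remove_stopwords(text)

maxlen = 300

def fake_news_preprocess(title, content):
    text = denoise_text(content + ' ' + title)
    tokenized_text = tokenizer.texts_to_sequences(np.array([text]))
    return sequence.pad_sequences(tokenized_text, maxlen=maxlen)

@app.get('/predict')
async def predict(title: str, content: str):
    vector = fake_news_preprocess(title, content)
    prediction = model.predict(vector)
    real_probability = float(prediction[0][0])
    return {'fake_probability': 1 - real_probability}

Run it with uvicorn main:app --reload as shown earlier, then open http://127.0.0.1:8000/docs to try the endpoint from the automatically generated documentation.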
