How to Create and Deploy a Pre-Trained Word2Vec Deep Learning REST API
In production, training computationally expensive models on a laptop or local machine is often impractical: a single build can take a deep learning engineer hours or days. The industry standard is therefore to use cloud resources with more powerful hardware to both train and serve machine learning models. This is good practice because it abstracts away the heavy computation; other services simply make AJAX requests as necessary. In this tutorial, we will make a pre-trained deep learning model named Word2Vec available to other services by building a REST API from the ground up with Alibaba Cloud Elastic Compute Service (ECS).
Prerequisites
- A Unix-based machine, such as an Alibaba Cloud Elastic Compute Service (ECS) instance, preferably with ample compute power
- Understanding of Python
- Knowledge of how to use the Linux operating system to create, navigate, and edit folders and files
An Introduction to Word Vectors
Word vectors have recently been shaking up the deep learning world thanks to their flexibility and ease of training, and word embeddings have revolutionized the field of NLP.
At its core, a word embedding is a set of word vectors, one per word, such that the vectors “mean” the words. This can be demonstrated by phenomena such as the analogy king − queen ≈ man − woman. Word vectors are used to build everything from recommendation engines to chatbots that actually understand the English language.
Another point worth considering is how we obtain word embeddings, as no two sets of word embeddings are the same. Word embeddings aren’t random; they’re generated by training a neural network. One recent, powerful implementation comes from Google, named Word2Vec, which is trained by predicting words that appear next to other words in a language. For example, for the word “cat”, the neural network will predict words such as “kitten” and “feline”. This intuition of words appearing “near” each other is what allows us to place them in vector space.
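To make this “nearness” intuition concrete, here is a toy sketch that builds context-count vectors from a tiny corpus. This is not the actual Word2Vec algorithm (which trains a neural network); the corpus and window size are purely illustrative:

```python
from collections import Counter

# Toy corpus; real Word2Vec is trained on billions of words
corpus = "the cat sat on the mat the kitten sat on the rug".split()
vocab = sorted(set(corpus))

def context_vector(word, window=1):
    # Count which words appear within `window` positions of `word`
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return [counts[v] for v in vocab]

# "cat" and "kitten" share the same contexts ("the", "sat"),
# so in this tiny corpus their vectors come out identical
print(context_vector("cat"))
print(context_vector("kitten"))
```

Words that appear in similar contexts end up with similar vectors, which is exactly the property Word2Vec learns at scale.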
However, it is an industry standard to use the pre-trained models of large corporations such as Google in order to prototype quickly and to simplify the deployment process. In this tutorial, we will download and use Google’s Word2Vec pre-trained word embeddings. We can do this by running the following command in our working directory:
wget http://magnitude.plasticity.ai/word2vec/GoogleNews-vectors-negative300.magnitude
Setting Up Python Environment
Setting up a Python environment is a crucial component of developing a machine learning application, yet this step is often overlooked. Best practice when using Python dependencies is to use a virtual environment in tandem with an explicit requirements.txt file. This makes managing libraries easier for both deployment and development across multiple machines and environments.
First, we install virtualenv, a Python module that allows us to isolate our working directories so that libraries don't interfere with one another.
pip3 install virtualenv
Next, we create a virtual environment named venv. Note that it is important to both specify and consistently use the same Python version; Python 3 is recommended for best support. The venv folder will contain all the Python modules we install, which we will later record in requirements.txt.
virtualenv -p python3 venv
Although we’ve created a virtual environment, we haven’t activated it yet. Whenever we want to use the project and its dependencies, we must activate the environment using the source command. The file we actually want to call source on is named activate and lives in the bin folder inside venv:
source venv/bin/activate
Once we are finished with our project, or we want to switch virtual environments, we can exit the virtual environment with the deactivate command:
deactivate
Installing the Magnitude Package
The word embedding model we downloaded is in the .magnitude format. This format allows us to query the model efficiently using SQL, and is therefore well suited to production servers. Since we need to be able to read the .magnitude format, we'll install the pymagnitude package. We'll also install flask to later serve the deep learning predictions made by the model.
pip3 install pymagnitude flask
We’ll also record these dependencies with the following command. This creates a file named requirements.txt that lists our Python libraries so we can re-install them at a later time.
pip3 freeze > requirements.txt
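After freezing, requirements.txt lists each installed library (and its transitive dependencies) with a pinned version. The exact version numbers depend on when you run the install; the ones below are illustrative only:

```
Flask==1.1.2
pymagnitude==0.1.143
```

Pinning versions this way means pip install -r requirements.txt reproduces the same environment on any machine.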
Making Model Predictions
To begin, we’ll create a file named model.py to handle opening and querying the word embeddings.
Next, we’ll add the following lines to model.py to import Magnitude and load the pre-trained vectors.
from pymagnitude import Magnitude
vectors = Magnitude('GoogleNews-vectors-negative300.magnitude')
We can play around with the pymagnitude package and the deep learning model by using the query method, providing a word as the argument.
cat_vector = vectors.query('cat')
However, for the core of our API, we will define a function to return the similarity in meaning between two words. This is the backbone of most deep learning solutions for tasks such as recommendation engines (i.e. showing content with similar words).
We implement the similarity calculator as follows. This function will be called by the Flask API in the next section. Note that it returns a real value between -1 and 1.
def similarity(word1, word2):
    return vectors.similarity(word1, word2)
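Under the hood, this similarity score is the cosine similarity of the two word vectors: 1 for vectors pointing in the same direction, -1 for opposite directions. A minimal sketch of the calculation using toy, hand-made vectors (the words and values are illustrative, not real Word2Vec data):

```python
from math import sqrt

def cosine_similarity(u, v):
    # dot(u, v) / (|u| * |v|): ranges from -1 to 1
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (real Word2Vec vectors have 300 dimensions)
cat = [0.9, 0.4, 0.1]
kitten = [0.85, 0.45, 0.15]
car = [0.1, -0.2, 0.95]

print(cosine_similarity(cat, kitten))  # close to 1: similar words
print(cosine_similarity(cat, car))     # much lower: unrelated words
```

pymagnitude performs this computation for us over the full 300-dimensional vectors.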
Wrapping The Model in a REST API
We’ll create our server in a file named service.py with the following contents. We will use a server framework named Flask to serve our content. Although other web server frameworks exist, such as Django, we will use Flask due to its minimal overhead, easy integration, and strong support within the deep learning community. In service.py, we import Flask and request to handle our server capabilities, and we import the similarity function from the module we wrote earlier.
from flask import Flask, request
from model import similarity

app = Flask(__name__)

@app.route("/", methods=['GET'])
def welcome():
    return "Welcome to our Machine Learning REST API!"

@app.route("/similarity", methods=['GET'])
def get_similarity():
    word1 = request.args.get("word1")
    word2 = request.args.get("word2")
    return str(similarity(word1, word2))

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)
Our server is rather bare-bones, but it can easily be extended by creating more routes with the @app.route decorator.
Dockerizing the Application
Docker is a useful tool for containerizing applications. A container is a self-sufficient application that contains all the dependencies it needs to operate. In addition to making development and testing easier, this is especially convenient for deployment, where we often use multiple machines. Docker containers are lightweight because they virtualize only at the operating system layer, unlike heavier approaches such as virtual machines, which virtualize at the hardware layer.
To begin the containerization process, we will begin by creating a Dockerfile. A Dockerfile is the entry point for the entire Docker process. We use this file to define dependencies, access files, set environment variables and to run our application.
Next, we will add a command to copy requirements.txt from our current directory (the directory containing the Dockerfile) into the image. Then, we will install our Python dependencies for the server.
ADD requirements.txt /
RUN pip install -r requirements.txt
Next, we will install wget so that we can download the word embeddings directly into the image. Conveniently, the downloaded file's name already matches the one we load in model.py, so no renaming is needed.
RUN apt-get update && apt-get install -y wget
RUN wget http://magnitude.plasticity.ai/word2vec/GoogleNews-vectors-negative300.magnitude
Finally, we can start our server by adding the final line to our Dockerfile. This runs our Flask server.
CMD [ "python", "./service.py" ]
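Putting the fragments together, the complete Dockerfile might look like the following. The base image and the ADD line for our source files are assumptions, since they are not shown in the fragments above:

```dockerfile
# Base image with Python 3 preinstalled (assumed; any Python 3 image works)
FROM python:3.7

# Copy and install Python dependencies
ADD requirements.txt /
RUN pip install -r requirements.txt

# Install wget and download the pre-trained embeddings into the image
RUN apt-get update && apt-get install -y wget
RUN wget http://magnitude.plasticity.ai/word2vec/GoogleNews-vectors-negative300.magnitude

# Copy our application code into the image (assumed file names)
ADD model.py service.py /

# Start the Flask server
CMD [ "python", "./service.py" ]
```

Note that downloading the embeddings at build time bakes them into the image, so containers start instantly without re-downloading.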
Running the Dockerized Application
Now, we can build our model into its own standardized container by using Docker to create an image: a packaged template from which we can instantly create running instances of our model.
We will first run the docker build command, specifying the -t flag to name our image and passing . to tell Docker that our Dockerfile is in the current directory.
docker build -t model .
Finally, we’ll run our image using the docker run command, specifying the -p flag to map port 8000 on our localhost (the port we want to use) to port 5000 inside the container (the port our Flask server is running on).
docker run -p 8000:5000 model
Making API Calls
Our server will now be available at localhost:8000. We can query our API at localhost:8000/similarity?word1=cat&word2=dog and view the response either in our browser or through another AJAX client.
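The same call can also be made from Python using only the standard library. This is a sketch that assumes the Dockerized server from the previous section is running on localhost:8000, so the actual request is left commented out:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Build the query string for the /similarity route
params = urlencode({"word1": "cat", "word2": "dog"})
url = "http://localhost:8000/similarity?" + params
print(url)

# Uncomment once the server is running:
# score = float(urlopen(url).read().decode())
# print(score)
```

Any HTTP client (JavaScript fetch, requests, curl) can consume the API the same way.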
Another option for testing our API is the command line. While we can use a browser (e.g. Chrome or Safari) to exercise GET routes, we cannot send POST requests that way. An alternative is the curl tool, which comes bundled with Unix operating systems. We can use curl to specify both the word1 and word2 arguments and view the response in the command line.
curl -X GET 'http://localhost:8000/similarity?word1=dog&word2=cat'
In our terminal, we should see the similarity score returned in the response.
This entire process can run either on your local machine or on a cloud services provider such as Alibaba Cloud. However, a benefit of Docker is the ability to develop our model on a local machine (i.e. writing our service.py file) and subsequently run it on our server using Docker. This ensures both fast development and fast deployment.