In my last post, I gave an introduction to programming a deep neural network with TensorFlow. The model worked quite well (98% accuracy on the test set) with only 150 lines of code, but it was arguably a bit complex.
The problem was we had to really dig into the nitty-gritty details of how we wanted our model to work. But a lot of times, we do not need to deal with that level of detail and the complexity that comes with it. This kind of problem occurs often in software engineering, and it is generally solved with a convenient library.
Keras is such a library: it does a great job of taking the complexity out of building a neural network, so you can focus on the interesting parts of training and utilizing the model. In this post I’ll walk through some of the basics of Keras and we will rebuild our MNIST handwritten-digit classifier in a much simpler program.
Keras: The Model Abstraction
In Keras, the fundamental abstraction is the Model object. We can design, train, and evaluate the Model without necessarily knowing the exact details.
In this example, TensorFlow will be the backend that Keras uses behind the scenes, but Keras is actually agnostic of its specific backend and can run with TensorFlow, Theano, or CNTK.
There are a few different model types, but the one we will use is the Sequential model. The Sequential model views the network architecture as a sequence of layers strung together, one after another. This is exactly the architecture we used in our previous convolutional neural network. With Keras, we can stack our network layers like individual building blocks to create our model.
To add new layers to a Keras model, we simply call the add() method and pass in the layer we want to use. To recreate our previous convnet, we’ll need two main kinds of layers: Dense for our fully connected layers, and Conv2D for our two-dimensional convolutional layers. We’ll also need layers for max pooling and dropout. Finally, a Flatten layer will be used to convert between our convolutional and fully connected layers.
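The stacking described above can be sketched roughly as follows. This is a minimal sketch, not the exact network from the previous post: the filter counts (32 and 64), the 3x3 kernels, the dropout rates, and the 128-unit hidden layer are assumptions borrowed from the commonly used Keras MNIST example, and `build_convnet` is a hypothetical helper name. It also assumes Keras is imported through TensorFlow and that images are stored channels-last, so a 28x28 grayscale image has shape (28, 28, 1).

```python
from tensorflow import keras


def build_convnet(input_shape=(28, 28, 1), num_classes=10):
    """Stack layers one after another with add() (layer sizes are assumed)."""
    model = keras.Sequential()
    # Only the first layer is told the input shape: 28x28 pixels, 1 channel.
    model.add(keras.layers.Conv2D(32, (3, 3), activation="relu",
                                  input_shape=input_shape))
    model.add(keras.layers.Conv2D(64, (3, 3), activation="relu"))
    # Max pooling halves the spatial resolution; dropout regularizes.
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Dropout(0.25))
    # Flatten turns the 3D feature maps into a vector for the Dense layers.
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation="relu"))
    model.add(keras.layers.Dropout(0.5))
    # One output neuron per digit class, with softmax to get probabilities.
    model.add(keras.layers.Dense(num_classes, activation="softmax"))
    return model


model = build_convnet()
model.summary()  # prints a layer-by-layer overview of the network
```

Calling `model.summary()` is a handy way to check that each layer produces the output shape you expect before any training happens.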
There are a few things we should note here. The first layer we add() needs to take an additional argument: input_shape. This tells Keras the size of the inputs that we will feed into our model (in our case, a 28x28 pixel image for MNIST). For the Conv2D layers, the first argument represents the number of filters, followed by the dimensions of the convolution kernel. The Dense layers take an argument that represents the number of neurons in that layer. We can also specify the activation function we want to use with a keyword argument, as we did here. Alternatively, we could have added a separate Activation layer.
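The two ways of attaching an activation function can be compared side by side. The sketch below is only illustrative: it assumes the alternative in question is Keras's standalone `Activation` layer, and the tiny layer sizes (3 inputs, 4 neurons) are made up for the demonstration.

```python
import numpy as np
from tensorflow import keras

# Option 1: pass the activation as a keyword argument to the layer.
inline = keras.Sequential()
inline.add(keras.layers.Dense(4, activation="relu", input_shape=(3,)))

# Option 2: add a linear Dense layer, then a separate Activation layer.
separate = keras.Sequential()
separate.add(keras.layers.Dense(4, input_shape=(3,)))
separate.add(keras.layers.Activation("relu"))

# Copy the weights across so both models compute the same function.
separate.layers[0].set_weights(inline.layers[0].get_weights())

x = np.random.rand(2, 3).astype("float32")
same = np.allclose(inline.predict(x, verbose=0),
                   separate.predict(x, verbose=0))
print("Outputs match:", same)
```

The keyword form is more compact, while the separate layer can be convenient when you want to insert something between the linear step and the nonlinearity.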
Compiling and Training the Model
Now that we have defined what our convnet will look like by stacking all of our layers into our model, we can get ready to start training it on the data set. However, first we need to compile the model. Since Keras serves as a high-level wrapper around other machine learning libraries, it needs to convert our Keras-defined model into a model for our backend. Additionally, we will need to specify some other attributes of our training procedure.
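Here we specify that training will use the Adadelta optimizer, that our loss function is the cross entropy of the output (since this is a classification task), and that we want to report the accuracy of the model as a metric. Note that the optimizer minimizes the loss; accuracy is only tracked during training, not optimized directly.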
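A compile step along these lines might look like the sketch below. To keep the example short and self-contained, it uses a tiny stand-in model (a single softmax layer) rather than the full convnet; that stand-in is an assumption for illustration only. The `compile()` arguments themselves follow the description above.

```python
from tensorflow import keras

# Tiny stand-in model so the example runs on its own (not the real convnet).
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28, 1)))
model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(
    # Cross entropy over one-hot labels: the quantity training minimizes.
    loss="categorical_crossentropy",
    # The Adadelta optimizer, created with its default settings.
    optimizer=keras.optimizers.Adadelta(),
    # Accuracy is reported while training but is not what gets optimized.
    metrics=["accuracy"],
)
```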
Next, we can get our training data ready. Luckily, Keras even has some common data sets built in.
For the inputs, we need to convert the arrays into the right shape and rescale the pixel values. The outputs get converted to binary one-hot vectors by Keras’s to_categorical utility function.
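The preprocessing can be sketched as a small helper. Several details here are assumptions rather than something stated in the post: `prepare` and `load_prepared_mnist` are hypothetical names, the images are reshaped to channels-last (28, 28, 1), and the pixel values are divided by 255 so they land in the range 0 to 1. The demo at the bottom runs on a fake batch so that nothing needs to be downloaded.

```python
import numpy as np
from tensorflow import keras

NUM_CLASSES = 10  # digits 0 through 9


def prepare(images, labels):
    """Reshape and rescale images, and one-hot encode the labels."""
    # Add a channel axis and scale 0..255 integers to 0..1 floats (assumed).
    x = images.reshape(images.shape[0], 28, 28, 1).astype("float32") / 255.0
    # Turn each integer label into a one-hot vector of length NUM_CLASSES.
    y = keras.utils.to_categorical(labels, NUM_CLASSES)
    return x, y


def load_prepared_mnist():
    """Load the built-in MNIST data set (downloads it on first use)."""
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    return prepare(x_train, y_train), prepare(x_test, y_test)


# Demo on a fake batch of four random "images".
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 1])
x_demo, y_demo = prepare(fake_images, fake_labels)
print(x_demo.shape, y_demo.shape)
```

For the real data, `(x_train, y_train), (x_test, y_test) = load_prepared_mnist()` would give arrays ready to feed into the model.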
Now we can finally train our model. This is done by the fit() method. The epochs argument determines how many passes through the data training will make. The batch_size determines how many samples to train with for each weight update. Keras will output its progress as it works, updating you on which epoch is running, approximately how long it will take, and the current loss.
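Here is a minimal sketch of a training call with `fit()`. To stay self-contained and quick, it trains a tiny stand-in model on random stand-in data, so the numbers it prints are meaningless; the batch size of 16 and the 2 epochs are arbitrary values chosen to keep the demo fast, not the settings used for the real classifier.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in model and random stand-in data (assumptions for the demo).
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28, 1)))
model.add(keras.layers.Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.Adadelta(),
              metrics=["accuracy"])

x_train = np.random.rand(64, 28, 28, 1).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, size=64), 10)

history = model.fit(
    x_train, y_train,
    batch_size=16,  # samples used for each weight update
    epochs=2,       # full passes over the training data
    verbose=1,      # show the per-epoch progress output
)

# fit() returns a History object with one loss value recorded per epoch.
print(history.history["loss"])
```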
Evaluating the Results
Once our model is trained, we can see how accurate it is at predicting on novel data. To see how our model stacks up against the test set, use the evaluate() method.
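Scoring a model on held-out data with `evaluate()` can be sketched like this. As before, the model and the data are small random stand-ins (assumptions so the example runs without downloading MNIST), which means the reported accuracy will hover around chance rather than anything like the real classifier's result.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in model, compiled with accuracy as a reported metric.
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28, 1)))
model.add(keras.layers.Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.Adadelta(),
              metrics=["accuracy"])

# Random stand-in "test set".
x_test = np.random.rand(32, 28, 28, 1).astype("float32")
y_test = keras.utils.to_categorical(np.random.randint(0, 10, size=32), 10)

# evaluate() returns the loss followed by each metric passed to compile().
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", loss)
print("Test accuracy:", accuracy)
```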
So in just a few lines of Python, we were able to create a high-performing MNIST classifier! Using Keras is really straightforward, and it allows us to avoid the nitty-gritty details of programming complex deep neural networks. Instead, we can work on other interesting aspects of our models and keep the implementation from hindering our ideas. And when Keras is too high level, we can even use it as a simplified interface to TensorFlow. As a deep learning researcher, I find that Keras takes a lot of the hassle out of programming deep neural networks.