MNIST is often considered the “Hello World!” of deep learning. I have hesitated to write a post using it because there is already so much content on the web built around the original 1998 digit-based version. Fashion MNIST, released in 2017, is an upgraded, more complex version based on 10 categories of clothing (like dresses and sneakers) instead of 10 digits. I completely adore this data set, as well as the story and the technological prowess of the company behind it!

In this post, we cover how to build a Convolutional Neural Network with Keras and a TensorFlow backend to classify images of clothing. Keras is a library created by François Chollet, who works at Google and is also the author of Deep Learning with Python, published by Manning. It is great for prototyping and for beginners, since it abstracts away much of the complexity of creating a neural network.

Note that this post is a work in progress and I will be updating it as I write as it is getting quite long.

About the Dataset

The research paper describing Fashion MNIST can be found on arXiv here. The paper and data set are relatively recent, released in September 2017. At 6 pages, it is a short read compared to the original MNIST paper's 46 pages.

The data set was created to be accessible and easy to use like MNIST, but with more complexity to provide more of a challenge.

The figure below is taken from the same paper (Xiao et al, 2017). It shows the 10 classes that make up the data set, along with the accompanying labels used for model training.

MNIST Fashion Samples

A sample of the Fashion MNIST pieces (Xiao et al, 2017).

The Creators: Zalando Research

I was curious about the origin and intent of the data set. It turns out it was created by some very smart people at Zalando Research in Berlin, a small unit of Zalando Technology that conducts research in Data Science, ML and AI.

Their parent company, Zalando, is a gargantuan European e-commerce fashion retailer with billions of dollars in revenue. It launched as a small start-up in 2008, rapidly grew to serve 17 markets, and has been publicly traded since 2014. Technology and research into new methods are core parts of its operations and success. An overview of the company is given here.

The front-look images in the data set come from Zalando’s own catalogue. They were preprocessed and formatted to be drop-in replacements for the original MNIST: you can swap in Fashion MNIST and run exactly the same code on both data sets. I tried it and it worked. The main difference between the two is model performance.

Structural Details

The Original MNIST

We can look at the original data set for a comparison of the changes and upgrades to the fashion based one.

The well-known MNIST data set is composed of 70,000 images of handwritten digits ranging from 0-9. Its purpose is to provide a real-world data set that is already cleaned and formatted, so that models can be tested quickly; data preparation can take up to 90% of a Data Scientist’s time. It is this convenience that has led it to become the “Hello World” of deep learning.

In the paper Gradient-Based Learning Applied to Document Recognition, where the data set is described, it was used to read digits on cheques in commercial applications.

Source of the Data

MNIST stands for Modified National Institute of Standards and Technology Database. It was assembled from NIST’s Special Database 1 (SD 1) and Special Database 3 (SD 3). NIST is a US agency that supports innovation through measurement science and standards. SD 1 was collected from high school students and SD 3 was collected from US Census Bureau employees (Geron, 2017; LeCun, 2017).

The Data set

The training set consists of 60,000 images while the test set has 10,000 images. Each set is drawn equally from SD 1 and SD 3 and labelled with the appropriate digit. Each image measures 28 x 28 pixels, giving 28 x 28 = 784 features, where each pixel is a feature holding an intensity value. Intensities range from 0 to 255; rendered as in the paper, 0 is the white background and 255 is black ink.
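To make the feature count concrete, here is a quick sketch using a random array as a stand-in image (the real data set is loaded later in this post):

```python
import numpy as np

# A stand-in 28 x 28 image with intensity values 0-255,
# matching the shape and dtype of one MNIST sample.
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Flattening the grid yields the 28 x 28 = 784 feature vector a classifier sees.
features = image.reshape(784)
print(features.shape)  # (784,)
```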

The image below is taken from the MNIST paper and shows a sample of the digits (LeCun, et al, 1998).

MNIST digits from paper.

A sample of the MNIST Digits.

Viewing the Data

Google Colab
The code for this example can be run in Google’s Colab which gives you a free NVIDIA GPU to use. This is great for training deep neural networks. It also comes with many of the Python Data Science packages installed already like TensorFlow and Keras.

Just remember to turn the GPU on as it is not enabled by default. Steps to do so are shown below.

Turn on the GPU in Colab.

  1. Click Edit then Notebook Settings. Colab Notebook Settings

  2. Select GPU from the dropdown for Hardware Accelerator. Colab GPU
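Once the GPU is enabled, you can confirm TensorFlow can see it. The call below is the TensorFlow 2.x API (in earlier 1.x versions, tf.test.gpu_device_name() served the same purpose); it returns an empty list when no GPU is attached:

```python
import tensorflow as tf

# Lists the GPU devices visible to TensorFlow; an empty list means CPU only.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
```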

A Note on Tensor Processing Units
I noticed recently that a TPU (Tensor Processing Unit) is also available from the dropdown menu. These are chips made by Google specifically for deep learning and can significantly speed up training time. Specific code has to be used in TensorFlow to take advantage of them. I’m not sure how these work with Keras yet.

Colab TPU

Getting and Displaying the Data
Many Machine Learning frameworks have convenience functions for importing MNIST because of its popularity. The data is imported from Keras below and plotted with Matplotlib.

Appending _r to the cmap name reverses the colormap, so that 0-valued background pixels render as white and high intensities as black, matching how the digits are usually displayed.

# imports
import matplotlib.pyplot as plt
from keras.datasets import mnist

# load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data('/tmp/mnist.npz')

# Changing the number in the square brackets will plot a different image
plt.imshow(X_train[4], cmap='gray_r')

MNIST plot from dataset

Below we print the values for one image with X_train[4]. The data is sparse in the sense that most of the space is 0, i.e. white background. Later it will be shown how to normalize these numbers to facilitate training for Fashion MNIST.

There are 28 rows and 28 numbers per row representing the image pixels.

X_train[4]
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  55,
        148, 210, 253, 253, 113,  87, 148,  55,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  87, 232,
        252, 253, 189, 210, 252, 252, 253, 168,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   4,  57, 242, 252,
        190,  65,   5,  12, 182, 252, 253, 116,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  96, 252, 252, 183,
         14,   0,   0,  92, 252, 252, 225,  21,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0, 132, 253, 252, 146,  14,
          0,   0,   0, 215, 252, 252,  79,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0, 126, 253, 247, 176,   9,   0,
          0,   8,  78, 245, 253, 129,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  16, 232, 252, 176,   0,   0,   0,
         36, 201, 252, 252, 169,  11,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  22, 252, 252,  30,  22, 119, 197,
        241, 253, 252, 251,  77,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  16, 231, 252, 253, 252, 252, 252,
        226, 227, 252, 231,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  55, 235, 253, 217, 138,  42,
         24, 192, 252, 143,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         62, 255, 253, 109,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         71, 253, 252,  21,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0, 253, 252,  21,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         71, 253, 252,  21,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        106, 253, 252,  21,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         45, 255, 253,  21,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0, 218, 252,  56,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  96, 252, 189,  42,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  14, 184, 252, 170,  11,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  14, 147, 252,  42,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)
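As a preview of the normalization mentioned above, scaling these uint8 intensities into the 0-1 range is a single division, shown here on a tiny stand-in array:

```python
import numpy as np

# A tiny stand-in for an image array holding raw 0-255 intensities.
X = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float and scale so every pixel value lies in [0, 1].
X_norm = X.astype('float32') / 255.0
print(X_norm.min(), X_norm.max())  # 0.0 1.0
```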

Fashion MNIST

An apt description of the data set is given in the abstract for the paper:

We present Fashion-MNIST, a new dataset comprising of 28 × 28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. ~ (Xiao et al, 2017)


Thus, in a similar vein to MNIST, Fashion-MNIST consists of 70,000 images of fashion products across 10 categories. The training set has 60,000 images and the test set has 10,000. The image sizes and data formats are also the same.

I did find it to be true that Fashion MNIST can simply replace the original MNIST in the same code, as stated before. Try it yourself.
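As a minimal sketch, the swap amounts to changing the import and loader (this downloads the data on first run):

```python
# Original MNIST:
# from keras.datasets import mnist
# (X_train, y_train), (X_test, y_test) = mnist.load_data()

# Drop-in replacement with Fashion MNIST:
from keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
print(X_train.shape, X_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```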

As mentioned in the paper’s abstract, the data set is available on Github for free here.

The major difference between the data sets is in performance: because Fashion MNIST is more complex, models score noticeably lower on it than on MNIST. A detailed breakdown of performance comparisons using different models and hyperparameters is given on pages 3-6 of the paper.

The Code

I ran this in Google’s Colab. Instructions on how to set it up are given earlier in this post. I also ran it locally to make using TensorBoard a bit easier.

from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout
from keras.utils.np_utils import to_categorical
from keras.callbacks import TensorBoard
import keras.backend as K
from keras.datasets import fashion_mnist
Using TensorFlow backend.

Imports

Here we pull in all the imports we need. Only Keras and its modules are really necessary at this point. I normally import pandas and numpy as well in case I need them afterwards. They all come pre-installed in Colab. Huzzah!

Models and Layers

We start with the Sequential model via from keras.models import Sequential. There are two types of models in Keras: the Sequential model and the Model class used with the functional API. It is also possible to subclass these to build a custom model. See more here.

This helps us to create a model as a stack of layers which are introduced below. Keras comes with several layers built-in. Some of these layers are imported via from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout. These are elaborated on when we build the model below.

Utils

The Keras utils package contains helper functions that make working with Keras more convenient. Here we import to_categorical, which takes an integer vector of labels, of shape (60000,) for the training data with the 10 classes numbered 0-9, and converts it to a one-hot encoded matrix of shape (60000, 10). More on this later.
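A quick sketch of what that conversion does, using plain NumPy on a few example labels (to_categorical itself is used later when the model is trained):

```python
import numpy as np

# Three example integer labels, like the start of y_train.
y = np.array([9, 0, 3])

# One-hot encoding: one row per label, with a 1.0 in that class's column.
one_hot = np.eye(10, dtype='float32')[y]
print(one_hot.shape)  # (3, 10)
```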

Another handy util is utils.print_summary. This is a shortcut for model.summary(), which is commonly used to view the structure and parameters of the network. An example is shown below:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 6, 64)          18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               73856     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
=================================================================
Total params: 93,962
Trainable params: 93,962
Non-trainable params: 0
_________________________________________________________________

See more of these helper functions here.
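For reference, one Sequential definition that produces a summary like the one above. The kernel and pool sizes here (3 x 3 convolutions, pool sizes of 3 and then 2) are inferred from the output shapes in the summary, so treat this as a sketch rather than the exact model:

```python
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten

# Layer sizes chosen to reproduce the output shapes and parameter
# counts shown in the summary table above.
model = Sequential([
    Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    MaxPool2D(pool_size=3),
    Conv2D(64, kernel_size=3, activation='relu'),
    MaxPool2D(pool_size=2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])

model.summary()  # Total params: 93,962
```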

TensorBoard

We import TensorBoard so it can be used to show graphs for monitoring training; Matplotlib can be used as an alternative. TensorBoard can also be used for t-SNE (t-Distributed Stochastic Neighbour Embedding) visualizations, which is very cool. This is rendered with the embedding projector. The sample below is taken from the Fashion MNIST GitHub page. I may recreate this at a later date.

drawing

t-SNE embeddings of Fashion MNIST from Zalando Research

The Data Set

Like MNIST, Fashion MNIST is also conveniently built-in for use in keras.datasets and is imported with from keras.datasets import fashion_mnist. There are several other well-known data sets which are also built-in like CIFAR10 and CIFAR100, IMDB Movie Reviews, Reuters Newswire Topics and Boston Housing Prices. Fetching them is done similarly and they are loaded with the load_data() function as shown in the next section. See the TensorFlow Docs and the Keras Docs for more information.

Backends

After running the cell the output confirms that the TensorFlow backend is being used. Keras is described as being a model-level library in its documentation. This is to say, it offers a high-level API to multiple different lower-level libraries. Currently it can perform lower-level operations like convolutions and optimization using TensorFlow, CNTK and Theano.

Another way to view and access properties on the backend is shown below. This way you can get functions from TensorFlow like placeholder(), ones(), and constant(). K can also be used to clear the graph to make sure you are starting fresh. This is covered when we define the model.

import keras.backend as K
K.backend()
'tensorflow'

The default backend is stored in the keras.json configuration file and can be changed there.

cat ~/.keras/keras.json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

More information on the backend functions is available here.

Load the data set

After importing the necessary packages the next step is to load the data set. When we run fashion_mnist.load_data() a call to an AWS server is made to retrieve the data. This step can take some time depending on the size of the data set and the connection.

The 60,000 training images are moved into X_train and the 60,000 training labels numbered 0-9 go to y_train. Similarly the 10,000 test images and labels are stored in X_test and y_test respectively.

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 1s 0us/step
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step

Exploring the Data and the Labels

The Training Images

If we explore the shape of the training data with X_train.shape, we see the 60,000 images of shape 28 x 28 in a similar fashion to MNIST.

X_train.shape
(60000, 28, 28)

X_train[0] gives a matrix of 28 rows with 28 values per row. Again, as with MNIST above, change the index to get a different image.

X_train[0]
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,
          0,   0,  13,  73,   0,   0,   1,   4,   0,   0,   0,   0,   1,
          1,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
          0,  36, 136, 127,  62,  54,   0,   0,   0,   1,   3,   4,   0,
          0,   3],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   6,
          0, 102, 204, 176, 134, 144, 123,  23,   0,   0,   0,   0,  12,
         10,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0, 155, 236, 207, 178, 107, 156, 161, 109,  64,  23,  77, 130,
         72,  15],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   0,
         69, 207, 223, 218, 216, 216, 163, 127, 121, 122, 146, 141,  88,
        172,  66],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   1,   1,   0,
        200, 232, 232, 233, 229, 223, 223, 215, 213, 164, 127, 123, 196,
        229,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        183, 225, 216, 223, 228, 235, 227, 224, 222, 224, 221, 223, 245,
        173,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
        193, 228, 218, 213, 198, 180, 212, 210, 211, 213, 223, 220, 243,
        202,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   3,   0,  12,
        219, 220, 212, 218, 192, 169, 227, 208, 218, 224, 212, 226, 197,
        209,  52],...
  

To plot one of these images, use plt.imshow(). As before, the colors are inverted with cmap='gray_r'. This index gives a T-shirt.

plt.imshow(X_train[1], cmap='gray_r')

drawing

The Training Labels

Exploring the training labels, we see a 1D vector of 60,000 values, with the numbers 0-9 representing the class mapping for each image's category.

y_train
array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)
y_train.shape
(60000,)
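The integer labels map to class names as documented by Zalando Research in the paper. A small lookup makes them readable:

```python
# Fashion MNIST class names, indexed by integer label (Xiao et al, 2017).
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# y_train begins [9, 0, 0, ...], so the first training image is:
print(class_names[9])  # Ankle boot
```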

More to come…!

References

  1. Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc., 2017.

  2. Goodfellow, Ian, et al. Deep learning. Vol. 1. Cambridge: MIT press, 2016.

  3. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.

  4. LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.

  5. Xiao, Han, Kashif Rasul, and Roland Vollgraf. “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms.” arXiv preprint arXiv:1708.07747 (2017).