Open In Colab

Our first fully connected neural network in TensorFlow/Keras

This example notebook provides a small example of how to implement and train a fully connected neural network with TensorFlow/Keras on the MNIST handwritten digits dataset.

In [1]:
%tensorflow_version 2.x
In [2]:
import numpy as np
import matplotlib.pyplot as plt

from tensorflow import keras

%matplotlib inline

Load the MNIST data, check its dimensions, and look at a few random examples.

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
In [4]:
x_train.shape, y_train.shape, x_test.shape, y_test.shape
Out[4]:
((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))
In [5]:
def show_train_imgs(n=8, m=5):
    """Show m rows of n randomly chosen training images with their labels."""
    for i in range(m):
        for j in range(n):
            # pick a random training example
            idx = np.random.randint(len(y_train))
            plt.subplot(1, n, j + 1)
            plt.imshow(x_train[idx], cmap='gray')
            plt.title(y_train[idx], fontsize=30)
            plt.axis('off')
        plt.show()
In [6]:
plt.rcParams['figure.figsize'] = (15, 5)
show_train_imgs(8)
In [7]:
x_train.min(), x_train.max()
Out[7]:
(0, 255)

Normalize the data to the [0, 1] range and flatten each 28×28 image into a 1D vector of 784 values.

In [8]:
x_train = x_train.reshape(60000, 28*28)/255
x_test = x_test.reshape(10000, 28*28)/255

x_train.shape, x_test.shape, x_train.min(), x_train.max()
Out[8]:
((60000, 784), (10000, 784), 0.0, 1.0)
In [9]:
y_train[:5]
Out[9]:
array([5, 0, 4, 1, 9], dtype=uint8)

Convert the labels to one-hot encoding.

In [10]:
y_train_oh = keras.utils.to_categorical(y_train)
y_test_oh = keras.utils.to_categorical(y_test)
y_train_oh[:5]
Out[10]:
array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)
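
The integer labels can always be recovered from the one-hot vectors with argmax. As a side note, a minimal sketch (not used in this notebook): Keras also provides the sparse_categorical_crossentropy loss, which works on integer labels directly and makes the one-hot conversion unnecessary.

# Recover the integer labels from the one-hot vectors:
y_train_oh[:5].argmax(axis=1)  # -> array([5, 0, 4, 1, 9])

# Alternative sketch: skip one-hot encoding entirely and compile with
# loss='sparse_categorical_crossentropy', passing y_train to fit() directly.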

We will use the so-called Sequential API.

This API lets us build neural networks with one limitation: each layer's input is the output of the previous layer. To build more flexible neural networks we will use the Functional API (a minimal sketch follows).
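
For comparison, here is a minimal Functional API sketch (for illustration only, not trained in this notebook) of a small model of the same kind; layers are called on tensors, which is what makes non-sequential topologies possible:

# Minimal Functional API sketch (illustration only, not used below):
inputs = keras.Input(shape=(784,))
h = keras.layers.Dense(128, activation='relu')(inputs)
outputs = keras.layers.Dense(10, activation='softmax')(h)
functional_model = keras.Model(inputs=inputs, outputs=outputs)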

The Sequential API...

  • builds the model up layer by layer
  • you can pass an activation function as an argument to most layers
  • or you can add a separate activation layer
  • .summary() prints an overview of the model
  • the model needs to be compiled before training, where you set
    • the loss function
    • the optimizer
    • the metrics
  • after compiling you can train your model with fit(), where you set
    • the number of epochs
    • the batch size
    • the training data
    • validation data (optional)
    • any callbacks (functions to run before/after epochs, batches, etc.; see the sketch after this list)
  • you can also generate predictions with a trained model
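
For example, a callback such as the built-in EarlyStopping would be passed to fit() like this (a minimal sketch, not used in this notebook):

# Sketch only: stop training once the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
# model.fit(..., callbacks=[early_stop])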
In [11]:
model = keras.Sequential()
model.add(keras.layers.Dense(784, activation='relu', input_dim=784))
model.add(keras.layers.Dense(512, activation='relu'))
model.add(keras.layers.Dense(256, activation='relu'))
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
In [12]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 784)               615440    
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_3 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,182,874
Trainable params: 1,182,874
Non-trainable params: 0
_________________________________________________________________
In [13]:
784*784+784, 784*512+512, 512*256+256, 256*128+128, 128*10+10
Out[13]:
(615440, 401920, 131328, 32896, 1290)
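
These match the summary above: a Dense layer with $n_{in}$ inputs and $n_{out}$ units has $n_{in} \cdot n_{out}$ weights plus $n_{out}$ biases, i.e. $(n_{in}+1) \cdot n_{out}$ parameters in total.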
In [14]:
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(learning_rate=1e-2), metrics=['accuracy'])

With a GPU, one epoch takes ~3 s.

If your model trains much slower than that, activate the GPU on Google Colab via Runtime $\to$ Change runtime type $\to$ Hardware accelerator $\to$ GPU. During training the most important summary statistics are shown. You can also save the training history.
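
To verify that TensorFlow actually sees a GPU, you can list the physical devices (a minimal sketch; note the extra import, since only keras was imported above):

# Sketch: an empty list means TensorFlow is running on CPU only.
import tensorflow as tf
tf.config.list_physical_devices('GPU')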

In [15]:
history = model.fit(x=x_train, y=y_train_oh, batch_size=64, epochs=15, validation_data=(x_test, y_test_oh))
Epoch 1/15
938/938 [==============================] - 3s 3ms/step - loss: 0.7563 - accuracy: 0.8127 - val_loss: 0.3102 - val_accuracy: 0.9127
Epoch 2/15
938/938 [==============================] - 3s 3ms/step - loss: 0.2813 - accuracy: 0.9187 - val_loss: 0.2352 - val_accuracy: 0.9310
Epoch 3/15
938/938 [==============================] - 3s 3ms/step - loss: 0.2201 - accuracy: 0.9361 - val_loss: 0.1909 - val_accuracy: 0.9423
Epoch 4/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1820 - accuracy: 0.9474 - val_loss: 0.1770 - val_accuracy: 0.9488
Epoch 5/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1552 - accuracy: 0.9545 - val_loss: 0.1462 - val_accuracy: 0.9556
Epoch 6/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1343 - accuracy: 0.9609 - val_loss: 0.1324 - val_accuracy: 0.9609
Epoch 7/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1185 - accuracy: 0.9656 - val_loss: 0.1183 - val_accuracy: 0.9644
Epoch 8/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1046 - accuracy: 0.9694 - val_loss: 0.1085 - val_accuracy: 0.9671
Epoch 9/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0936 - accuracy: 0.9731 - val_loss: 0.1078 - val_accuracy: 0.9691
Epoch 10/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0841 - accuracy: 0.9759 - val_loss: 0.1029 - val_accuracy: 0.9672
Epoch 11/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0762 - accuracy: 0.9782 - val_loss: 0.1005 - val_accuracy: 0.9678
Epoch 12/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0686 - accuracy: 0.9810 - val_loss: 0.0888 - val_accuracy: 0.9727
Epoch 13/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0622 - accuracy: 0.9820 - val_loss: 0.0877 - val_accuracy: 0.9728
Epoch 14/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0565 - accuracy: 0.9839 - val_loss: 0.0841 - val_accuracy: 0.9735
Epoch 15/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0514 - accuracy: 0.9859 - val_loss: 0.0782 - val_accuracy: 0.9755
In [16]:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epochs', fontsize=15)
plt.legend(fontsize=20)
plt.show()
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epochs', fontsize=15)
plt.legend(fontsize=20)
plt.show()

>97%, not too bad, but why not 100%?

Let's check the predictions to see where the model goes wrong. Erroneous predictions are highlighted with a red dot. Also, the learning curves above show that the model is not yet fully trained; the results are still improving.
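
The exact test-set numbers can also be computed in a single call (a minimal sketch; evaluate returns the loss and the metrics that were set at compile time):

# Sketch: evaluate the trained model on the full test set.
test_loss, test_acc = model.evaluate(x_test, y_test_oh, verbose=0)
print('test accuracy:', test_acc)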

In [17]:
def show_predictions(n=5, m=5):
    """Show m rows of n consecutive test images with predicted and true labels."""
    for j in range(m):
        # pick a random window of n consecutive test images
        idx_start = np.random.randint(len(x_test) - n)
        preds = model.predict(x_test[idx_start:idx_start+n])
        true_labels = y_test[idx_start:idx_start+n]

        for i in range(n):
            plt.subplot(1, n, i + 1)
            predstr = 'pred: ' + str(preds[i].argmax()) + ', prob: ' + str(int(np.round(preds[i].max()*100, 0))) + '%'
            plt.title(predstr + ' / true: ' + str(true_labels[i]), fontsize=10)
            plt.imshow(x_test[idx_start+i].reshape(28, 28), cmap='gray')
            # mark wrong predictions with a red dot in the image center
            if preds[i].argmax() != true_labels[i]:
                plt.scatter([14], [14], s=500, c='r')
            plt.axis('off')
        plt.show()
In [18]:
show_predictions(m=20)

What accuracy can you achieve with a 5-layer fully connected neural network?