Please work on Google Colab.

1. Load the MNIST dataset and create a CNN model

  • load the MNIST dataset from the tensorflow/keras built-in datasets (just like last time)
  • use the original train/test split!
  • divide each pixel's value by 255, and this time do not flatten the images; leave each one as a 2D matrix (28x28)
  • e.g. the test set will be an array of shape (10000, 28, 28)
  • train the following network on the training set and generate predictions for the 10,000 test images:

      input (28, 28)
      conv2D, 16 kernels, kernel size = 3, valid padding, relu activation
      conv2D, 16 kernels, kernel size = 3, valid padding, relu activation
      maxpooling, kernel size = 2x2
      conv2D, 32 kernels, kernel size = 3, valid padding, relu activation
      conv2D, 32 kernels, kernel size = 3, valid padding, relu activation
      maxpooling, kernel size = 2x2
      flatten
      dense, 10 neurons, softmax activation
    
    • pay attention to the channel format: you will need to expand the dims from (N, 28, 28) to (N, 28, 28, 1)!
    • how many parameters do we have for each layer?
    • use Adam optimizer with default parameters
    • use categorical crossentropy as loss function
    • compile the model
    • print out a summary of the model
    • train the CNN on the training data for 5 epochs with batch size of 32
    • use the test data as validation data
  • calculate the categorical cross-entropy loss and the accuracy! Hint: you should get at least ~98% accuracy (a sketch covering all the steps above follows this list)
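
  A minimal sketch of the whole pipeline above, assuming the standard TF2/Keras API (the variable names x_train, history, etc. are our own choices):

      import numpy as np
      from tensorflow import keras
      from tensorflow.keras import layers

      # load MNIST with the original train/test split
      (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

      # scale pixels to [0, 1], keep the 28x28 spatial layout
      x_train = x_train.astype("float32") / 255.0
      x_test = x_test.astype("float32") / 255.0

      # conv2D expects a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
      x_train = np.expand_dims(x_train, -1)
      x_test = np.expand_dims(x_test, -1)

      # one-hot labels for the categorical crossentropy loss
      y_train = keras.utils.to_categorical(y_train, 10)
      y_test = keras.utils.to_categorical(y_test, 10)

      model = keras.Sequential([
          keras.Input(shape=(28, 28, 1)),
          layers.Conv2D(16, 3, padding="valid", activation="relu"),
          layers.Conv2D(16, 3, padding="valid", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Conv2D(32, 3, padding="valid", activation="relu"),
          layers.Conv2D(32, 3, padding="valid", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Flatten(),
          layers.Dense(10, activation="softmax"),
      ])

      model.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
      model.summary()

      history = model.fit(x_train, y_train,
                          epochs=5, batch_size=32,
                          validation_data=(x_test, y_test))

      # categorical cross-entropy loss and accuracy on the test set
      test_loss, test_acc = model.evaluate(x_test, y_test)

  For the parameter counts: a conv2D layer has (kernel_h * kernel_w * in_channels + 1) * n_kernels parameters, so the first conv layer has (3*3*1 + 1)*16 = 160 and the second (3*3*16 + 1)*16 = 2320; model.summary() should confirm the numbers layer by layer.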

  • show the confusion matrix of the predictions (predicted values vs actual labels); see the sketch below
  • where does the model make mistakes? Where does it improve compared to fully connected nets?
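
  One way to produce the confusion matrix, reusing model, x_test and y_test from the sketch above (scikit-learn is preinstalled on Colab):

      import matplotlib.pyplot as plt
      from sklearn.metrics import confusion_matrix

      # predicted class = argmax of the softmax outputs
      y_pred = model.predict(x_test).argmax(axis=1)
      y_true = y_test.argmax(axis=1)

      cm = confusion_matrix(y_true, y_pred)
      plt.imshow(cm, cmap="Blues")
      plt.xlabel("predicted label")
      plt.ylabel("actual label")
      plt.colorbar()
      plt.show()

  The off-diagonal cells show which digit pairs the model still confuses.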

2. Download the Street View House Numbers (SVHN) Dataset

  • source: http://ufldl.stanford.edu/housenumbers/
  • use the cropped dataset!
  • to get the dataset use e.g. wget, and keep the original split, i.e. download the train and test matrix files
  • preprocess the downloaded data so it can be used for training and testing; the shapes should be the same as in exercise 1 (apart from the image size and the 3 color channels)
  • how many classes do we have in the dataset? how many train and test examples do we have?
  • what is the dimension of the images?
  • show 5 images from the dataset
  • make one-hot encoding for the labels (a download-and-preprocessing sketch follows this list)
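
  A download-and-preprocessing sketch; it assumes the cropped .mat files store the images under the key 'X' with shape (32, 32, 3, N) and the labels under 'y' with digit 0 encoded as 10, which is the format described on the SVHN page:

      # in a Colab cell, download the cropped train/test matrices first:
      # !wget -q http://ufldl.stanford.edu/housenumbers/train_32x32.mat
      # !wget -q http://ufldl.stanford.edu/housenumbers/test_32x32.mat
      import numpy as np
      import matplotlib.pyplot as plt
      from scipy.io import loadmat
      from tensorflow import keras

      train = loadmat("train_32x32.mat")
      test = loadmat("test_32x32.mat")

      # move the sample axis to the front: (32, 32, 3, N) -> (N, 32, 32, 3)
      x_train = np.transpose(train["X"], (3, 0, 1, 2)).astype("float32") / 255.0
      x_test = np.transpose(test["X"], (3, 0, 1, 2)).astype("float32") / 255.0

      # labels run 1..10 with 10 meaning digit 0, so remap them to 0..9
      y_train = train["y"].flatten() % 10
      y_test = test["y"].flatten() % 10

      # class count, train/test sizes and image dimensions
      print(np.unique(y_train).size, x_train.shape, x_test.shape)

      # show 5 images from the dataset
      fig, axes = plt.subplots(1, 5)
      for i, ax in enumerate(axes):
          ax.imshow(x_train[i])
          ax.axis("off")
      plt.show()

      # one-hot encode the labels
      y_train = keras.utils.to_categorical(y_train, 10)
      y_test = keras.utils.to_categorical(y_test, 10)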

3. Train the CNN model from the 1st exercise on this dataset

  • create a convolutional neural network
  • the network should have the following layers:

      input (32, 32, 3)
      conv2D, 16 kernels, kernel size = 3, valid padding, relu activation
      conv2D, 16 kernels, kernel size = 3, valid padding, relu activation
      maxpooling, kernel size = 2x2
      conv2D, 32 kernels, kernel size = 3, valid padding, relu activation
      conv2D, 32 kernels, kernel size = 3, valid padding, relu activation
      maxpooling, kernel size = 2x2
      flatten
      dense, 10 neurons, softmax activation

    • how many parameters do we have for each layer?
    • use Adam optimizer with default parameters
    • use categorical crossentropy as loss function
    • compile the model
    • print out a summary of the model
    • train the CNN on the training data for 15 epochs with batch size of 32
    • use the test data as validation data
  • calculate the categorical cross-entropy loss and the accuracy! Hint: you should get roughly 80-90% accuracy (a training sketch follows this list)
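
  A training sketch: the network is the same as in exercise 1, only the input shape changes to the 32x32 RGB images (x_train, y_train, etc. are the preprocessed SVHN arrays from exercise 2):

      from tensorflow import keras
      from tensorflow.keras import layers

      model = keras.Sequential([
          keras.Input(shape=(32, 32, 3)),
          layers.Conv2D(16, 3, padding="valid", activation="relu"),
          layers.Conv2D(16, 3, padding="valid", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Conv2D(32, 3, padding="valid", activation="relu"),
          layers.Conv2D(32, 3, padding="valid", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Flatten(),
          layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
      model.summary()

      history = model.fit(x_train, y_train,
                          epochs=15, batch_size=32,
                          validation_data=(x_test, y_test))
      test_loss, test_acc = model.evaluate(x_test, y_test)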

4. Evaluate performance

  • plot the training and the validation loss on the same plot!
  • plot the training and the validation accuracy on the same plot!
  • do we overfit?
  • show the confusion matrix of the predictions (predicted values vs actual labels)
  • where does the model make mistakes? (a plotting sketch follows this list)
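
  A plotting sketch, assuming history is the return value of model.fit from exercise 3 (with metrics=["accuracy"], Keras stores the curves under the keys used below):

      import matplotlib.pyplot as plt

      # training vs. validation loss
      plt.plot(history.history["loss"], label="training loss")
      plt.plot(history.history["val_loss"], label="validation loss")
      plt.xlabel("epoch")
      plt.legend()
      plt.show()

      # training vs. validation accuracy
      plt.plot(history.history["accuracy"], label="training accuracy")
      plt.plot(history.history["val_accuracy"], label="validation accuracy")
      plt.xlabel("epoch")
      plt.legend()
      plt.show()

  A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting. The confusion matrix can be produced exactly as in exercise 1.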

5. Train another CNN

  • as we can see, the previous architecture can be further improved
  • come up with an architecture that can achieve more than 91% accuracy on the test set
  • print out the summary for this model!
  • plot the loss and accuracy curves for this model too! (one possible architecture is sketched below)
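
  One possible direction, offered as an assumption rather than the reference solution: widen the convolutional blocks and regularize with batch normalization and dropout. Whether it clears 91% depends on the training run, so treat it as a starting point to tune:

      from tensorflow import keras
      from tensorflow.keras import layers

      model = keras.Sequential([
          keras.Input(shape=(32, 32, 3)),
          layers.Conv2D(32, 3, padding="same", activation="relu"),
          layers.BatchNormalization(),
          layers.Conv2D(32, 3, padding="same", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Dropout(0.25),
          layers.Conv2D(64, 3, padding="same", activation="relu"),
          layers.BatchNormalization(),
          layers.Conv2D(64, 3, padding="same", activation="relu"),
          layers.MaxPooling2D(pool_size=(2, 2)),
          layers.Dropout(0.25),
          layers.Flatten(),
          layers.Dense(128, activation="relu"),
          layers.Dropout(0.5),
          layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
      model.summary()

      history = model.fit(x_train, y_train,
                          epochs=15, batch_size=32,
                          validation_data=(x_test, y_test))

  The summary and the loss/accuracy curves can then be produced with the same code as in exercise 4.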