# COMP2211 PA2 Image Classification using CNN

## Introduction

In this assignment, you will build a CNN image classification model using the Keras library,
which is a simplified set of APIs<a href="#API"><sup>1</sup></a> on top of the famous TensorFlow library.
The links to the documentation of each task's relevant function are provided in the task description.
You can generally refer to them to see how to call each Keras function.

In addition, Keras provides multiple coding paradigms and API patterns for some functions,
and some of them are not comprehensively documented
(e.g., string shorthands for activation functions, kernel initializers, losses, etc.).

Nonetheless,
Keras provides many official code examples that demonstrate these alternative paradigms and underdocumented API shorthands:
* [Simple MNIST convnet](https://keras.io/examples/vision/mnist_convnet/) (MNIST is another image dataset on handwritten digits)
* [Basic image classification](https://www.tensorflow.org/tutorials/keras/classification)
* [Image classification](https://keras.io/examples/vision/image_classification_from_scratch/)
* [Intro to Keras for engineers](https://keras.io/getting_started/intro_to_keras_for_engineers/)


## In this programming assignment...

* All your mandatory tasks go to `pa2.py`.
    * But this notebook executes your code and presents the entire work.
    * You can follow this notebook and go to `pa2.py` when encountering a task.
    * After rewriting the functions and saving the file, go back to this notebook and proceed.
    * After you modify and save `pa2.py`, re-running the relevant code cells in this notebook will use the updated code. (This is not standard Python behavior, but notebook-specific magic of the `autoreload` extension.)
* There is one function for each task.
* We don't expect you to be a pro at defining Python functions. You only need to:
    * Utilize only the parameters provided in the function definition
    * Write your code between the comments *START YOUR CODE HERE* and *END YOUR CODE HERE*.
    * When writing each task, imagine the provided parameters in the function and the imported libraries are all you need.
    * Don't modify the parameter list of any function,
    or you will confuse Zinc when we call these functions to evaluate your tasks.
* There is some dummy code in each task already.
    * It only ensures that the notebook runs without errors even if you do nothing.
    * Feel free to remove and replace it, or refer to it (e.g., for the expected shape, how to call `keras.Sequential`, etc.).
    * The yellow `print` statements are merely informational.
    We will *not* grade by checking the presence of these `print`s.
    Feel free to remove them or keep them (if they don't bother you).
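
For context on the `autoreload` note above: in plain Python, an already-imported module is not re-read when its file changes, so you would have to reload it explicitly. The following is a minimal sketch of that manual step, using a throwaway module named `demo_module` (a hypothetical stand-in for `pa2.py`, not part of the assignment).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for pa2.py, written to a temporary folder.
workdir = Path(tempfile.mkdtemp())
module_path = workdir / "demo_module.py"
module_path.write_text("def answer():\n    return 1\n")

# Make the folder importable and skip bytecode caching, so that the
# reload below always re-reads the source file.
sys.path.insert(0, str(workdir))
sys.dont_write_bytecode = True

import demo_module

first = demo_module.answer()

# "Edit and save" the file, as you would do with pa2.py.
module_path.write_text("def answer():\n    return 2222\n")

# Without this explicit reload, demo_module.answer() would still return 1.
importlib.invalidate_caches()
importlib.reload(demo_module)
second = demo_module.answer()

print(first, second)  # 1 2222
```

With `%autoreload 2` enabled, the notebook performs the equivalent of this reload for you before running each cell.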

---------

<a id="API">1</a>: API stands for *Application Programming Interface*. Here we refer to all the functions, classes, etc., provided by libraries like Keras & TensorFlow.


The following block downloads the relevant files.

[pa2.py](./pa2.py), 
[draw.npz](./draw.npz), 
[draw_example.zip](./draw_example.zip)

In [None]:
# Provided: notebook bootstrapping
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras import layers, initializers
import matplotlib.pyplot as plt

# These two lines allow reloading of specific imported modules
# https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html
%load_ext autoreload
%autoreload 2
import pa2

# This line sets a random seed, making all random behaviors deterministic and reproducible
keras.utils.set_random_seed(2211)


## Quick Draw Dataset

[Quick, Draw! Dataset](https://github.com/googlecreativelab/quickdraw-dataset) is a very large dataset of users' drawings from the game [Quick, Draw!](https://quickdraw.withgoogle.com/) (345 categories * ~100k images per category).

The dataset we use is a randomly drawn subset containing **8 categories and 5000 images per category**. Each image is **28 * 28 grayscale (not RGB)**.

### Peeking at the dataset

We first provide you with 4 example images for each label, stored as PNG files.
We also provide a one-line call demonstrating the use of `keras.utils.image_dataset_from_directory`
to batch-load an entire folder of image data.

This code is for demonstration and peeking at the dataset only.
We will load the entire dataset as plain NumPy arrays in the following tasks.

You can refer to [the function documentation](https://keras.io/api/data_loading/image/#image_dataset_from_directory-function)
and [the documentation of the `tf.data.Dataset` object it returns](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#take)
for more information, and use these data loading utilities in your own future projects.
The function takes care of managing batch sizes, shuffling at each epoch,
splitting validation sets if needed, and much more.
Note that the returned dataset object contains both x & y,
which may lead to different usages of `model.fit`, `model.evaluate`, etc.

In [None]:
# Provided
example_set = keras.utils.image_dataset_from_directory(
    "draw_example",
    label_mode="categorical", # This makes label one-hot
    color_mode="grayscale",   # This decides how many channels
    shuffle=False,            # Turn off dataset shuffling for demo purpose
    batch_size=None,
    image_size=(28,28),
)

for image, label in example_set:
    print('single image.shape', image.shape, 'single label shape', label.shape)
    break

plt.figure(figsize=(15,5))
for i, (image, label) in enumerate(example_set):
    plt.subplot(4, 8, (i % 4) * 8 + (i // 4) + 1)
    plt.imshow(tf.squeeze(image), cmap='gray')
    plt.axis(False)
    plt.title(str(label.numpy().astype(int).tolist()))
plt.tight_layout()
plt.show()

### Loading the entire dataset

Recall: the dataset we use is a randomly drawn subset containing **8 categories and 5000 images per category**. Each image is **28 * 28 grayscale (not RGB)**.

All images are deliberately stored in NumPy arrays instead of image files to save space. We wrote the NumPy file loading part for you.

* `x_train_raw` is a 2D array containing the pixel values of the training images. Each row is one image and has a length of 784 (28*28).
* `y_train_raw` is a 1D array containing the ground-truth labels of those images, each an integer in `[0, 7]`.
* The same applies to `_val_raw` & `_test_raw`.
* `y_names` is an array of 8 strings containing the human-readable names for each label value from 0 to 7.

In [None]:
# Provided: load numpy file
np_data = np.load("draw.npz")
x_train_raw = np_data["x_train"]
y_train_raw = np_data["y_train"]
x_val_raw = np_data["x_val"]
y_val_raw = np_data["y_val"]
x_test_raw = np_data["x_test"]
y_test_raw = np_data["y_test"]
y_names = np_data["y_names"]
N_labels = len(y_names)  # Number of labels in total
print("x_train_raw.shape", x_train_raw.shape)
print("x_val_raw.shape", x_val_raw.shape)
print("x_test_raw.shape", x_test_raw.shape)
print("y_train_raw.shape", y_train_raw.shape)
print("y_val_raw.shape", y_val_raw.shape)
print("y_test_raw.shape", y_test_raw.shape)
print("y_names: ", y_names)
print("There are", N_labels, "labels in total.")


### Optional task: Explore the dataset

Count the number of each label in the train/val/test set in the code block below.
This task will not be graded,
but is a great chance to practice your NumPy indexing and vectorization technique.
See if you can complete this task with *no `for` loop*.

You should see **each set containing equal numbers of all labels**.
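
As a hint at the kind of vectorized call that can replace a loop here, the sketch below counts labels in a small made-up array with `np.bincount`. The array `toy_labels` and its 4 classes are hypothetical, not the real `y_train_raw`.

```python
import numpy as np

# Toy label vector standing in for y_train_raw (hypothetical values).
toy_labels = np.array([0, 2, 1, 2, 0, 1, 3, 3])

# np.bincount counts how often each non-negative integer occurs;
# minlength keeps a slot for labels that never appear.
toy_count = np.bincount(toy_labels, minlength=4)
print(toy_count.tolist())  # [2, 2, 2, 2]
```

`np.unique(toy_labels, return_counts=True)` is another loop-free option; it only reports the labels that actually occur.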

In [None]:
# *OPTIONAL* TODO

### *OPTIONAL* START YOUR CODE HERE
train_count = np.zeros(8)
val_count = np.zeros(8)
test_count = np.zeros(8)
### *OPTIONAL* END YOUR CODE HERE

print(train_count, val_count, test_count, sep='\n')

### Task 1: Reshape images

Note that each image is stored as a 1D array of length 784.
It follows that each of `x_<dataset>_raw` is 2D.

Intuitively we want an image to be a 3D array like **height\*width\*channels**. In our case, the images are all grayscale. Therefore the channel dimension should be 1.

Implement a function that reshapes 2D `x_train_raw`, `x_val_raw`, `x_test_raw`, respectively,
into 4D NumPy arrays like **\<num_data\>\*height\*width\*channels**.
It returns three 4D arrays.
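
To illustrate the idea on a smaller scale, the sketch below reshapes two made-up 2*3 "images" (rows of length 6 rather than 784) into a 4D array. The array names and sizes are hypothetical; only the reshape pattern carries over.

```python
import numpy as np

# Two toy images of 2*3 pixels, each flattened into a row of length 6.
flat = np.arange(12).reshape(2, 6)

# -1 lets NumPy infer the number of images;
# the trailing 1 is the single grayscale channel.
images = flat.reshape(-1, 2, 3, 1)

print(images.shape)                 # (2, 2, 3, 1)
print(images[1, :, :, 0].tolist())  # [[6, 7, 8], [9, 10, 11]]
```

Reshaping only changes how the same values are indexed, so each row of the flat array fills its image row by row.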

In [None]:
# TODO: Go to `pa2.py` and implement reshape_x.

x_train, x_val, x_test = pa2.reshape_x(x_train_raw, x_val_raw, x_test_raw)
print("x_train.shape", x_train.shape)
print("x_val.shape", x_val.shape)
print("x_test.shape", x_test.shape)


### Task 2: One-hot encode labels

Note that each label is an integer in $[0, 7]$.
It follows that each of `y_<dataset>_raw` is 1D.

As in previous labs, we need to transform a single integer to a one-hot encoding vector of length `N_labels`, which is the total number of possible labels.

Implement a function that creates and returns one-hot encodings for `y_train_raw`, `y_val_raw`, `y_test_raw`, respectively.
The returned array shape is like **\<num_data\>\*N_labels**.

Here we use [`tf.one_hot`](https://www.tensorflow.org/api_docs/python/tf/one_hot#for_example) to encode,
for maximum code compatibility and minimal library imports.

> It returns a `tf.Tensor`. In case you notice and worry about the type,
> think of it as simply a TensorFlow-accelerated NumPy array with *mostly* compatible syntax.
> The documentation states that the first parameter is also a `tf.Tensor`; however, you can pass NumPy arrays directly as the first parameter.
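
To see what a one-hot encoding looks like without running TensorFlow, here is a NumPy-only sketch on made-up labels. It is an illustration of the expected output layout, not the `tf.one_hot` call the task asks for; `toy_labels` and `n_classes` are hypothetical names.

```python
import numpy as np

toy_labels = np.array([0, 2, 1])  # hypothetical integer labels
n_classes = 3                     # plays the role of N_labels

# Row i of the identity matrix is the one-hot vector for label i,
# so indexing the identity matrix by the labels encodes all of them at once.
one_hot = np.eye(n_classes)[toy_labels]

print(one_hot.shape)     # (3, 3)
print(one_hot.tolist())  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row contains exactly one 1, located at the index of the original label.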

In [None]:
# TODO: Go to `pa2.py` and implement encode_y.

y_train, y_val, y_test = pa2.encode_y(y_train_raw, y_val_raw, y_test_raw, N_labels)
print("y_train.shape", y_train.shape)
print("y_val.shape", y_val.shape)
print("y_test.shape", y_test.shape)


The following block displays the first 9 images in the training set
together with their labels.

In [None]:
# Provided: plot the first 9 images in the training set
image_9 = x_train[:9, :, :, :]
label_9 = y_train[:9, :]
plt.figure(figsize=(10, 10))
for i in range(9):
    image = image_9[i, :, :, 0]
    label = label_9[i, :]
    label = 'Unknown' if np.sum(label) != 1 else y_names[np.argwhere(label == 1)[0, 0]]
    ax = plt.subplot(3, 3, i + 1)
    ax.imshow(image, cmap="gray")
    ax.set_title(label)
    ax.set_axis_off()
plt.tight_layout()


## Data augmentation

Data augmentation randomly perturbs our image data to generate more training inputs.

### Task 3: Create data augmentation layers

Implement a function that creates and returns a Keras model for data augmentation.
The model should do the following augmentation in sequence.

1. Randomly flip (or not) an image horizontally
2. Randomly rotate an image within the range `[-0.1 * 2π, 0.1 * 2π]`
and fill those points outside the boundary with a constant value of zero (black background)

You may refer to the documentation for
[`RandomFlip`](https://keras.io/api/layers/preprocessing_layers/image_augmentation/random_flip/),
[`RandomRotation`](https://keras.io/api/layers/preprocessing_layers/image_augmentation/random_rotation/) and
[`Sequential`](https://keras.io/guides/sequential_model/).
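
To get a feel for the two augmentations, the sketch below works out the rotation range in degrees and mimics a horizontal flip with plain NumPy. It assumes the rotation factor is read as a fraction of a full turn (2π), as the `RandomRotation` documentation describes; the toy image is hypothetical and no Keras layer is used.

```python
import math

import numpy as np

# A factor of 0.1 means at most 0.1 of a full turn in either direction.
factor = 0.1
max_angle_rad = factor * 2 * math.pi
max_angle_deg = math.degrees(max_angle_rad)
print(round(max_angle_deg, 1))  # 36.0

# Toy 2*3 image with one channel (height, width, channels).
image = np.arange(6).reshape(2, 3, 1)

# A horizontal flip mirrors the image left to right,
# i.e. it reverses the width axis.
flipped = image[:, ::-1, :]
print(flipped[:, :, 0].tolist())  # [[2, 1, 0], [5, 4, 3]]
```

So the required rotation range corresponds to roughly ±36 degrees.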

> Note: If you run TensorFlow locally *on macOS with Apple's Metal accelerator*, some data augmentation layers may raise internal errors.


In [None]:
# TODO: Go to `pa2.py` and implement AugmentationLayer.


We can now plot one image, followed by 8 augmented versions of it. The augmentation process is randomized, so it can produce 8 different outputs from the same image.

In [None]:
# Provided: plot the augmented images
augmentation_layer = pa2.AugmentationLayer()
plt.figure(figsize=(10, 10))
image = x_train[0, :, :, :]
label = y_train[0]
label = 'Unknown' if np.sum(label) != 1 else y_names[np.argwhere(label == 1)[0, 0]]
plt.suptitle(label + " augmentation")
plt.subplot(3, 3, 1)
plt.imshow(image[:, :, 0], cmap="gray")
plt.axis(False)
for i in range(1, 9):
    ax = plt.subplot(3, 3, i + 1)
    augim = augmentation_layer(image, training=True)
    augim = augim[:, :, 0].numpy()
    ax.imshow(augim, cmap="gray")
    ax.set_axis_off()
plt.tight_layout()


## The CNN Model

### Task 4: Build a model

Before building the main model architecture,
we need to include the data augmentation layer first<a href="#extend-preprocess"><sup>1</sup></a>.
Note that a Keras Sequential model can contain a nested Sequential model.
Thus, you can directly call the `AugmentationLayer()` function you defined above
as one layer of your main model below.

After augmentation, we need to standardize the input data, as we did in PA1.
Here all pixel values in the images are integers in `[0, 255]`.
We simply need to rescale each pixel value to a decimal number in `[0, 1]`.
You can refer to the documentation on
[`Rescaling`](https://keras.io/api/layers/preprocessing_layers/image_preprocessing/rescaling/)
for this layer.
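
Numerically, this rescaling is just a division by 255. A NumPy sketch on a few made-up pixel values (the array `pixels` is hypothetical) shows the effect that a `Rescaling` layer with `scale=1/255` is meant to have inside the model:

```python
import numpy as np

# Three toy pixel values covering the full integer range.
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Dividing by 255.0 maps [0, 255] integers onto [0, 1] floats.
scaled = pixels / 255.0
print(scaled.tolist())  # [0.0, 0.2, 1.0]
```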

In summary, you should create a `Sequential` model of the following architecture:
* Your data augmentation layer
* A proper `Rescaling` layer described above
* A convolutional layer with 16 3*3 kernels, ReLU activation & ["He uniform"](https://keras.io/api/layers/initializers/#heuniform-class) kernel initializer
* A 2*2 max pooling layer
* A convolutional layer with 32 3*3 kernels, ReLU activation & ["He uniform"](https://keras.io/api/layers/initializers/#heuniform-class) kernel initializer
* A 2*2 max pooling layer
* A convolutional layer with 64 3*3 kernels, ReLU activation & ["He uniform"](https://keras.io/api/layers/initializers/#heuniform-class) kernel initializer
* A 2*2 max pooling layer
* A flatten layer to squash the 3D data to 1D
* A dropout layer with a dropout rate of 0.2
* A dense layer with output dimension equal to `N_labels` and softmax activation

Please refer to [their documentation](https://keras.io/api/layers/) for detailed usage.
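
As a sanity check for `model.summary()`, the sketch below traces the feature-map size and the number of trainable parameters through the architecture above in plain Python. It assumes the Keras defaults for the convolution and pooling layers ("valid" padding, convolution stride 1, pooling stride equal to the pool size); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3):
    # "valid" padding, stride 1: the kernel must fit entirely inside the image.
    return size - kernel + 1


def pool_out(size, pool=2):
    # Non-overlapping windows; a leftover row or column is dropped.
    return size // pool


n_labels = 8
size, channels = 28, 1
total_params = 0
shapes = []

for filters in (16, 32, 64):
    # Each filter has 3*3*channels weights plus one bias.
    total_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters
    shapes.append((size, size, channels))

flat = size * size * channels          # length after the flatten layer
total_params += (flat + 1) * n_labels  # dense weights plus biases

print(shapes)        # [(13, 13, 16), (5, 5, 32), (1, 1, 64)]
print(flat)          # 64
print(total_params)  # 23816
```

Under these assumptions the spatial size shrinks 28 → 26 → 13 → 11 → 5 → 3 → 1. If your summary reports different shapes, check the padding and stride arguments of your layers first.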

Usually, we pass the `input_shape` parameter to the first layer.
But since the first layer comes from our custom function, which receives no parameters,
we tell the model the input shape by calling `model.build` after defining the model.
This line is provided for you.

> Note that in the lecture notes and many other code demonstrations,
> we can provide a string name shorthand for
> initializers and activation functions,
> as long as we use them *with their default parameters*.
> In this case, you can use either string shorthands or
> the longer version in this PA.

-----

<a id="extend-preprocess">1</a>: Extended reading: [Preprocessing data before the model or inside the model](https://keras.io/guides/preprocessing_layers/#preprocessing-data-before-the-model-or-inside-the-model)


In [None]:
# TODO: Go to `pa2.py` and implement build_model.

model = pa2.build_model(N_labels)
model.summary()


### Task 5: Compile the model

In Keras' terms, compiling a model means setting
the loss function,
the optimizer (along with its learning rate and other related settings), and
the evaluation metrics (e.g., accuracy).

Refer to the [`model.compile`](https://keras.io/api/models/model_training_apis/#compile-method) documentation.
Implement a function that **receives a model and a learning rate as parameters** and compiles the model using
* [`categorical_crossentropy`](https://keras.io/api/losses/probabilistic_losses/#categoricalcrossentropy-class) as the loss function,
* [`SGD`](https://keras.io/api/optimizers/sgd/) **with the given learning rate**, and
* only the [`accuracy`](https://keras.io/api/metrics/accuracy_metrics/#accuracy-class) metric.
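
To make the loss and the metric concrete, the sketch below computes categorical cross-entropy and accuracy by hand on two made-up samples. It is a simplified NumPy illustration (it skips the numerical clipping a library would apply), and `y_true` and `y_pred` are hypothetical values.

```python
import numpy as np

# One-hot ground truth and predicted probabilities for two toy samples.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.5, 0.4, 0.1]])

# Cross-entropy: minus the log of the probability given to the true class,
# averaged over the samples.
per_sample_loss = -np.sum(y_true * np.log(y_pred), axis=1)
loss = per_sample_loss.mean()

# Accuracy: fraction of samples whose most likely class is the true class.
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(round(float(loss), 4))  # 0.6365
print(float(accuracy))        # 0.5
```

The second sample is misclassified (class 0 gets the highest probability), which lowers the accuracy and contributes the larger share of the loss.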

> Note that in the lecture notes and many other code demonstrations,
> we can provide a string name shorthand for
> losses, metrics, and optimizers,
> as long as we use them *with their default parameters*.
> In this case, you can use either string shorthands or
> the longer version in this PA.

<br/>

> If you run into problems using `optimizers.SGD(…)` or `from optimizers import SGD`, try `keras.optimizers.SGD(…)` instead.
> We are not sure why this import issue happens.


In [None]:
# TODO: Go to `pa2.py` and implement compile_model.


### Task 6: Train the model

Refer to the [`model.fit`](https://keras.io/api/models/model_training_apis/#fit-method) documentation.
Implement a function that trains a given model on
the given `x_train` & `y_train` with a batch size of 32 for a given number of epochs.
Also, provide `x_val` & `y_val` as validation data, with a validation batch size of 32 as well.
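
The batch size determines how many weight updates happen per epoch, which is the step count shown in the training progress bar. A small arithmetic sketch, using a made-up dataset size (`n_train = 1000` is hypothetical, not the size of the real training set):

```python
import math

n_train = 1000   # hypothetical number of training images
batch_size = 32

# The last batch may be smaller than the others, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 32
print(last_batch_size)  # 8
```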


In [None]:
# TODO: Go to `pa2.py` and implement train_model.


### Optional Task: Parameter tuning 1

This section gives you a first taste of parameter tuning.
To tune other parameters, you can search online for a deeper understanding.

Let's try training a model with a learning rate of `1` for 5 epochs.

Note that if the loss doesn't go down (or even increases) during training, the learning rate is probably too large.

You may call your previously implemented functions via `pa2.compile_model(...)` and `pa2.train_model(...)`.


In [None]:
# *OPTIONAL* TODO: Write your code below
model = pa2.build_model(N_labels) # Build new model with newly initialized weights

# Compile and train the model
### *OPTIONAL* START YOUR CODE HERE

### *OPTIONAL* END YOUR CODE HERE

### Optional Task: Parameter tuning 2

Let's also try training the model with a learning rate of 0.0001 for 5 epochs.

Note that if the loss goes down too slowly, the learning rate is probably too small.


In [None]:
# *OPTIONAL* TODO: Write your code below
model = pa2.build_model(N_labels) # Build new model with newly initialized weights

# Compile and train the model
### *OPTIONAL* START YOUR CODE HERE

### *OPTIONAL* END YOUR CODE HERE

By now, you should have a basic idea of how to tune the learning rate.
The significance of the learning rate also depends on the optimizer we choose.
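
The same two symptoms can be reproduced on a toy problem. The sketch below runs plain gradient descent on `f(w) = w**2` (a stand-in chosen for illustration, not the CNN loss) with three learning rates; the function name `descend` is hypothetical, and the thresholds at which a rate becomes "too large" or "too small" differ for the real model.

```python
def descend(lr, steps=5, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final loss."""
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # the basic SGD update rule
    return w ** 2


too_large = descend(1.0)     # w jumps between 1 and -1, loss never drops
reasonable = descend(0.1)    # loss shrinks quickly
too_small = descend(0.0001)  # loss barely moves

print(too_large)              # 1.0
print(round(reasonable, 4))   # 0.1074
print(too_small > 0.99)       # True
```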

In fact, a learning rate of 0.001 is good for our task. Let's use 0.001 and train for 20 epochs.

In [None]:
# Provided
keras.utils.set_random_seed(2211) # Reset the seed so that you get the same model from this code cell onward
model = pa2.build_model(N_labels) # Build new model with newly initialized weights
pa2.compile_model(model, 0.001)
pa2.train_model(model, 20, x_train, y_train, x_val, y_val)

### Task 7: Evaluate the model on the test dataset

Refer to the [`model.evaluate`](https://keras.io/api/models/model_training_apis/#evaluate-method) documentation.
Implement a function that evaluates the model on the given `x_test` & `y_test` with a batch size of 32.

Please make sure that your model reaches a test accuracy > 60%.

In [None]:
# TODO: Go to `pa2.py` and implement evaluate_model.
# TODO: Make sure test accuracy > 60%,
#       or you may have something wrong in previous steps.

pa2.evaluate_model(model, x_test, y_test)


## Play with the model on our data

### Task 8: Predict labels ourselves

The `model.evaluate` function calculates accuracies given an already known test dataset x & y.
But now, we'd like to use this model on our own data to predict y!

Refer to the [`model.predict`](https://keras.io/api/models/model_training_apis/#predict-method)<a href="#extended-predict"><sup>1</sup></a> documentation.
Implement a function that receives a NumPy array as a batch of images in shape `(num_images, 28, 28, 1)` and returns the predicted labels for those images in shape `(num_images,)`.

Note that `model.predict` returns a NumPy array of shape `(num_images, N_labels)`.
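
Going from per-class probabilities to a single label per image is a matter of picking the most likely class in each row. A NumPy sketch on made-up probabilities (`probs` is a hypothetical stand-in for a `model.predict` output with 3 classes):

```python
import numpy as np

# Two toy images, three classes: each row sums to 1.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])

# axis=1 searches along the class dimension, one result per image.
labels = np.argmax(probs, axis=1)

print(labels.tolist())  # [1, 0]
print(labels.shape)     # (2,)
```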

-----

<a id="extended-predict"></a>1: Extended reading: [What's the difference between Model methods `predict()` and `__call__()`?](https://keras.io/getting_started/faq/#whats-the-difference-between-model-methods-predict-and-call)

In [None]:
# TODO: Go to `pa2.py` and implement predict_images.


The code below takes 2 images from each category in the test set and uses your function to predict their labels.

In [None]:
# Provided
test_N_per_cat = x_test.shape[0] // N_labels
plt.figure(figsize=(12, 4))
for i in range(2):
    images = x_test[i::test_N_per_cat, :, :, :]
    labels = y_test[i::test_N_per_cat, :]
    preds = pa2.predict_images(model, images)
    for j in range(8):
        image = images[j, :, :, 0]
        pred_label = y_names[preds[j]]
        true_label = labels[j, :]
        true_label = 'Unknown' if np.sum(true_label) != 1 else y_names[np.argwhere(true_label == 1)[0, 0]]
        ax = plt.subplot(2, 8, i * 8 + j + 1)
        ax.imshow(image, cmap="gray")
        ax.set_title(f"True: {true_label}\nPred: {pred_label}")
        ax.set_axis_off()
plt.tight_layout()


## Saving the model

This part is provided for you.


In [None]:
model.save("draw_model.h5")


## What's Next?

If you want to explore more, here are some things you can try.
But please keep in mind that they are not required,
or even expected, in your PA submission.
Make a copy of your work before you venture on!

* Practice NumPy indexing and vectorization techniques
    * Write your *own* code to calculate the confusion matrix on `x_test` and `y_test`.
    Use *only* `model.predict` and *no* `tf.math.confusion_matrix`.
    * Implement a convolution operation (given a piece/batch of data and a kernel) yourself, using as few for-loops as possible.
* Towards a better model
    * Experiment with other model architectures.
    * Experiment with other optimizers like [`Adam`](https://keras.io/api/optimizers/adam/)
    * Tune other parameters. You may check lecture notes, Keras API documentation
    and online materials to see what parameters you can tune.
    * Try out Keras callbacks like [`ReduceLROnPlateau`](https://keras.io/api/callbacks/reduce_lr_on_plateau/).
* Keep track of model statistics
    * Try out the [TensorBoard callback](https://keras.io/api/callbacks/tensorboard/).
* Play with other datasets
    * Explore other popular datasets.
    [TensorFlow](https://www.tensorflow.org/datasets/catalog/overview?hl=zh-cn#image_classification) and
    [Keras](https://keras.io/api/datasets/)
    also ship with some of the most popular datasets.
* Engineer a better ML pipeline
    * Explore other [Keras callbacks](https://keras.io/api/callbacks/).
    * Try to save the images in the NumPy arrays as separate `png` files in the folder hierarchy expected by
    [`keras.utils.image_dataset_from_directory`](https://keras.io/api/data_loading/image/#image_dataset_from_directory-function),
    then redo the subsequent model building, compiling, training & evaluation steps.
    See what needs to change in those steps.
    * Try to wrap the NumPy array data in a `tf.data.Dataset` object using [`tf.data.Dataset.from_tensor_slices`](https://www.tensorflow.org/tutorials/load_data/numpy).
    See how to specify dataset shuffling & batching in this case (and in which order),
    and what needs to change in the subsequent steps.