# **COMP 2211 Exploring Artificial Intelligence**
## Convolutional Neural Network

![fruit.jpg](https://cdn-images-1.medium.com/fit/t/1600/480/1*WaVMfzHIKyEvfRwvz84eGA.jpeg)

## **Lab Tasks Procedure**
1. Data preprocessing **(Task1)**
2. Build the model **(Task2)**
3. Compile the model
4. Train the model
5. Save the model

**Make sure the GPU accelerator is enabled in Colab:**

1. 'Edit' -> 'Notebook settings':

![gpu1.png](https://drive.google.com/uc?export=view&id=19RK_MicAY8J4BIY5g0i7bw6sr3WEFaDz)


2. Set 'Hardware accelerator' to 'GPU':

![gpu2.png](https://drive.google.com/uc?export=view&id=1kTK1oZ-UWdIr0hxbXVT8GQDHMGHXVbLI)

In [None]:
# check your Colab device
import tensorflow as tf  # Import tensorflow library
import pprint            # Import pprint library for better print format
device_name = tf.config.list_physical_devices()  # A list of device names, which may contain CPU and GPU
pprint.pprint(device_name)                       # Print the device_name

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## **ZINC Submission**
- **Task1**: Copy the ``data_preprocessing`` function to the given ``preprocessing.py`` file.
- **Task2**: Save your trained model as ``model_lab8.h5``.

Zip these two files, ``preprocessing.py`` and ``model_lab8.h5``, into a single file named ``lab8_tasks.zip`` and **submit the ``.zip`` file.**
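
One way to build the archive is with Python's standard `zipfile` module. The sketch below is only an illustration: the helper name `make_submission_zip` is hypothetical, and it assumes both files already exist (for example, in the current working directory). Check the ZINC page for any further packaging requirements.

```python
import os
import zipfile

def make_submission_zip(file_paths, zip_name="lab8_tasks.zip"):
    """Pack the given files into a flat zip archive and return the member names."""
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname drops any directory prefix so the files sit at the top level
            zf.write(path, arcname=os.path.basename(path))
    with zipfile.ZipFile(zip_name) as zf:
        return zf.namelist()

# Example usage (run after both files exist):
# make_submission_zip(["preprocessing.py", "model_lab8.h5"])
```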

## Download dataset

In [None]:
"""
    Download necessary files for sanity check
"""
username = input("Please enter your username: ")
import getpass
password = getpass.getpass("Please enter your password: ")
url = f'https://{username}:{password}@course.cse.ust.hk/comp2211/labs/lab8/task_data.zip'
!wget $url -O task_data.zip
!unzip -q task_data.zip -d .

Please enter your username: zraoac
Please enter your password: ··········
--2022-04-17 23:19:23--  https://zraoac:*password*@course.cse.ust.hk/comp2211/labs/lab8/task_data.zip
Resolving course.cse.ust.hk (course.cse.ust.hk)... 143.89.41.176
Connecting to course.cse.ust.hk (course.cse.ust.hk)|143.89.41.176|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Enter Your CSD PC/Unix Password"
Reusing existing connection to course.cse.ust.hk:443.
HTTP request sent, awaiting response... 200 OK
Length: 71825930 (68M) [application/zip]
Saving to: ‘task_data.zip’


2022-04-17 23:19:30 (12.2 MB/s) - ‘task_data.zip’ saved [71825930/71825930]



## Dataset: **Fruit Recognition**
---
- Training set size: 15178.
- Number of classes: 33.
- Image size: 100 x 100 pixels.

In [None]:
import os # Import os library

data_dir = './task_data'
# os.listdir() returns the names of the entries (here, the subfolders) in data_dir
# sorted() sorts the names so that the class indices are the same on every run
category_list = sorted(os.listdir(data_dir)) 

# Create a dict mapping the category name to the class index
# The number of labels should be 33 (0 to 32)
cate2Idx = {}
for i in range(len(category_list)):
  cate2Idx[category_list[i]] = i
print(cate2Idx)

{'Apple Braeburn': 0, 'Apple Granny Smith': 1, 'Apricot': 2, 'Avocado': 3, 'Banana': 4, 'Blueberry': 5, 'Cactus fruit': 6, 'Cantaloupe': 7, 'Cherry': 8, 'Clementine': 9, 'Corn': 10, 'Cucumber Ripe': 11, 'Grape Blue': 12, 'Kiwi': 13, 'Lemon': 14, 'Limes': 15, 'Mango': 16, 'Onion White': 17, 'Orange': 18, 'Papaya': 19, 'Passion Fruit': 20, 'Peach': 21, 'Pear': 22, 'Pepper Green': 23, 'Pepper Red': 24, 'Pineapple': 25, 'Plum': 26, 'Pomegranate': 27, 'Potato Red': 28, 'Raspberry': 29, 'Strawberry': 30, 'Tomato': 31, 'Watermelon': 32}
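
The same mapping can be written more compactly with `enumerate`, and the reverse mapping is handy later for turning a predicted class index back into a fruit name. This is a small sketch using a shortened, hypothetical category list in place of the real folder names; `idx2Cate` is an illustrative name that does not appear elsewhere in this lab.

```python
# Shortened, hypothetical stand-in for sorted(os.listdir(data_dir))
category_list = sorted(["Banana", "Apricot", "Avocado"])

# Category name -> class index (same result as the loop above)
cate2Idx = {name: idx for idx, name in enumerate(category_list)}

# Class index -> category name, useful for decoding predictions
idx2Cate = {idx: name for name, idx in cate2Idx.items()}

print(cate2Idx)     # {'Apricot': 0, 'Avocado': 1, 'Banana': 2}
print(idx2Cate[2])  # Banana
```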


In [None]:
# Import all the required libraries
import cv2
import numpy as np

from sklearn.model_selection import train_test_split
import keras
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Flatten

## 1. Data preprocessing

We need to load the data and store them in the appropriate format.

### Task 1

Complete the following code.

1. Load images.
2. Resize images from 100 x 100 to 28 x 28.
3. Save the image data in **x**.
4. Save the corresponding class index in **y**.

**Here are some functions that might be useful:**

- cv2.imread(filepath): Load an image from a file.
  - ***Input*** filepath -- A string of image path.
  - ***Return*** -- A numpy array of image data in BGR channel order.
  ```python
  img = cv2.imread("example.png")  # Load the data of example.png
  ```
- cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB): Convert the color space from BGR to RGB
  - ***Input*** img_data -- A numpy array of image data.
  - ***Return*** -- A numpy array.
  ```python
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert the color space of the image from BGR to RGB
  ```
- cv2.resize(img_data, size): Resize an image to the given size.
  - ***Input*** img_data -- A numpy array of image data.
  - ***Input*** size -- A tuple of integers, (width, height).
  - ***Return*** -- A numpy array.
  ```python
  img = cv2.resize(img, (100, 80))  # Resize the image to 100 (width) x 80 (height)
  ```
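
To see why the color conversion matters: `cv2.imread` stores channels in blue-green-red order, so a pure red pixel is loaded as `[0, 0, 255]`. The NumPy sketch below imitates the effect of `cv2.COLOR_BGR2RGB` by reversing the last axis of a tiny synthetic array. It is a stand-in for illustration, not a replacement for `cv2.cvtColor`, and the pixel values are made up.

```python
import numpy as np

# A synthetic 1 x 2 "image" in BGR order: one pure red pixel, one pure blue pixel
bgr = np.array([[[0, 0, 255], [255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R, which is what BGR -> RGB does
rgb = bgr[..., ::-1]

print(bgr.shape)  # (1, 2, 3), i.e. (height, width, channels)
print(rgb[0, 0])  # [255   0   0], the red pixel in RGB order
```

Note that image arrays are indexed as (height, width, channels), whereas the size tuple passed to `cv2.resize` is (width, height).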

In [None]:
# Input: data_dir(str)  -- the path of data.
#        cate2Idx(dict) -- mapping the category name to class index.
# Return: x(array) -- the image data, the shape in this task should be (15178, 28, 28, 3).
#         y(array) -- the labels of the images, the shape in this task should be (15178,).
# Here are some functions that might be useful:
#     cv2.imread()   -- read image data.
#     cv2.cvtColor() -- convert the color space.
#     cv2.resize()   -- resize the image.
def data_preprocessing(data_dir, cate2Idx):
  x = None
  y = None
  #### TODO HERE



  #### END TODO
  return x, y

In [None]:
x, y = data_preprocessing(data_dir, cate2Idx)
print(x.shape, y.shape)

(15178, 28, 28, 3) (15178,)
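
The expected shapes come from stacking one `(28, 28, 3)` array per image along a new first axis, with one integer label per image. Here is a minimal sketch with synthetic data (zero-filled placeholder images and made-up labels, not the real dataset), in case the target layout is unclear:

```python
import numpy as np

# Hypothetical placeholders: 5 blank 28 x 28 RGB images with made-up class indices
images = [np.zeros((28, 28, 3), dtype=np.uint8) for _ in range(5)]
labels = [0, 0, 1, 2, 2]

x_demo = np.stack(images)  # adds a leading "sample" axis
y_demo = np.array(labels)  # one label per image

print(x_demo.shape, y_demo.shape)  # (5, 28, 28, 3) (5,)
```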


In [None]:
# Split the dataset into training and test sets, holding out 20% of the data for testing
# x_train is a NumPy array of RGB image data with shape (12142, 28, 28, 3)
# y_train is a NumPy array of class labels (in range 0-32) with shape (12142,)
# x_test is a NumPy array of RGB image data with shape (3036, 28, 28, 3)
# y_test is a NumPy array of class labels (in range 0-32) with shape (3036,)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# There are 33 classes, and the classes are represented as unique integers (0 to 32).
# Transform each integer into a 33-element binary (one-hot) vector.
y_train = np_utils.to_categorical(y_train, len(category_list))
y_test = np_utils.to_categorical(y_test, len(category_list))
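
To make the two steps above concrete, here is a small NumPy sketch. The first part reproduces the split sizes quoted in the comments, assuming the test count is rounded up, which matches the shapes stated there. The second part builds one-hot rows by hand with `np.eye`; this is an illustrative equivalent of `to_categorical`, not its actual implementation, and the labels are made up.

```python
import math
import numpy as np

# Split sizes for 15178 samples with test_size=0.2
n_samples, test_size = 15178, 0.2
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test
print(n_train, n_test)  # 12142 3036

# One-hot encoding by indexing an identity matrix with the class indices
num_classes = 33
y_demo = np.array([0, 2, 32])  # made-up labels
one_hot = np.eye(num_classes, dtype=np.float32)[y_demo]
print(one_hot.shape)    # (3, 33)
print(one_hot[1, :4])   # [0. 0. 1. 0.]
```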

## 2. Build the model

### Task 2

Complete the following code. You need to build your own model with at least 3 convolutional layers and 2 dense layers.
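
When stacking three convolutional layers on a 28 x 28 input, it helps to track how quickly the feature map shrinks. The helpers below are a hypothetical sketch (the function names are not part of Keras); they apply the standard output-size formulas for `padding='valid'` and `padding='same'` with square kernels. The layer sequence in the example is only an illustration, not a suggested solution.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Output width/height of a Conv2D layer with a square kernel."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output width/height of MaxPooling2D with stride equal to the pool size."""
    return (size - pool) // pool + 1

size = 28
trace = [size]
for _ in range(3):  # three stages of Conv2D(3x3, valid) followed by MaxPooling2D(2x2)
    size = conv_out(size, kernel=3)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5, 3, 1]
```

With `'valid'` padding the map collapses to 1 x 1 after the third stage in this example, so `padding='same'` or fewer pooling layers would keep more spatial detail.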

In [None]:
# - Only use Conv2D, MaxPooling2D, Dense, Dropout and Flatten.
# - At least 3 convolutional layers and 2 dense layers.
def custom_model():
  model = None
  #### TODO HERE



  #### END TODO
  return model

In [None]:
# Create the model
model = custom_model()
model.summary()

## 3. Compile the Model

In [None]:
# Compile the model
# Use the categorical cross-entropy loss function since there are two or more label classes and the labels are one-hot encoded
# Use the Adam algorithm (a stochastic gradient descent method)
# Use accuracy as the metric
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
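
For a single sample, categorical cross-entropy is the negative log of the probability the model assigns to the true class. Below is a minimal pure-Python sketch of that formula with made-up probabilities. Keras additionally clips values for numerical stability and averages over the batch, which is omitted here.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Loss for one sample: -sum_i y_true[i] * log(y_pred[i])."""
    # Terms with y_true[i] == 0 contribute nothing, so they are skipped
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0.0, 1.0, 0.0]        # one-hot label for class 1
confident = [0.05, 0.90, 0.05]  # made-up softmax outputs
unsure = [0.40, 0.30, 0.30]

print(round(categorical_crossentropy(y_true, confident), 4))  # 0.1054
print(round(categorical_crossentropy(y_true, unsure), 4))     # 1.204
```

The loss is small when the predicted probability of the correct class is close to 1 and grows as that probability drops.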

## 4. Train the model

You can also try different hyperparameters, such as the batch size and the number of epochs.

In [None]:
# Fit the model, i.e., train the model
# Specify training data and labels
# Specify batch size, i.e., number of samples per gradient update
# Specify epochs, i.e., number of passes over the training data
# Specify validation data, i.e., data on which to evaluate the loss and metrics at the end of each epoch
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_data=(x_test, y_test))
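
With the batch size and split used above, the number of gradient updates follows from simple arithmetic. The sketch assumes the 12142 training samples stated earlier and that the last, smaller batch is still used, which is the default behaviour of `fit` for NumPy inputs.

```python
import math

n_train, batch_size, epochs = 12142, 128, 10

steps_per_epoch = math.ceil(n_train / batch_size)           # batches per epoch
last_batch = n_train - (steps_per_epoch - 1) * batch_size   # size of the final batch
total_updates = steps_per_epoch * epochs                    # updates over the whole run

print(steps_per_epoch, last_batch, total_updates)  # 95 110 950
```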

## 5. Save the model

Save your model and submit it to ZINC.

In [None]:
# Save the model to an HDF5 file
model_name = 'model_lab8.h5'              # Define model name
model.save(model_name, save_format='h5')  # Save the model
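
Before zipping, a quick check that the saved file exists and really is an HDF5 file can catch an interrupted save. The sketch below looks for the 8-byte HDF5 signature at the start of the file. The helper name `looks_like_hdf5` is hypothetical, and the check assumes the signature sits at offset 0, which is the usual case. It does not verify that the model itself loads.

```python
import os

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of a standard HDF5 file

def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 signature."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE

# Example usage after model.save(...):
# print(looks_like_hdf5("model_lab8.h5"))
```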