The Quick, Draw! dataset is a doodling dataset of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!.
In this assignment, you are given a small subset of the dataset comprising 8 categories and 5,000 images per category, on which you will build a CNN model to classify the doodles. You will mainly use the Keras library, with a small amount of NumPy and non-Keras TensorFlow.
Please refer to Piazza posts or the Assignment Tasks section below for the changelog of skeleton files since the PA release.
The following bullet points give you a general idea of what to do in this assignment.
Please download the notebook here. The first code block in the notebook will download the following additional files, but you can also download them manually and comment out the first cell.
The final file structure looks like this:
pa2.ipynb
pa2.py
draw.npz
draw_example/
drums/
0.png
...
3.png
eiffel_tower/
....
...
And you should see the following if you open the notebook successfully.
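The drawing data ships as a NumPy `.npz` archive. As a quick sanity check of that format, the sketch below round-trips a tiny dummy archive (the file name and key names here are made up for illustration; inspect `data.files` on the real draw.npz to see its actual keys):

```python
import numpy as np

# Round-trip a tiny dummy archive to illustrate the .npz API.
# File name and key names are illustrative, not those of draw.npz.
np.savez("demo.npz", x=np.zeros((5, 784), dtype=np.uint8), y=np.arange(5))

data = np.load("demo.npz")
print(data.files)         # keys stored in the archive
print(data["x"].shape)    # (5, 784)
```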
Changes made to the following files:
- model.evaluate doesn't take parameters for a validation set. I removed the sentence mentioning x_val and y_val; it was copied from Task 6 and I forgot to remove it.
- model.predict returns an array of shape …
Deadline: 23:59:00 on May 12, 2022 (Thursday).
Create a single zip file
that contains pa2.py
and draw_model.h5
(your model saved by the code we provided in the notebook).
Please do not create a folder inside the zip.
Submit the zip file to ZINC. ZINC usage instructions can be found here.
There are 2 ZINC submission entries. The one with "validation" in its name only validates that your code runs, and it reports scores immediately. The one without "validation" in its name is the final destination for your submission, and it won't report scores before the deadline!
Notes: We grade each function separately. For each function, we have already provided some dummy code so that it runs without raising exceptions. If your implementation of a function raises errors, you will not get the score for that function, as expected. But a wrong implementation that raises no error does not guarantee you the score either (otherwise you could score even with the unmodified skeleton code).
Make sure you upload the correct version of your source files - we only grade what you upload. Some students in the past submitted an empty file or a wrong file which is worth zero mark. So you must double-check the file you have submitted.
There will be a penalty of -1 point (out of 100 points) for every minute you are late.
For instance, if you submit your solution at 1:00:00 a.m. on May 13, there will be a penalty of -61 points for your assignment (since the deadline of assignment 2 is 23:59:00 on May 12).
However, the lowest grade you may get from an assignment is zero: any negative score after the deduction due to a late penalty (and any other penalties) will be reset to zero.
- train_x, val_x, test_x have correct shapes and values (1pt each)
- train_y, val_y, test_y have correct shapes and values (1pt each)
- AugmentationLayer
  - Sequential has correct layer types (i.e. classes) in the correct order. (2pt. No intermediate scores if some layers are wrong.)
  - Sequential's layers are constructed with correct parameters. (1pt. No intermediate scores if some layers are wrong. No score if the test case above fails.)
- Sequential has correct layer types (i.e. classes) in the correct order. (3pt. No intermediate scores if some layers are wrong.)
- Sequential's layers are constructed with correct parameters. (1pt. No intermediate scores if some layers are wrong. No score if the test case above fails.)
- draw_model
Q: ZINC only reports one line of error, "test_pa2_pre", instead of multiple lines of test cases.
A: That's because ZINC failed to even start the test suite. The most likely cause is that you imported third-party libraries other than tensorflow, keras, and numpy; the test environment doesn't have those libraries. Common pitfalls are matplotlib and sklearn. One special case is tkinter, which will pass in the validation submission but fail in the final submission. Also check whether your IDE auto-imported some library upon a wrong tab-completion and you forgot to remove it afterwards.
Q: Some plots have "unknown" labels in the captions, or write an entire array to the captions.
A: The plotting code itself is correct. As long as your tasks return correct values in correct shapes, the plots will have sane captions.
Q: In Task 4, I have the following error: Input 0 of layer conv2d_xx is incompatible with the layer: ...
A: First, don't use the augmentation_layer in the notebook. Second, it is better not to write your tasks in the notebook, or at least not to reuse or edit any local variables already defined there. Also see the following question. The cause of this specific error is explained in Piazza @294.
Q: Can I write my code in the notebook first, and copy it to the .py file later?
A: You can, but do not rely on any local variables in the notebook (you will eventually get rid of them after pasting), and remember to rerun and validate your code after pasting. For example, using the augmentation_layer variable from the provided code cell in your build_model will cause an error.
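To make the point concrete, here is a sketch of constructing a fresh augmentation block inside build_model rather than reusing the notebook's augmentation_layer variable. The layer choices and parameters below are purely illustrative, not the assignment's required architecture:

```python
from tensorflow import keras


def build_model():
    # Build the augmentation block *inside* the function, so the model is
    # self-contained and doesn't capture notebook-local variables.
    # These augmentation layers and values are illustrative only.
    augmentation = keras.Sequential([
        keras.layers.RandomRotation(0.1),
        keras.layers.RandomTranslation(0.1, 0.1),
    ])
    model = keras.Sequential([
        keras.layers.Input(shape=(28, 28, 1)),
        augmentation,
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.Flatten(),
        keras.layers.Dense(8, activation="softmax"),  # 8 doodle categories
    ])
    return model
```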
Q: Is my model acceptable if I achieve xx/16 in the last task?
A: The last task only demonstrates your implementation of the single-image prediction function. The accuracy on those 16 images involves a lot of luck and does not indicate the quality of your model. Please refer to your Task 7 output for the model accuracy, and make sure that number is above 60%.
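For reference, single-image prediction boils down to adding a batch axis before calling predict and taking an argmax over the returned probabilities. The array below is a made-up stand-in for a model.predict output, not real model data:

```python
import numpy as np

# Made-up stand-in for model.predict(img[None, ...]) on one image:
# predict expects a batch, so it returns shape (1, num_classes).
probs = np.array([[0.05, 0.70, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03]])
label = int(np.argmax(probs[0]))   # index of the most probable category
print(label)  # 1
```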
Q: My accuracies are all zero during training!
A: If you use metrics=[keras.metrics.Accuracy()] or metrics=[tf.keras.metrics.Accuracy()], replace it with metrics=["accuracy"]. The string alias lets Keras pick the accuracy variant matching your loss and label format, whereas Accuracy() compares raw model outputs to labels directly, which almost never match. Some students reported strange behaviors with the former two.
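A minimal compile sketch showing the string alias. The toy architecture here is just a placeholder, not the assignment's model:

```python
from tensorflow import keras

# Toy model, only to demonstrate compile(); not the assignment architecture.
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(8, activation="softmax"),
])

# "accuracy" lets Keras resolve the matching metric for the loss/labels
# (here sparse_categorical_accuracy, since labels are integer class ids).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```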
Q: Can we use literal 28 instead of sqrt when we reshape?
A: Yes.
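Both spellings produce identical results, since the flat length is 784 = 28 × 28. A quick sketch with dummy data:

```python
import numpy as np

x_flat = np.zeros((10, 784))            # ten flattened 28x28 doodles (dummy data)
side = int(np.sqrt(x_flat.shape[1]))    # 28, computed from the flat length
x_a = x_flat.reshape(-1, side, side, 1)
x_b = x_flat.reshape(-1, 28, 28, 1)     # the literal 28 is equivalent
print(x_a.shape)  # (10, 28, 28, 1)
```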
Q: Validation accuracy is greater than training accuracy in this dataset!
A: Good for you to question that! See piazza @327.
Q: My code doesn't work. There is an
error/bug. Here is the code. Can you help me fix it?
A: As the assignment is a major course
assessment, to be fair, you are supposed to work on it on
your own and we should not finish the tasks for you. We are
happy to help with explanations and advice, but we shall
not directly debug the code for you.
Q: Are we allowed to use external libraries (e.g., scikit-learn) to implement this assignment?
A: In this assignment, we will only be using NumPy and TensorFlow (Keras is part of TensorFlow). You are NOT allowed to import extra external libraries (i.e., no scikit-learn). The goal of this assignment is to get familiar with Keras specifically, by building a CNN image classification model.
Q: Are we allowed to use Python standard libraries (e.g., from collections import defaultdict)?
A: Yes, Python standard libraries are allowed. Please visit here for the official comprehensive list of modules included in Python 3.7 (Colab deploys Python 3.7 for now; if you work on a local machine, please also test on Colab).
Q: If ZINC says I have achieved "Total Score ?/?", does that mean I have passed the assignment and obtained full marks?
A: No, not necessarily. We will re-grade your submitted assignment file using another set of test cases, so you may get different marks if you do not pass some of the test cases during the re-grading performed after the submission deadline. Please check your code more thoroughly.
...
Solution: pa2_sol.py and correct_model.h5
There are three grading attempts, and the latest is the final score (because I fixed some bugs in the grading scripts). We are sorry that the ZINC frontend doesn't provide PyTest details; a hacky way to see the error details on ZINC is elaborated below. You can also download the test files and the notebook to run the tests yourself on Colab. Details on how to run the tests are self-explained in the notebook.
The appeal process is underway, but you can check your error causes first. If you received an empty score due to timeout, a zero score due to syntax issues, or have other questions regarding the grading program, please wait until I work out the appeal scheme.
1. Click on the graphql item and go to the Preview tab on the right.
2. Expand the JSON data along "data - report - sanitizedReports - pyTest - 0 - report - testsuites - 0 - testcases - <some number> - failures - 0".
3. The error traceback is in "context" and any extra message is in "message".
The result is one huge string with escaped new-line characters. You can copy it into a Python print("..") call or a terminal echo ".." command to see it clearly.
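For example, a copied string containing literal \n sequences can be unescaped in Python like this (the string content below is made up):

```python
# Made-up stand-in for the "context" field copied from the JSON viewer:
raw = "AssertionError: wrong shape\\nexpected (28, 28, 1)\\ngot (784,)"

# Turn the literal backslash-n sequences into real newlines and print:
print(raw.replace("\\n", "\n"))   # prints three readable lines
```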
This part is not related to the assignment itself. It contains some extra information in case anyone is interested. Not reading this won't affect your grades.
Q: Why are there so many set_random_seeds
in the notebook?
In short, don't copy me. The correct practice for a normal program is to set a seed only once at the top of the program. Imagine all the randomness of the computer comes from lookups into a seemingly random but fixed long string, and the seed controls where to start. You set it once and let the computer start from there. If you set the seed a second time with the same value, you "reset" the random state of the program back to when you set it the first time, and code you intend to be random becomes deterministic. In this PA, however, I set it again before training the model. Therefore, no matter how you have played with the model before and how far the random state has moved on, the trained model should be the same. Otherwise, since there is randomness in each layer's initializer and in the dropout layer, different random states could produce different final models.
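The reset behaviour is easy to see with a few lines of NumPy (the seed value 0 is arbitrary):

```python
import numpy as np

np.random.seed(0)
a = np.random.rand(3)        # first draw after seeding
b = np.random.rand(3)        # state has moved on, so b differs from a

np.random.seed(0)            # re-seeding rewinds the random state...
c = np.random.rand(3)
print(np.array_equal(a, c))  # True: the "random" draw is reproduced exactly
```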
Q: I work on my local machine with an all-round IDE. It reports annoying linting issues in the code.
A: Short answer: either download this .flake8 file, put it next to your PA scripts, and restart your IDE, or disable the linter in your IDE if that doesn't work. Or you can leave the warnings there, as long as your code runs correctly.
If you don't know what a linter is: linters are like Grammarly for programming languages, helping you write beautiful and robust code. But our skeleton code contains some deliberate formatting that is not considered conventionally beautiful.
If your linter is flake8, like mine, the above config file will bypass those specific conventions. If you use another linter, see if it can recognize a flake8 config; otherwise, you may want to turn the linter off. If you don't have a linter or don't even know what I am talking about, just ignore it!