COMP 2211 Exploring Artificial Intelligence

Lab 3 K-Nearest Neighbors

Review


This part of the lab reviews K-Nearest Neighbors (KNN). It aims to refresh your memory of what you learned in class.

  • K-Nearest Neighbors
    • Computation Steps
    • Standardization
    • Outlier
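
As a refresher, the computation steps can be sketched roughly as follows. This is a minimal NumPy sketch, not the notebook's code: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with a simple majority vote are assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict a label for each row of X_test by majority vote
    among its k nearest training points (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    # Step 1: distance from every test point to every training point.
    # Broadcasting yields an array of shape (n_test, n_train).
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Step 2: indices of the k smallest distances for each test point.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Step 3: majority vote among the labels of those k neighbors.
    # On a tie, the smallest label wins because np.unique sorts labels.
    preds = []
    for neighbor_labels in y_train[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Toy data (made up for illustration): two small clusters.
X_train = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.5, 1.5], [8.5, 8.5]])

y_pred = knn_predict(X_train, y_train, X_test, k=3)
print(y_pred)
```

Because the vote depends on raw distances, an attribute with a much larger range than the others would dominate the result, which is why standardization is on the review list.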

Please download the notebook by right-clicking and selecting "Save link as", then open it using Google Colab. You should see the following if you open the notebook successfully.


Introduction

People are always striving to make money for a living. The income status of a person can be impactful.
Source: http://archive.ics.uci.edu/ml/datasets/Adult

In this lab, we are going to use a fraction of the 1994 Census database to build a K-Nearest Neighbors classifier that predicts whether a person makes over 50K a year.


Lab Work


The lab tasks let you practice processing data and building an AI model using KNN. Please download the notebook, the dataset, and the prediction file used for answer checking, then open the notebook using Google Colab. You should see the following if you open the notebook successfully.




Submission & Grading

This is an odd-numbered lab, so there is no need to submit anything. Have fun playing with the notebooks! ;)

Frequently Asked Questions

  • UPDATES on lab3_tasks.ipynb (Please download the latest version):
    - Error when using the metrics module:
       add   from sklearn import metrics
    - Self-checking of the predicted result:
       change to   TA_y_pred = pd.read_csv('y_pred.csv', header=None).to_numpy()
  • Q: Do we perform standardization on all attributes or just the attributes with large values?
    A: We perform it on all attributes to bring them onto a comparable scale, so that every attribute can contribute equally to the model's prediction. If we standardized only the attributes with large ranges, they would still likely be on different scales from the unstandardized ones.
  • Q: lab3_tasks.ipynb states that "the attributes with text values need to be converted into float type to create the vector representation". May I confirm whether we need to transform the datatype in the pandas DataFrame from str to float or to int?
    A: To int when transforming the DataFrame, and to float in the later NumPy operations. Eventually the data must be converted into a float-type matrix, but for the encoding of str values in the pandas DataFrame alone, the int datatype is expected.
  • Q: Can you give us a basic introduction to pandas?
    A: I will give an introduction to some basic pandas functions during Lab 3. Please come to the tutorials if you wish to know more :)

This list is incomplete; you can help by expanding it.
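
To illustrate the answers above on datatypes and standardization, here is a minimal sketch, assuming a tiny made-up table. The column names, the values, and the `sex_codes` mapping are hypothetical and not taken from the lab dataset or notebook. It encodes a text column as int inside the DataFrame, converts the whole table to a float matrix, and then standardizes every attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table; names and values are illustrative only,
# not rows from the lab dataset.
df = pd.DataFrame({
    "age": [25, 38, 52, 46],
    "hours_per_week": [40, 50, 45, 60],
    "sex": ["Male", "Female", "Female", "Male"],
})

# Encode the text column as int codes inside the DataFrame.
sex_codes = {"Female": 0, "Male": 1}  # assumed mapping
df["sex"] = df["sex"].map(sex_codes).astype(int)

# Convert to a float matrix for the later NumPy operations.
X = df.to_numpy(dtype=float)

# Standardize ALL attributes: subtract each column's mean and divide by
# its standard deviation. (A constant column would have std 0 and would
# need special handling; none occurs in this toy table.)
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(df.dtypes)
print(X_std)
```

After this step every column has mean 0 and standard deviation 1, so no single attribute dominates the distance computation. When a separate test set is involved, a common practice is to reuse the mean and standard deviation computed from the training data.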
