Project

Overview

The project this year is to predict XXX from YYY; see README.txt for details. The format of the project is a competition, with live leaderboards (see below for more details).

Project Rules and Requirements

Rules and Policies

  • You CANNOT download or harvest any additional training data from the internet. Doing so will be considered cheating, and the penalty will be very harsh. You MUST use ONLY the data we provide. We will test your final classifier, and if it is clear that we cannot replicate your performance because you used additional data, you will get a ZERO for the project.
  • Except where specified otherwise, you are allowed to download additional code or toolboxes from the internet; however, you must cite everything you use in your final project report. We don't want you to reinvent the wheel. If you're unsure whether an extra resource is allowed, please ask us.
  • You must work in groups of 2–3 people. Single competitors and groups of more than 3 people are not allowed.
  • Before you can submit to the leaderboard, you need to register your team using turnin (described below).
  • In the competition, you need to reach a certain absolute score to get full credit, and placing particularly well will increase your project grade even further: first place gets 10%, second 8%, third 7%, and the rest of the top 10 teams 5% extra credit added to the project grade. The top 3 teams will also get awesome prizes -- or at least public recognition.
  • You will need to ensure that your code will run from start to finish on our server, so that we can reproduce your result for grading. We will provide a utility so that you can make sure we can run your code successfully. See below for details.

Overall requirements

The project is broken down into a series of checkpoints. There are four mandatory checkpoints and a final writeup, which is due Dec. 10th. The leaderboards will operate continuously so you can monitor your progress against other teams and toward the score-based checkpoints. All mandatory deadlines are at midnight, so the deadline “Nov. 19th” means you can submit anytime before the 19th becomes the 20th.

  • 1% - Nov. 21: run turnin -c cis520 -p proj_groups group.txt to let us know your team name. The file should contain a single line: the team name. In order to post scores to the leaderboard, you must have submitted your team name, and your team must have 2–3 members in total.
  • 9% - Nov. 27, Checkpoint: beat baseline 1 by any margin.
  • 20% - Dec. 6, Checkpoint: beat baseline 2 by any margin.
  • 50% - Dec. 7: by the final submission deadline (11:59 PM on Dec. 7), implement (or download and adapt an implementation of) at least one model from each of the 4 following groups, and submit your implemented models on Gradescope. We should be able to immediately run both training and testing for each of your 4 algorithms without difficulty. Please include a README or PDF that briefly describes the models you trained, documents which files correspond to each algorithm, and gives instructions on how to run and test them. (You do not need to submit the final report at this time.) The 4 or more models you submit to Gradescope do not have to include the model you used for your leaderboard entry, but they may. An illustrative MATLAB sketch of one such model appears after this list.
NOTE: Only ONE submission to Gradescope is expected on Dec. 7th (see the section Submitting your code on Gradescope for submission instructions).
  • A generative method (NB, HMMs, k-means clustering, GMMs, etc.)
  • A discriminative method (logistic regression, decision trees, SVMs, etc.)
  • An instance-based method (kernel regression, k-nearest neighbors, etc.)
  • Something with a novel component (e.g. different kernel, regularization, ...)
  • 20% - Dec. 10: submit the final report as a PDF to Gradescope by 11:59 PM. The final report should be between 3 and 5 pages and should include all of the following:
    • A table with results for each method you tried (try to use checkpoints to get test set accuracy for each of your methods)
    • Analysis of your experiments. What worked and what didn’t work? Why not? What did you do to try to fix it? Simply saying “I tried XX and it didn’t work” is not enough. How did you handle the cost function?
  • Extra credit - In the competition, placing well will increase your project grade. First place gets 10%, second 8%, third 7%, and the rest of the top 10 teams 5% extra added to the project grade.
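
To make the four-category requirement concrete, here is a minimal sketch of an instance-based method (k-nearest neighbors) in MATLAB. The function name, the assumption that features arrive as an N-by-D matrix with one example per row, the assumption of numeric class labels, and the use of pdist2 (Statistics and Machine Learning Toolbox) are illustrative choices, not part of the starter kit.

    function preds = knn_predict(Xtrain, Ytrain, Xtest, k)
    % KNN_PREDICT  Illustrative k-nearest-neighbor classifier (instance-based).
    %   Xtrain : Ntrain-by-D training feature matrix  (assumed layout)
    %   Ytrain : Ntrain-by-1 vector of numeric class labels
    %   Xtest  : Ntest-by-D test feature matrix
    %   k      : number of neighbors that vote
    %   preds  : Ntest-by-1 predicted labels
        if nargin < 4, k = 5; end
        dists = pdist2(Xtest, Xtrain);       % Ntest-by-Ntrain Euclidean distances
        [~, idx] = sort(dists, 2);           % nearest training examples first
        nearest = Ytrain(idx(:, 1:k));       % labels of the k nearest neighbors
        preds = mode(nearest, 2);            % majority vote for each test example
    end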

Evaluation

Error metric

Your predictions will be evaluated based on a weighted loss function, specified in the attached notes.
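
The weights themselves are specified only in the notes, so the snippet below is just a hypothetical illustration of how a weighted loss is typically computed in MATLAB: per-example costs are looked up in a cost matrix and averaged. The cost matrix C used here is a placeholder; substitute the definition from the notes.

    % Hypothetical example only -- the real weights are specified in the notes.
    % y_true, y_pred : N-by-1 vectors of class labels in 1..K
    % C              : K-by-K cost matrix, C(i,j) = cost of predicting j when the truth is i
    weighted_loss = mean(C(sub2ind(size(C), y_true, y_pred)));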

Requirements for Each Checkpoint

For the checkpoints, you must submit to the leaderboard(s). For the final checkpoint, you must submit ALL of your code via turnin to the correct project folder. Make sure that you submit any code that you used in any way to train and evaluate your method. We will be opening up an autograder that will check the validity of your code to ensure that we'll be able to evaluate it at the end.

Detailed Instructions

Download the starter kit

You will be able to download the starter kit here: https://canvas.upenn.edu/files/72891513/download?download_frd=1

You will submit your code to the auto-grader, which will execute it on the test set and generate a vector of predictions. The auto-grader will then compare the predictions with the ground truth. You will receive an email with your accuracy, and the accuracy will be recorded on the leaderboard.
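
The exact entry point and its arguments are defined by the starter kit; as a rough, hypothetical sketch of the general shape, a predict_labels.m might load a previously trained model and return one prediction per test example, as below. The argument name, the trained_model.mat file, and the reuse of the knn_predict sketch from earlier are assumptions, not the actual specification.

    function labels = predict_labels(test_features)
    % Hypothetical entry point -- check the starter kit for the real signature.
        S = load('trained_model.mat');   % model trained offline and submitted with the code
        labels = knn_predict(S.Xtrain, S.Ytrain, test_features, S.k);
    end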

Register your team name

Before you can get results on the leaderboard, you need to submit your team name. Everyone on your team is required to do this. Simply create a text file on eniac containing your team name and submit it as follows:

$ echo "My Team Name" > group.txt
$ turnin -c cis520 -p proj_groups group.txt

This group.txt file should be plain text and contain only a single line. Do not submit PDFs, Word documents, rich text, HTML, or anything like that; just follow the commands above. If you have a SEAS email address, you will get an email confirmation.

Submit to the leaderboard

Submit all of your code files, including predict_labels.m and any supporting files needed (e.g. saved models):

turnin -c cis520 -p leaderboard *

Your team can submit once every 3 hours, so use your submissions wisely. Your submission will be checked against the reference solutions and you will get your score back via email. This score will also be posted to the leaderboard so everyone can see how awesome you are.

You can view the current leaderboard here: http://www.seas.upenn.edu/~cis520/fall18/leaderboard.html

The competition ends at 11:59 PM on Dec. 7. You will still be able to submit to the autograder and test your code while preparing the project report, but those submissions will no longer be considered for the top-position prizes.

Submitting your code on Gradescope

Please submit a .zip file containing the following items on Gradescope by 11:59 PM on Dec. 7. Only one submission per group; any team member can submit. Please add your teammates to a group on Gradescope. (A MATLAB example for building the archive appears after the file list.)

Files to be included:

  • 1) group.txt -- the same file submitted for the leaderboard, containing the team name and team members
  • 2) MATLAB code files for the 4 or more models you implemented, plus any auxiliary files
  • 3) a README or PDF briefly describing the models you trained, documenting which files correspond to each algorithm, and giving instructions on how to run and test them
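
If it is convenient, the archive can be built from inside MATLAB with the built-in zip function. The file names below are only examples; replace them with your team's actual files.

    % Example only: package the required items into one archive for Gradescope.
    zip('submission.zip', {'group.txt', 'README.txt', ...
        'knn_predict.m', 'predict_labels.m', 'trained_model.mat'});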

Submitting your project report on Gradescope

Submit one PDF file of your group's project report, as described above, on Gradescope by 11:59 PM on Dec. 10. One submission per group; add your teammates to a group on Gradescope.

The final rankings will be released on the day of the prize ceremony, Dec. 10.