Genre Classification

Overview

This project involves developing a “genre classification” system that predicts the genre of a song from its lyrics. A song’s genre is a categorical description of the “type” of music it is; examples include “hip hop,” “jazz,” and “rock.” Internet radio services such as last.fm and Spotify use genre prediction to create a station that plays songs from a user-supplied genre.

The dataset includes the lyrics and genre of over 12,000 songs in a bag-of-words format. Each song belongs to one of 10 genres, and your job is to predict the genre for each song in the test set.

There are several features of this project that make it particularly interesting and fun for you:

Transductive setting. You are given (limited) access to some of the test data ahead of time, allowing you to incorporate the statistics of the test data into your methods if you so desire. For example, you can run PCA on the word frequencies for the entire dataset, not just on the training data.

Additional features. In addition to the lyric features, we are releasing the audio features of each song. Using audio in conjunction with the lyric features may give you an edge over the other groups. A quick description of each audio feature is given below.

Music is awesome.

The format of the project is a competition, with live leaderboards (see below for more details).

Project Rules and Requirements

Rules and Policies
Overall requirements

The project is broken down into a series of checkpoints. There are four mandatory checkpoints (Nov. 14th, Nov. 16th, Nov. 27th, and Nov. 30th). The final writeup is due Dec. 3rd. The leaderboards will be operating continuously so you can monitor your progress against other teams and towards the score-based checkpoints. All mandatory deadlines are at midnight. So, the deadline “Nov. 16th” means you can submit anytime before the 16th becomes the 17th.
Evaluation

Error metric

Your predictions will be evaluated based on their mean reciprocal rank. Your code should produce an Nx10 matrix of ranks. Each row {$\hat{\mathbf{y}}^{(i)} $} is a ranking of the {$ K=10 $} genre labels in decreasing order of confidence for example {$i$}. If the position of the true genre in your ranking vector is given by {$r_i$}, the mean rank loss over your classifier’s predictions is:

{$ \text{rank loss} = \frac{1}{N}\sum_{i=1}^N \left(1-\frac{1}{r_i}\right) $}

The intuition is that for a given true genre, even if you don’t predict it exactly, it’s better for your prediction to give a high rank to the true class. An example of how to generate the ranking matrix is given in the starter kit.

Final evaluation

We have partitioned the dataset into three subsets: training, quiz, and test. The starter kit includes the features and labels for the training set, and the features for the quiz set. However, we are not releasing the final test set! For the final submission, you will submit your code, and we will evaluate your classifier on the withheld test data to determine your final overall performance.

Requirements for Each Checkpoint

For the second and third checkpoints, you need only submit to the leaderboard(s). For the final checkpoint, you must submit ALL of your code via turnin to the correct project folder. Make sure that you submit any code that you used in any way to train and evaluate your method. After the first checkpoint, we will be opening up an autograder that will check the validity of your code to ensure that we’ll be able to evaluate it at the end.

Each code submission should have a file named

Constraints on checkpoint submissions:
Be careful that you include in your submission all files you need for your code to run. If you download the starter kit and work entirely out of the code directory, you should be fine. For example, if your algorithms require libsvm, you must ensure that libsvm exists inside the code directory. You can submit your code as often as you’d like in order to check its correctness, but you will only be able to submit to the leaderboard once every 5 hours.

Detailed Instructions

Download the starter kit

You can download the starter kit here: http://alliance.seas.upenn.edu/~cis520/fall12/project_starter_kit.zip

Take a look at

Register your team name

Before you can get results on the leaderboard, you need to submit your team name. Everyone on your team is required to do this. Simply create a text file on

    $ echo "My Team Name" >> group.txt
    $ turnin -c cis520 -p proj_groups group.txt

This

Submit to the leaderboard

To submit to the leaderboard, you should submit the file

Once you have your
Your team can submit once every 5 hours, so use your submissions carefully. Your submission will be checked against the reference solutions and you will get your score back via email. This score will also be posted to the leaderboard so everyone can see how awesome you are. You can view the current leaderboard here: http://www.seas.upenn.edu/~cis520/fall12/leaderboard.html

Submit your code for the final checkpoint or to test correctness

You must submit your code for the final checkpoint. You can do so with the following:
You will receive feedback from the autograder, exactly like the homework.

Audio Features

In addition to the bag-of-words lyric features for each song, we will also be providing a set of simple audio features. There is code in the starter kit to get the matrix of audio features for the dataset. While you only have to beat the minimum baseline with audio for the final submission, the best classifiers will use both sets of features to make predictions. There is a lot of information contained in the audio that is not present in the lyrics, and vice versa. Here is a quick description of the features we are making available:
FAQ

How do I debug my code?

We’ll post tips here.

Can we use late days for project checkpoints?

You may not use late days for project checkpoints. (Aside from being incompatible with the nature of the competition, it is logistically difficult to apply a fair late day policy to groups with multiple people.)
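
How can I check the rank loss locally?

One way to sanity-check a classifier before spending a leaderboard submission is to compute the rank loss from the Error metric section on held-out training data. The sketch below is not the starter kit's code: it uses Python with NumPy, the function names `ranks_from_scores` and `rank_loss` are hypothetical, and it assumes genres are labeled 1 through K and that your classifier produces an N x K matrix of per-genre confidence scores. The toy data uses K = 3 only to keep the example readable.

```python
import numpy as np


def ranks_from_scores(scores):
    """Turn an N x K score matrix into an N x K ranking matrix.

    Row i lists the genre labels (assumed to be 1..K) in decreasing
    order of confidence for example i.
    """
    scores = np.asarray(scores, dtype=float)
    # Sorting the negated scores orders columns from most to least confident;
    # a stable sort keeps ties in their original column order.
    order = np.argsort(-scores, axis=1, kind="stable")
    return order + 1  # shift 0-based column indices to labels 1..K


def rank_loss(ranks, y_true):
    """Mean over examples of 1 - 1/r_i, where r_i is the 1-based position
    of the true genre in row i of the ranking matrix."""
    ranks = np.asarray(ranks)
    y_true = np.asarray(y_true)
    hits = ranks == y_true[:, None]
    # Each row must contain its true label exactly once.
    if not np.all(hits.sum(axis=1) == 1):
        raise ValueError("each row must rank the true label exactly once")
    r = hits.argmax(axis=1) + 1
    return float(np.mean(1.0 - 1.0 / r))


# Toy example: 3 songs, 3 genres (hypothetical scores and labels).
toy_scores = np.array([
    [0.1, 0.7, 0.2],   # ranking: 2, 3, 1
    [0.5, 0.3, 0.2],   # ranking: 1, 2, 3
    [0.2, 0.3, 0.5],   # ranking: 3, 2, 1
])
toy_labels = np.array([2, 2, 1])  # true genre at positions 1, 2, 3

toy_ranks = ranks_from_scores(toy_scores)
toy_loss = rank_loss(toy_ranks, toy_labels)
print(round(toy_loss, 4))  # 0.3889
```

In the toy data the true genre sits at positions 1, 2, and 3, so the per-song losses are 0, 1/2, and 2/3, and their mean is 7/18. A perfect classifier scores 0, and a classifier that always ranks the true genre last out of 10 scores 0.9.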