GenreClassification

Overview

This project involves developing a "genre classification" system that will predict the genre of a song from its lyrics. A song's genre is a categorical description of the "type" of music it is, and examples of genre include "hip hop," "jazz," or "rock." Internet radio services such as last.fm and Spotify use genre prediction to create a station that plays songs from a user-supplied genre.

The dataset includes the lyrics and genre of over 12,000 songs in a bag-of-words format. Each song belongs to one of 10 genres, and your job is to predict the genre for each song in the test set.

There are several features of this project that make it particularly interesting and fun for you:

Transductive setting. You are given (limited) access to some of the test data ahead of time, allowing you to incorporate the statistics of the test data into your methods if you so desire. For example, you can run PCA on the word frequencies for the entire dataset, not just on the training data.

Additional features. In addition to the lyric features, we are releasing the audio features of each song. Using audio in conjunction with the lyric features may give you an edge over the other groups. A quick description of each audio feature is given below.

Music is awesome.

The format of the project is a competition, with live leaderboards (see below for more details).

Project Rules and Requirements

Rules and Policies

  • You CANNOT download or harvest any additional training data from the internet. Since song lyrics and the MSD are both freely available, you may be tempted to use additional lyric or audio features to train your model, or to match songs in the test set with songs on the internet. Both of these will be considered cheating, and the penalty will be very harsh. You MUST use ONLY the data we provide you with. We will test your final classifier, and if it is clear that we cannot replicate your performance because you used additional data, you will get a ZERO for the project.
  • Except when specified otherwise, you are allowed to download additional code or toolboxes from the internet; however, you must cite everything you use in your final project report. We don't want you to reinvent the wheel. If you're unsure whether an extra resource is allowed, please ask us.
  • You must work in groups of 2-3 people. No single competitors or groups with more than 3 people will be allowed.
  • Before you can submit to the leaderboard, you need to register your team using turnin (described below).
  • In the competition, you need to reach a certain absolute score to get full credit. Placing particularly well will increase your project grade even further. First place gets 10%, second 8%, third 7%, and the rest of the top 10 teams 5% extra credit added to the project grade. The top 3 teams will also get awesome prizes.
  • You will need to ensure that your code will run from start to finish on our server, so that we can reproduce your result for grading. We will provide a utility so that you can make sure we can run your code successfully. See below for details.

Overall requirements

The project is broken down into a series of checkpoints. There are four mandatory checkpoints (Nov. 14th, Nov. 16th, Nov. 27th, and Nov. 30th). The final writeup is due Dec. 3rd. The leaderboards will operate continuously so you can monitor your progress against other teams and toward the score-based checkpoints. All mandatory deadlines are at midnight, so the deadline “Nov. 16th” means you can submit anytime before the 16th becomes the 17th.

  • 1% - Nov. 14, run turnin -c cis520 -p proj_groups group.txt to let us know your team name. The file should contain a single line: the team name. In order to post scores to the leaderboard, you must have submitted your team name, and your team must have 2–3 members in total.
  • 9% - Nov. 16, Checkpoint: Beat the baseline Quiz score of 0.2800 mean rank loss by any margin.
  • 20% - Nov. 27, Checkpoint: Beat the minimum Quiz score threshold of 0.1850 mean rank loss by any margin.
  • 50% - By the final submission (Nov 30), implement (or download and adapt an implementation of) at least 4 of the following:
    • A generative method (NB, HMMs, k-means clustering, GMMs, etc.)
    • A discriminative method (LR, DTs, SVMs, etc.)
    • An instance based method (kernel regression, etc., anything other than the given KNN)
    • Your own kernel (other than the ones such as linear, polynomial, RBF, and sigmoid provided with libsvm)
    • An unsupervised method to generate features or reduce dimensionality of the data (for example, clustering words or running SVD, then using these clusters as new features.)
    • A method that uses additional features, such as bigrams or audio features
  • 20% - Dec 3 Final report: It should be between 2 and 5 pages and should include:
    • Results for each method you tried (try to use checkpoints to get Quiz set accuracy for each of your methods)
    • Analysis of your experiments. What worked and what didn’t work? Why not? What did you do to try to fix it? Simply saying “I tried XX and it didn’t work” is not enough.
    • An interesting visualization of one of your models. For instance, find the words with the most importance for a genre or the means of a clustering model.
  • Extra credit - In the competition, placing well will increase your project grade. First place gets 10%, second 8%, third 7%, and the rest of the top 10 teams 5% extra added to the project grade.
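The dimensionality-reduction option listed above (clustering words or running SVD, then using the results as new features) can be sketched in a few lines. This is a minimal illustration, not starter-kit code; the variable names (`X`, `Z`, `k`) are made up here, and it assumes the bag-of-words counts are held in a sparse matrix:

```matlab
% X: (train + quiz) x V sparse bag-of-words matrix. The transductive
% setting allows fitting the decomposition on all available features.
% (Variable names are illustrative, not from the starter kit.)
k = 50;                      % number of latent dimensions to keep
[U, S, V] = svds(X, k);      % truncated SVD of the sparse count matrix
Z = U * S;                   % k-dimensional latent features, one row per song
% Rows of Z can now replace (or augment) the raw word counts when
% training any of the classifiers listed above.
```

The choice of k is a tuning knob; too few dimensions discards genre-relevant words, too many keeps the noise you were trying to remove.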

Evaluation

Error metric

Your predictions will be evaluated by a rank loss based on the reciprocal rank of the true genre. Your code should produce an Nx10 matrix of ranks. Each row {$\hat{\mathbf{y}}^{(i)} $} is a ranking of the {$ K=10 $} genre labels for example {$i$}, in decreasing order of confidence. If the position of the true genre in your ranking vector is given by {$r_i$}, the mean rank loss over your classifier's predictions is:

{$ \mbox{rank loss} = \frac{1}{N}\sum_{i=1}^N 1-\frac{1}{r_i}$}

The intuition is that for a given true genre, even if you don't predict it exactly, it's better for your prediction to give a high rank to the true class. An example of how to generate the ranking matrix is given in the starter kit.
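Concretely, if the true genre is ranked first the loss contribution is {$1 - 1/1 = 0$}; if it is ranked second, {$1 - 1/2 = 0.5$}; if last, {$1 - 1/10 = 0.9$}. A minimal sketch of computing this loss, assuming a ranking matrix and a vector of true labels (variable and function names here are illustrative, not from the starter kit):

```matlab
function loss = mean_rank_loss(ranks, y)
% ranks: N x 10 matrix; ranks(i,:) lists the 10 genre labels for example i
% in decreasing order of confidence. y: N x 1 vector of true genre labels.
    N = size(ranks, 1);
    loss = 0;
    for i = 1:N
        r = find(ranks(i,:) == y(i));   % position of the true genre
        loss = loss + (1 - 1/r);
    end
    loss = loss / N;
end
```
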

Final evaluation

We have partitioned the dataset into three subsets: training, quiz, and test. The starter kit includes the features and labels for the training set, and the features for the quiz set. However, we are not releasing the final test set! For the final submission, you will submit your code, and we will evaluate your classifier on the withheld test data to determine your final overall performance.

Requirements for Each Checkpoint

For the second and third checkpoints, you only need to submit to the leaderboard(s). For the final checkpoint, you must submit ALL of your code via turnin to the correct project folder. Make sure that you submit any code that you used in any way to train and evaluate your method. After the first checkpoint, we will be opening up an autograder that will check the validity of your code to ensure that we'll be able to evaluate it at the end.

Each code submission should have a file named predict_genre.m that takes the training and test sets and returns a ranking matrix. Any training should be done prior to submission, and your trained classifier must be saved in a format that can be used directly by predict_genre.m. If you are using an instance-based method like KNN, the quiz set(s) will be passed to the function and can also be used.
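A skeleton of what such a predict_genre.m might look like for a linear model — the exact signature, saved-model file name, and struct fields below are guesses for illustration, so check the starter kit for the real interface:

```matlab
function ranks = predict_genre(X_test)
% Returns an N x 10 ranking matrix for the test features X_test.
% Training happens before submission; here the trained model is
% simply loaded from disk. (File and variable names are illustrative.)
    load('trained_model.mat', 'model');        % e.g. weights of a linear classifier
    scores = X_test * model.W;                 % N x 10 matrix of class scores
    [~, ranks] = sort(scores, 2, 'descend');   % each row: genres by decreasing confidence
end
```

Note that sorting the score matrix row-wise directly yields the required ranking of genre labels, so most methods only need to output a score per genre.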

Constraints on checkpoint submissions:

  • {$\mathbf{< 50}$}MB - Total submission should be smaller than 50MB
  • 64-bit Linux MATLAB R2010a compatible - Your code should run on a 64-bit Linux server without any supervision by us, since this is the type of machine we will test it on. In other words, if you download tools that use compiled MEX, you must compile them beforehand on a 64-bit machine to get .mexa64 compiled code. If your code runs on biglab, then it will run on our test machine, so test there before submitting to a checkpoint.
  • Your code must run in under 5 minutes. Remember that you are training a model prior to submission, so 5 minutes is more than enough time to classify the given test set.

Be careful to include in your submission all the files your code needs to run. If you download the starter kit and work entirely out of the code directory, you should be fine. For example, if your algorithm requires libsvm, you must ensure that libsvm exists inside the code directory.

You can submit your code as often as you'd like in order to check its correctness, but you will only be able to submit to the leaderboard once every 5 hours.

Detailed Instructions

Download the starter kit

You can download the starter kit here: http://alliance.seas.upenn.edu/~cis520/fall12/project_starter_kit.zip

Take a look at the run_submission.m file in the starter code directory to get an idea of how to use the code we gave you, and look over how the various components of the simple baseline method work. You should be able to understand what all the code is doing. We will be discussing the project and the kit during recitation on Friday, so please make sure to come.

Register your team name

Before you can get results on the leaderboard, you need to submit your team name. Everyone on your team is required to do this. Simply create a text file on eniac with your team name as follows:

$ echo "My Team Name" > group.txt
$ turnin -c cis520 -p proj_groups group.txt

This group.txt file should be raw text and contain only a single line. Do not submit PDFs, word documents, rich text, HTML, or anything like that. Just follow the above commands. If you have a SEAS email address, then you will get an email confirmation.

Submit to the leaderboard

To submit to the leaderboard, you should submit the file submit.txt, which contains one line per example in the test set, with 10 numbers on each line. An example of how to generate this file is in the starter kit.
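Given a ranking matrix, writing it in this format is a one-liner; the starter kit contains the canonical version, and this sketch assumes the matrix is called `ranks`:

```matlab
% ranks: N x 10 ranking matrix, one row per test example.
% dlmwrite emits one space-separated line of 10 numbers per row.
dlmwrite('submit.txt', ranks, ' ');
```
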

Once you have your submit.txt, you can submit it with the following:

turnin -c cis520 -p leaderboard submit.txt

Your team can submit once every 5 hours, so use your submissions carefully. Your submission will be checked against the reference solutions and you will get your score back via email. This score will also be posted to the leaderboard so everyone can see how awesome you are.

You can view the current leaderboard here: http://www.seas.upenn.edu/~cis520/fall12/leaderboard.html

Submit your code for the final checkpoint or to test correctness

You must submit your code for the final checkpoint. You can do so with the following:

turnin -c cis520 -p project <list of files including make_final_prediction.m>

You will receive feedback from the autograder, exactly like the homework.

Audio Features

In addition to the bag-of-words lyric features for each song, we will also be providing a set of simple audio features. There is code in the starter kit to get the matrix of audio features for the dataset. While the checkpoints only require you to beat the minimum baseline, the best classifiers will use both sets of features to make predictions. There is a lot of information contained in the audio that is not present in the lyrics, and vice versa.

Here is a quick description of the features we are making available:

  • Loudness: the average energy of the song, measured in dB.
  • Tempo: the speed of the song, measured in beats per minute.
  • Time signature: the estimated time signature of the song. Values from 3 to 7 indicate time signatures of 3/4 through 7/4. A value of -1 may indicate that no time signature was detected, while a value of 1 indicates a rather complex or changing time signature.
  • Key: the estimated key of the track. A value of 0 corresponds to C, and a value of 11 corresponds to B.
  • Mode: whether the key is minor or major. A value of 0 corresponds to minor, and a value of 1 corresponds to major.
  • Duration: how long the song is, in seconds.
  • Timbre (pronounced "TAM-bir"): the spectral shape of the sound. Using a fast Fourier transform, a short segment of audio can be transformed to its frequency domain representation. The timbre of a piece of audio is its overall envelope in the spectral domain, and generally corresponds to the "tone" of the sound. A song that sounds "muddled" will have a lot of energy in the lower part of the spectrum, and a song that sounds "bright" will have a lot of energy in the upper part of the spectrum. The spectrum of the whole song is binned into 12 buckets, and the mean and variance of each bucket gives the 24 timbre features that are in the dataset.

FAQ

How do I debug my code?

We'll post tips here.

Can we use late days for project checkpoints?

You may not use late days for project checkpoints. (Aside from being incompatible with the nature of the competition, it is logistically difficult to apply a fair late day policy to groups with multiple people.)