Kapil Garg, Aaron Karp, Yuxuan Xiao
EECS 349: Northwestern University

Our task is to generate predictions for games a user on Steam might like based on the games they currently own. Currently, there is not a reputable, automated recommendation system for video games; users primarily buy games based on reviews of friends and review sites. We believe that this can be improved by using player data from Steam, the largest gaming platform for PC users.

Our data set includes the Steam libraries of around ~10,000 users. Each instance has 2,283 binary attributes, either 0 or 1, representing a user’s ownership of a specific game. We used a random 80/20 split to generate our training and test sets, on which we trained and evaluated our classifiers. As some of the games had few players, as little as 1% of the total size of the data set, we supplemented our accuracy results with precision, recall, and f-measure, which better reflects the quality of a classifier.

The classifiers we evaluated include kNN, Naive Bayes, Decision Trees, and Bayes Net. As a baseline, we used ZeroR. Apart from kNN, which we implemented in Python using bit strings for speed, the other classifiers were from Weka. We tried each classifier with a set of 500 games as target attributes. The first 50 are the first 50 games in the data set, which have many players due to their age. The other 450 games were picked with a pseudo-random number generator to easily reproduce the results between runes. Since games varied widely in the number of owners, the most important measure of success were the f-measures for both owning and non-owning the class. ZeroR always has an f-measure of 1 for one attribute value and 0 for the other. Better classifier had high f-measures for both possible attribute values (0 or 1).

Though the classifiers we found tended to vary in accuracy and f-measure depending on the number of owners of a game, looking at the results for the f-measure for owning the game over the 500 games showed that decision trees was the best classifier, with the highest average f-measure of .474 and the highest upper and lower quartiles from 0.303 to 0.638.

Try it out

Input your steam ID below, and hit "Submit". Your library will be run through our KNN algorithm and you will recieve a list of suggested games. Please allow up to a minute for runtime.

(If you want to test it out but don't have a steam account, you can use one of our team member's IDs: paep3nguin, adawgizzleindahous, RadixAlpha)