How to approach a problem: self-indulgent music recommendations.
I’ve been thinking a lot about music recommendations lately, and I realized that I’m usually a little bearish about listening to recommended bands that I’ve never heard of before. Maybe it’s just because I listen to a pretty broad variety of music, but I love re-discovering a band that I know but haven’t thought of in a while. So with that, let’s build a 100% self-centered music recommender. The goal is to remind myself of some bands that I might like to play next based on what I’m listening to right now.
Fortunately for me, I’ve used last.fm to record the last 135,000+ tracks that I’ve listened to over the course of 7 years (my “now playing” is listed at the top of this page). And even more fortunately, they let you grab your entire history via their API. I was actually able to get 127,873 of them, which is more than plenty to work with. So let’s check out which artists I’ve listened to the most and see how well it matches up with my last.fm profile:
Artist                 Plays
----------------------------
Tom Waits               3155
Justin Townes Earle     2613
Iron & Wine             2053
M. Ward                 2005
Lucero                  1832
Old 97's                1761
The Black Keys          1755
Beach House             1624
Death Cab for Cutie     1592
Ryan Adams              1527
The first thing that should become clear is that I listen to a lot of Sad Bastard music. At least both Dillinger Four and Samiam are in the top 20.
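As a rough illustration of the counting step, here is a minimal sketch in plain Python. It is not the pandas code behind this post; the `scrobbles` sample, its `(timestamp, artist)` layout, and the `top_artists` helper are all made up for the example.

```python
from collections import Counter

# Hypothetical sample of listening history as (timestamp, artist) pairs.
# The real data set has 127,873 rows; these six are invented.
scrobbles = [
    (1, "Tom Waits"),
    (2, "Tom Waits"),
    (3, "Lucero"),
    (4, "Tom Waits"),
    (5, "Beach House"),
    (6, "Lucero"),
]

def top_artists(scrobbles, n=10):
    """Return the n most-played artists as (artist, plays) pairs."""
    plays = Counter(artist for _, artist in scrobbles)
    return plays.most_common(n)

for artist, plays in top_artists(scrobbles):
    print(f"{artist:<20} {plays:>5}")
```

Only the artist column matters for this table; the timestamp comes into play once the order of plays is needed.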
When designing this recommender, I’m going to try to answer the following question: Given the artist I’m currently listening to, what have I generally listened to next?
Now since this is specific to me, I’m going to add a few constraints. The first of which is this: I prefer listening to full albums rather than individual tracks. I’m not going to recommend songs, I’m going to recommend artists. Because of that, the only important attributes I need for each track are the time that I listened to it and the artist.
Let’s look at the song-to-song transitions. That is, given that I’m listening to a song by The Antlers, which band am I likely to listen to next? This table shows the number of times I transition from listening to an Antlers song to each artist on the list, as well as the probability of the transition.
artist                     transitions  transition_prob
The Antlers                        854         0.870540
Beach House                         22         0.022426
Arcade Fire                          5         0.005097
Patrick Watson                       4         0.004077
The Tallest Man on Earth             3         0.003058
Okkervil River                       3         0.003058
The Avett Brothers                   3         0.003058
Carla Bruni                          3         0.003058
Pinback                              2         0.002039
North Highlands                      2         0.002039
This simply confirms what I stated earlier: when I listen to music, I listen to full albums. 87% of the time that I listen to an Antlers song, I listen to another one of their songs next. That’s not helpful for recommendations, so I’ll add another constraint: I’m only interested in transitions where the artists are not the same. Now the above list looks like this:
artist                     transitions  transition_prob
Beach House                         22         0.173228
Arcade Fire                          5         0.039370
Patrick Watson                       4         0.031496
The Tallest Man on Earth             3         0.023622
The Avett Brothers                   3         0.023622
Okkervil River                       3         0.023622
Carla Bruni                          3         0.023622
Pinback                              2         0.015748
North Highlands                      2         0.015748
Fleetwood Mac                        2         0.015748
The order is the same as before, but the transition probabilities are much higher. This is a reasonable list of artists to recommend to someone who listens to The Antlers. Even last.fm has Beach House and Okkervil River in the top related artists.
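Here is a minimal sketch of how the transition counts could be computed, with and without the same-artist constraint. It assumes the plays are already sorted by listening time and reduced to a list of artist names; `transition_table`, `exclude_same`, and the sample `plays` list are hypothetical names and data for illustration, not the code behind the tables in this post.

```python
from collections import Counter

def transition_table(artists, current, exclude_same=True):
    """Count which artists follow `current` in a chronological play list.

    Returns (next_artist, transitions, transition_prob) tuples,
    sorted by transition count, highest first.
    """
    counts = Counter(
        nxt
        for prev, nxt in zip(artists, artists[1:])  # consecutive pairs
        if prev == current and not (exclude_same and nxt == current)
    )
    total = sum(counts.values())
    return [(artist, n, n / total) for artist, n in counts.most_common()]

# Made-up listening order, oldest play first.
plays = [
    "The Antlers", "The Antlers", "The Antlers", "Beach House",
    "The Antlers", "Arcade Fire", "The Antlers", "Beach House",
]

for artist, n, prob in transition_table(plays, "The Antlers"):
    print(f"{artist:<15} {n:>3} {prob:.6f}")
```

Passing `exclude_same=False` keeps the same-artist transitions, which corresponds to the first table; the default drops them and renormalizes the probabilities over what is left.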
We’re doing well so far, but let’s see if we can make it a little better with just a bit more work. Beach House is the top recommendation, but I listen to them a lot. Of all of the tracks I’ve recorded, 1.27% are Beach House tracks. Considering there are a total of 1,342 unique artists in my data set, that means I’m \(\frac{0.0127}{1 / 1342} \approx 17 \) times more likely to listen to Beach House than the “average” band.
So let’s use this information by dividing each transition probability by the unconditional probability of listening to a given artist. The unconditional probability is simply the total plays for each artist divided by the total number of plays (1624/127873 = 0.0127 for Beach House). The equation for the ranking has now become the following, where \( Pr(artist | Antlers) \) means the probability of listening to a given artist immediately after listening to The Antlers:
\[ \frac{Pr(artist | Antlers)}{Pr(artist)} \]
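As a rough worked example, plugging in the Beach House numbers quoted in this post (a transition probability of 0.173228 once same-artist transitions are excluded, and an unconditional probability of 0.0127) gives approximately:

\[ \frac{Pr(\text{Beach House} | Antlers)}{Pr(\text{Beach House})} \approx \frac{0.173228}{0.0127} \approx 13.6 \]

Read loosely, Beach House follows The Antlers about 13.6 times more often than its overall share of plays would suggest. An artist with the same transition probability but far fewer total plays would score much higher.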
When I divide by the unconditional probability, I will give weight to artists that I listen to less often overall, making the results a little more exciting. If I multiply by this probability, however, I’ll give extra weight to artists I listen to more often, making the results more familiar. It’s probably important to point out that this is a bit of a hack that I came up with while typing up this blog post, and shouldn’t be confused with Bayes’ Theorem even though it looks sort of related. Anyway, let’s see what these rankings look like:
original                   less familiar              more familiar
------------------------------------------------------------------------------
Beach House                April March                Beach House
Arcade Fire                Broken Bells               Tom Waits
Patrick Watson             Beach House                Arcade Fire
The Tallest Man on Earth   Army of Ponch              Patrick Watson
The Avett Brothers         Mineral                    The Tallest Man on Earth
Okkervil River             Carla Bruni                Okkervil River
Carla Bruni                Arcade Fire                Bon Iver
Pinback                    North Highlands            The Avett Brothers
North Highlands            The Murder City Devils     Dillinger Four
Fleetwood Mac              Pinback                    Band of Horses
You can see that Tom Waits moved up on the chart on the right because I’ve listened to him more than anybody else. For the middle list, however, there’s lots of stuff that didn’t even make the original cut. A band like Army of Ponch might not seem like the best recommendation to someone currently listening to The Antlers, but I’ve made that transition twice and might want to again.
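The three rankings can be sketched in a few lines of Python. The `rank_variants` function and every number in the sample dictionaries below are invented for illustration (they are not my real counts), chosen only so that a rarely played artist and a heavily played one shift the way April March and Tom Waits do above.

```python
def rank_variants(transition_probs, plays, total_plays, n=10):
    """Rank candidate artists three ways.

    transition_probs: {artist: Pr(artist | current artist)}
    plays:            {artist: total plays of that artist}
    total_plays:      number of plays in the whole history
    """
    def unconditional(artist):
        # Pr(artist): share of all plays that belong to this artist.
        return plays[artist] / total_plays

    def top(score):
        return sorted(transition_probs, key=score, reverse=True)[:n]

    return {
        "original": top(lambda a: transition_probs[a]),
        "less familiar": top(lambda a: transition_probs[a] / unconditional(a)),
        "more familiar": top(lambda a: transition_probs[a] * unconditional(a)),
    }

# Hypothetical inputs, not taken from the real data set.
transition_probs = {
    "Beach House": 0.17,
    "Arcade Fire": 0.04,
    "Tom Waits": 0.02,
    "April March": 0.015,
}
plays = {
    "Beach House": 1600,
    "Arcade Fire": 400,
    "Tom Waits": 3000,
    "April March": 10,
}

for name, ranking in rank_variants(transition_probs, plays, 100_000).items():
    print(f"{name:<14} {ranking}")
```

Dividing rewards artists with few total plays and multiplying rewards artists with many, so the same four candidates come out in three different orders.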
While we’re at it, here’s the list of recommendations for Bruce Springsteen:
original              less familiar              more familiar
----------------------------------------------------------------------
Chuck Ragan           Buddy Holly                Tom Waits
Built to Spill        Buckingham Nicks           Chuck Ragan
Camera Obscura        Sam Cooke                  Built to Spill
Tom Waits             The Jayhawks               Wilco
Wilco                 Chuck Ragan                Camera Obscura
Okkervil River        Bridge and Tunnel          Okkervil River
Mean Creek            Built to Spill             Death Cab for Cutie
Death Cab for Cutie   Mastodon                   Old 97's
Dan Auerbach          Camera Obscura             Ryan Adams
Bridge and Tunnel     Iron & Wine and Calexico   Spoon
Just reading that list reminds me that Mean Creek has a new record out that I’m going to listen to right now.
So which list is best? How does it compare to standard information retrieval techniques? Well, that’s probably different for each person and the only way to find out is to test it. I could (and might eventually) put together a little app that recommends artists from my listening history based on what I’m listening to right now. With a simple A/B test, I could see which of the three recommendation algorithms I follow most often and stick with that one in the future. To do that, I would have to record:
The artist that is currently playing
Which artist recommendations are displayed
Which recommended artist (if any) was played next
The recommendation algorithm that provides the highest play / display ratio is the one I’d like to go with in the future. This seems like an obvious place to plug sifter for performing A/B and other types of testing in scenarios like this.
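A minimal sketch of that bookkeeping might look like the following. Nothing here is built yet: the event layout, the key names, and the `play_display_ratio` helper are all assumptions made up for this example, and it does not use sifter.

```python
from collections import defaultdict

def play_display_ratio(events):
    """Compute plays / displays for each recommendation algorithm.

    Each event is a dict with these hypothetical keys:
      "algorithm":   which ranking variant produced the list
      "current":     artist playing when the list was displayed
      "displayed":   list of recommended artists that were shown
      "played_next": artist actually played next, or None
    """
    displays = defaultdict(int)
    plays = defaultdict(int)
    for event in events:
        displays[event["algorithm"]] += 1
        # Count a play only if the next artist came from the displayed list.
        if event["played_next"] in event["displayed"]:
            plays[event["algorithm"]] += 1
    return {alg: plays[alg] / displays[alg] for alg in displays}

# Invented log entries for illustration.
events = [
    {"algorithm": "original", "current": "The Antlers",
     "displayed": ["Beach House", "Arcade Fire"], "played_next": "Beach House"},
    {"algorithm": "original", "current": "The Antlers",
     "displayed": ["Beach House", "Arcade Fire"], "played_next": "Lucero"},
    {"algorithm": "less familiar", "current": "Bruce Springsteen",
     "displayed": ["Buddy Holly", "Sam Cooke"], "played_next": "Sam Cooke"},
    {"algorithm": "more familiar", "current": "Bruce Springsteen",
     "displayed": ["Tom Waits", "Chuck Ragan"], "played_next": None},
]

print(play_display_ratio(events))
```

With only a handful of events the ratios are noisy, so a real test would need enough displays per algorithm before declaring a winner.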
The point of this blog post is more about the thought process than the technical parts of the recommender. There are lots of things that I could have done “right,” like using properties of Markov chains (which is essentially what I built) to improve the system, or accounting for the fact that Buddy Holly follows Bruce Springsteen in my music library, so maybe that isn’t a true transition.
I think the main takeaway is really in the constraints that I put on the system. The idea for this one-day project evolved like this:
Build a music recommendation system
Build a music recommendation system that only uses my last.fm data
Build a music recommendation system that only recommends music I already know
Build an artist recommendation system, not songs
Only recommend artists that I’ve listened to immediately after the given artist
Come up with a few simple variations and test them for performance
Identifying the problem correctly let me build something in just a few hours. Is it the best recommender system the world has ever seen? Actually, it might be, because I’ve never seen any recommender that only suggests content that you are already familiar with and that’s what I wanted. But we’d have to test it against the likes of Last.fm, Spotify, and Pandora to find out.
As I said in the beginning, I’ve been thinking about this stuff a lot lately, but that’s not to say I’ve put this method into production anywhere. The code for all of this was written in Python with pandas, and it breaks pretty much every rule that I laid out in my previous blog post, so I’ll clean that up and get it posted soon.