Project 2, Billboard 2000 Data and Analysis
I wanted to explore Billboard data to get a sense of which words, themes and genres make for a hit song. I suspected that songs about "you" would be more popular and more highly rated than songs about "me". I also suspected that songs about females would be more popular and plentiful than songs about males, and that love songs would be common and highly rated. I didn't have many expectations about the genre breakdown; that part was more exploratory.
I imported and cleaned the data, replacing asterisks with NaNs, and then converted the numeric data from strings to integers and floats.
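A minimal sketch of that cleaning step, using only the Python standard library. The function name to_number and the sample cells are made up for illustration; the original work may have used different tools and column layouts.

```python
import math

def to_number(raw):
    """Convert one raw string cell from a numeric column.

    "*" marks a missing value and becomes NaN; everything else is
    parsed as an int when possible and as a float otherwise.
    """
    text = raw.strip()
    if text == "*":
        return math.nan
    try:
        return int(text)
    except ValueError:
        return float(text)

# Hypothetical raw cells for one song's weekly chart positions.
raw_cells = ["78", "63", "*", "49.0"]
cleaned = [to_number(cell) for cell in raw_cells]
```

The helper is meant only for columns that are supposed to be numeric; text columns such as titles and artist names should be left alone.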
I created a rating system for the songs. On the charts a #1 song has position 1 and a #99 song has position 99, so a lower number means a better placing, which makes it hard to weigh position against the number of weeks a song spent on the charts. So I subtracted each weekly chart position from 100, meaning that a week at #1 scores 99. Next I summed those weekly scores in the “sum” column and counted the weeks each song was on the chart in the “count” column. Finally I divided the sum by the count, giving the “rank” column: the song's average weekly score, where higher is better.
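The rating calculation can be sketched as follows. The function name rate_song and the sample positions are made up for illustration; weeks where the song was off the chart are assumed to be stored as NaN.

```python
import math

def rate_song(weekly_positions):
    """Return (sum, count, rank) for one song.

    Each charted week scores 100 - position, so a week at #1 scores 99.
    NaN entries (weeks off the chart) are skipped.
    """
    charted = [p for p in weekly_positions if not math.isnan(p)]
    scores = [100 - p for p in charted]
    total = sum(scores)
    count = len(scores)
    rank = total / count if count else math.nan
    return total, count, rank

# Hypothetical song: three charted weeks, then off the chart.
positions = [10, 5, 1, math.nan]
total, count, rank = rate_song(positions)
```

For these made-up positions the weekly scores are 90, 95 and 99, so the sum is 284, the count is 3, and the rank is 284 / 3.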
I plotted Rank and Count as histograms. Rank is fairly evenly distributed, with a positive skew. Count spikes around 20, meaning the most common length of time for a song to stay on the charts is around 20 weeks. It also has a positive skew.
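One way to check a skew claim numerically, rather than by eye from a histogram, is the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). This is only a sketch; the sample week counts below are made up and are not taken from the Billboard data.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5.

    Positive means a longer tail to the right, negative a longer
    tail to the left, and zero a symmetric sample.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical weeks-on-chart values: most near 20, a few long runs.
weeks = [20, 20, 21, 19, 20, 22, 35, 57]
weeks_skew = skewness(weeks)
```

A cluster around 20 weeks with a few much longer runs, as in this made-up sample, gives a positive value.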
Then I investigated Rank and Count by Genre. I made a pivot table of Rank and Count by Genre, broken down by mean, max and min. Then I made bar plots and violin plots of Rank by Genre and Count by Genre. The violin plots are more interesting and pictured here.
Rock and Roll is the most highly ranked genre, followed by Rock, Latin, and Jazz. That said, the genre labels are questionable: the data labels N'Sync as Rock and Roll and Destiny's Child as Rock. Also, some genres like Country, Rap, Rock, Rock and Roll, and Pop have more variance (and more variables) than others. Some genres are rated higher than others regardless of how long they stay on the charts; this is obvious when you look at the Count column by genre. Jazz hasn't had a lot of weeks on the Billboard charts, but those weeks are higher ranked than other genres. Gospel, on the other hand, ends up in the charts more frequently, but is less likely to have a hit.
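The mean, max and min breakdown by genre can be sketched without any plotting or dataframe library. The rows and the function name summarize_by_genre below are made up for illustration and do not reproduce the actual pivot table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (genre, rank, count).
songs = [
    ("Rock", 62.0, 30),
    ("Rock", 41.5, 12),
    ("Country", 35.0, 20),
    ("Country", 28.0, 24),
    ("Jazz", 50.0, 5),
]

def summarize_by_genre(rows):
    """Group rows by genre and report mean, max and min of rank and count."""
    by_genre = defaultdict(lambda: {"rank": [], "count": []})
    for genre, rank, count in rows:
        by_genre[genre]["rank"].append(rank)
        by_genre[genre]["count"].append(count)
    return {
        genre: {
            column: {"mean": mean(vals), "max": max(vals), "min": min(vals)}
            for column, vals in columns.items()
        }
        for genre, columns in by_genre.items()
    }

summary = summarize_by_genre(songs)
```

Looking at summary["Jazz"] next to summary["Country"] in this toy data shows the kind of contrast described above: few weeks on the chart but a higher rank.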
I wanted to figure out what artists were singing about by genre, so I created a dictionary of the words from the song titles. The most popular thing to sing about (besides general pronouns) is Love. Other popular words one wouldn't expect are Freakin', Country, Like, Get and Don't, but I didn’t analyze those words.
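Building a word dictionary from titles can be sketched with a Counter. The titles below are made up, and the tokenizing rule (lowercase letters and straight apostrophes only) is an assumption, not necessarily how the original dictionary was built.

```python
import re
from collections import Counter

def title_words(title):
    """Split a title into lowercase words, keeping straight apostrophes."""
    return re.findall(r"[a-z']+", title.lower())

# Hypothetical song titles.
titles = ["Love You", "Don't Love Me", "Country Love"]

word_counts = Counter()
for title in titles:
    word_counts.update(title_words(title))

top_word, top_count = word_counts.most_common(1)[0]
```

Counting per genre works the same way, with one Counter kept for each genre.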
I was curious what perspective made a song higher ranked, so I did a sentiment analysis of the following perspectives: “Me”, “You”, “You and Me”, “You, not including Me”, “Me, not including You”, female words, male words, pet names and “Love”.
Song Topic: # songs, Mean Rank
“Me”: 60 songs, mean rank of 40.24
“You and Not Me”: 27 songs, mean rank of 38.53
“You and Me”: 12 songs, mean rank of 37.62
“Me and Not You”: 43 songs, mean rank of 34.93
“You”: 39 songs, mean rank of 34.65
So, songs about “Me” (with or without “You”) rank the highest, followed by songs about “You” that don’t include “Me”. The reverse is true at the bottom: songs about “You” (with or without “Me”) rank the lowest, and songs about “Me” that don’t include “You” rank second lowest. In the middle of the rankings are songs about both “You” and “Me”. Rock and Roll tends to lead in every category, followed by Rock. There is some variation among the genres below Rock, but it’s less interesting and based on smaller numbers.
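The perspective groups overlap: a title containing both kinds of words counts toward “Me”, “You” and “You and Me” at once. A sketch of that grouping is below. The word lists, function names, titles and ranks are all assumptions for illustration; the original analysis may have used different word lists.

```python
import re
from statistics import mean

# Assumed word lists; the original lists are not given in the write-up.
ME_WORDS = {"i", "me", "my", "mine"}
YOU_WORDS = {"you", "your", "yours"}

def perspectives(title):
    """Return every perspective label that applies to a title."""
    words = set(re.findall(r"[a-z']+", title.lower()))
    has_me = bool(words & ME_WORDS)
    has_you = bool(words & YOU_WORDS)
    labels = set()
    if has_me:
        labels.add("Me")
    if has_you:
        labels.add("You")
    if has_me and has_you:
        labels.add("You and Me")
    if has_me and not has_you:
        labels.add("Me and Not You")
    if has_you and not has_me:
        labels.add("You and Not Me")
    return labels

def mean_rank_by_perspective(songs):
    """Map each label to (number of songs, mean rank)."""
    buckets = {}
    for title, rank in songs:
        for label in perspectives(title):
            buckets.setdefault(label, []).append(rank)
    return {label: (len(r), mean(r)) for label, r in buckets.items()}

# Hypothetical (title, rank) pairs.
songs = [("You and Me", 40.0), ("Only You", 30.0), ("Just Me", 50.0)]
result = mean_rank_by_perspective(songs)
```

In this toy data “Me” covers two songs (one of which also mentions “You”), which is why its mean can differ from the “Me and Not You” mean.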
I also wanted to explore gender and terms of endearment because “Love” is the most popular noun in the song titles.
Song Topic: # songs, Mean Rank
“Girl”: 7 songs, mean rank 46.42
Male Words: 9 songs, mean rank 42.81
Female Words: 16 songs, mean rank 41.25
“Woman”: 4 songs, mean rank 37.10
“Boy”: 3 songs, mean rank 36.51
“Love”: 24 songs, mean rank 35.95
“Man”: 4 songs, mean rank 35.80
Terms of endearment: 4 songs, mean rank 23.68
Songs about females are relatively popular and highly rated, especially in Rock and Roll, followed by Latin and Rock. Interestingly, songs about women are less popular than songs about girls.
In Rock and Rap, songs about males are highly rated but uncommon. When singing about men, artists are more likely to use pronouns like “he” and “him” than to refer to them as “Men” or “Boys”. In fact, there are only about 56% as many songs about males (9) as about females (16).
Songs that refer to people by pet names like “baby”, “honey” and “sweetie” are uncommon and ranked lowest.
Love songs are highly ranked, though not as highly as songs about oneself or about girls. They are most popular and most highly rated in Rock and Roll, followed by Rap and Rock.
So what makes a hit song?
Write a Rock Song about yourself and a girl you love.