Music festivals makes up a large chunk of the $20B global concert industry. Within the last decade, music festivals have grown into a major moneymaker in a competitive industry that sees hundreds of such events each year in the U.S. There are the big ones—Coachella, Lollapalooza, Outside Lands, Governors Ball—with big-ticket prices, multiple stages, camping options and nearly endless lists of performers.
As music festivals have taken off in popularity, these kids of experiences have become a key feature of cultural life in America. Since I currently live in Austin, TX, home to two of the biggest music festivals - South by Southwest (SXSW) and Austin City Limits (ACL), I decided to analyze the database of a music festival for my personal project.
Music festivals make most of their money through income generated from ticket sales, sponsorship, advertising, concession fees and merchandise. For the festival producer, gathering this data and insights can help improve the sustainability of the festival and provide potential cost reduction benefits. On the revenue side, the insights could lead to better attendee experiences and drive more returning consumers: reducing churn and thereby increasing the lifetime value of festival-goers.
I used the class_music_festival database, which consisted of 6 tables. Below are the different tables and a brief description of them. You can find the full dataset on my GitHub.
bands - contains data of bands that performed in the music festival, including the band’s name and fee.
venues - contains data of venues used for the music festival, including the venue’s name and capacity.
performances - contains data of the music festival’s performances, including the start and end time as well as band and venue for the performance.
tickets - contains data of tickets sold, including price and ticket number as well as the performance and person related to the ticket.
purchases - contains purchase data such as date of purchase and customer associated with the purchase.
people - stores customer’s information.
In this project, I aim to address the festival organizer’s questions below:
The festival organizer wanted to reward the top spender and the first 10 people that purchased tickets. We need to find out the names of the customers.
What is the name of the band performing at the AMD venue that had the highest fee?
Which bands performed at their biggest venue (more than 2000 capacity)?
How many tickets were sold for the performance that started at 2011-09-18 20:30:00?
Which performance and venue had the highest revenue?
Which performance and band were the most / least profitable for the festival?
Was there any venue that was oversold?
What was the total revenue from ticket sales each month?
What was the average purchase total each month?
Before getting started with analyses, I first tried understanding the ERM (Entity Relationship Model) of this database also known as Schema. I have re-created the Entity Relationship Diagram for the music festival database below:
I have used PostgreSQL for the analysis below. You can view the full code on my GitHub.
The organizers had announced a giveaway for the first 10 customers. They wanted to know the attendee’s names to announce the winners.
SELECT people.name AS customer_name FROM people WHERE people.id IN (SELECT purchases.person_id FROM purchases ORDER BY purchases.date ASC LIMIT 10);
What is the name of the person who spent the most (and how much did they spend)?
SELECT people.id, people.name, SUM(price) as total_spent FROM people JOIN purchases ON purchases.person_id = people.id JOIN tickets ON tickets.purchase_id = purchases.id GROUP BY people.id ORDER BY total_spent DESC LIMIT 1;
What is the name of the band performing at the AMD venue that had the highest fee?
SELECT bands.name FROM performances JOIN venues ON venues.id = performances.venue_id JOIN bands ON bands.id = performances.band_id WHERE venues.name = 'AMD' ORDER BY fee DESC LIMIT 1;
Which bands performed at the venue with more than 2000 capacity?
SELECT DISTINCT bands.name AS band_name FROM bands WHERE bands.id IN (SELECT performances.band_id FROM performances WHERE venue_id = (SELECT venues.id FROM venues WHERE venues.capacity > 2000));
How many tickets were sold for the performance that started at 2011-09-18 20:30:00?
SELECT COUNT(*) FROM tickets JOIN performances ON performances.id = tickets.performance_id WHERE performances.start = '2011-09-18 20:30:00';
Which performance and venue had the highest revenue?
SELECT performances.id, venues.name AS venue_name, SUM(price) as revenue FROM performances JOIN tickets ON tickets.performance_id = performances.id JOIN venues ON venues.id = performances.venue_id GROUP BY performances.id, venues.name ORDER BY revenue DESC LIMIT 1;
Which performance (and by which band) was the most profitable?
SELECT performances.id, bands.name AS band_name, (SUM(tickets.price)-SUM(DISTINCT bands.fee)) AS profit FROM performances JOIN tickets ON tickets.performance_id = performances.id JOIN bands ON bands.id = performances.band_id GROUP BY performances.id, bands.name ORDER BY profit DESC LIMIT 1;
Which band was the least profitable for the festival?
SELECT bands.id, bands.name AS band_name, (SUM(tickets.price)-SUM(DISTINCT bands.fee)) AS profit FROM bands JOIN performances ON bands.id = performances.band_id JOIN tickets ON performances.id = tickets.performance_id GROUP BY bands.id, bands.name ORDER BY profit LIMIT 1;
Which venues were oversold and what were their capacities?
SELECT performances.id, venues.name, SUM(DISTINCT capacity) AS capacity, (COUNT(DISTINCT ticketnum) - SUM(DISTINCT capacity)) AS overcapacity FROM venues JOIN performances ON venues.id = performances.venue_id JOIN tickets ON performances.id = tickets.performance_id GROUP BY performances.id, venues.name HAVING (COUNT(DISTINCT ticketnum) - SUM(DISTINCT capacity)) > 0;
What was the total revenue from ticket sales each month?
SELECT EXTRACT(MONTH FROM purchases.date) AS month, SUM(tickets.price) AS revenue FROM tickets JOIN purchases ON purchases.id = tickets.purchase_id GROUP BY month ORDER BY month;
What was the average purchase total each month?
SELECT EXTRACT(MONTH FROM purchases.date) AS month, ROUND((SUM(price::decimal)/COUNT(DISTINCT purchases.id)),2) AS avg_purchase FROM tickets JOIN purchases ON purchases.id = tickets.purchase_id GROUP BY month;
Looking at the analysis above, the festival organizers will need to ensure that they are not overselling tickets for the AMD venue since overcrowding may result in decrease of attendee’s enjoyment of the performance. From the analysis we have concluded that Preservation Hall Jazz Band and The Del McCoury Band and Vista Equity had the highest revenue, while the band WAX is the least profitable. The organizers can consider tracking attendee’s enjoyment of each performance, in a way that makes it possible to ask which bands/venues are the most popular.