Stats @quidstats - Tumblr Blog

US Quidditch Cup 10: Fantastic Games and Where To Find Them

Disclaimer: All analysis is based on Elo only

The second best part of USQC each year is actually getting to watch close, interesting games from across the country (the first best part is making new friends). But with 120 games on day one, what do you watch? Well, we’ve got some ideas, and also some numbers detailing what to expect on day one.

Pool 1

As previously discussed, this pool should be full of intrigue on day one. Of all the Pot 1 teams, Texas has the second lowest chance of finishing day one at 4-0, at 23%, and their games against Ball State and UNC should make for good viewing. RIT is getting a puncher’s chance at making it out of the pool, at 12% to grab 2+ wins, but the real intrigue in terms of making day two comes from UCLA. UCLA isn’t favored in their games against UNC and Ball State, and will need to win one of those if they hope to play on day two.

UCLA vs. Ball State: 9:00 am Pitch 1

UCLA vs. UNC: 11:40 am Pitch 5
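Record probabilities like "23% to go 4-0" or "12% to grab 2+ wins" can be approximated from ratings alone. The sketch below is not the model behind the numbers in this post: it assumes the standard Elo win expectation, treats the four pool games as independent, and uses made-up ratings (the `team` and `opponents` values are hypothetical).

```python
from itertools import product

def win_prob(rating, opp_rating):
    """Standard Elo win expectation for `rating` against `opp_rating`."""
    return 1.0 / (10 ** (-(rating - opp_rating) / 400) + 1)

def record_distribution(rating, opponents):
    """Return {wins: probability}, assuming every game is independent."""
    probs = [win_prob(rating, opp) for opp in opponents]
    dist = {wins: 0.0 for wins in range(len(probs) + 1)}
    # Enumerate every win/loss combination across the pool games.
    for outcome in product([0, 1], repeat=len(probs)):
        p = 1.0
        for won, wp in zip(outcome, probs):
            p *= wp if won else 1.0 - wp
        dist[sum(outcome)] += p
    return dist

# Hypothetical ratings, for illustration only.
team = 1950.0
opponents = [1890.0, 1905.0, 1800.0, 1650.0]

dist = record_distribution(team, opponents)
p_undefeated = dist[4]
p_two_plus = sum(p for wins, p in dist.items() if wins >= 2)
print(f"P(4-0) = {p_undefeated:.2f}, P(2+ wins) = {p_two_plus:.2f}")
```

Enumerating all 16 outcomes keeps the logic easy to follow for a four-game pool; a simulation would work just as well for larger schedules.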

Pool 2

Similar to Pool 1, Pool 2 features a strong Pot 4 team, this time in Virginia. They outrank their Pot 3, Cal Quidditch, and our model gives them a 76% chance of getting the 2 wins necessary to advance. This pool also features a Top 25 matchup between Bowling Green and the BosNYan Bearsharks. Both are heavily favored to get out of pool (99% and 91% chance, respectively), but the game should be worth checking out.

Virginia vs. Cal: 6:20 pm Pitch 2

BGSU vs. BosNYan: 7:00 pm Pitch 2

Pool 3

Pool 3 offers less in the way of potential upsets, mathematically at least. The two teams most closely ranked together are Texas A&M and Ohio State, which should make for a close game. The Apparators and Anteaters will be hoping for one of those teams to come out of that game exhausted; otherwise our Elo ratings have both of them missing out on day two.

Texas A&M vs. Ohio State: 9:40 am Pitch 2

Pool 4

DCQC, rated as the second strongest Pot 3 team, got a favorable draw. They’re stronger than their Pot 2, Gumbeaux, and have a 19% chance of running their day one slate. NYQC, the Pot 4, is also in the mix for day two. At 35%, they have a surprisingly strong chance of making it through.

Gumbeaux vs. DCQC: 11:00 am Pitch 6

Pool 5

Pool 5 should provide a tightly contested game between Penn State (at 1886 Elo) and Boston University (1849 Elo). That should be the game to watch.

Penn State vs. Boston U: 7:40 pm Pitch 1

Pool 6

With Rochester and Lake Erie setting themselves a bit apart from the other 3 teams in this pool, Marquette, LBFQ, and Oklahoma State figure to be battling it out for one spot. Oklahoma State vs. Long Beach is each team’s first game. It will set the tone for both teams’ days.

Oklahoma State vs. LBFQ: 10:20 am Pitch 7

Pool 7

Pool 7 features our second rated Pot 5 team, the Silver Phoenix, and our bottom rated Pot 4 team, Southern Storm. It’s the only pool where the Pot 5 team will be favored outright in a game. The margin isn’t much, just 4 Elo points, but it is something. Both teams will need the win if they want to play on day 2, so the energy should be high and the game worth checking out.

The Southern Storm vs. The Silver Phoenix: 3:00 pm Pitch 4

Pool 8

The team with the second highest probability of finishing day 1 undefeated comes from this pool. It just happens to be Pot 2 Lone Star and not Pot 1 Arizona State. Can Arizona State keep it in range and use their snitch on pitch magic for the upset?

Lone Star vs. Arizona State: 11:00 am Pitch 1

Pool 9

Pots 2-5 all have a greater than 18% chance of winning 2 games. Can Rochester or Miami (OH) come through? The closest game should be Rutgers vs. Utah State. Good seeding for day two will be on the line.

Rutgers vs. Utah State: 2:20 pm Pitch 7

Pool 10

We’ve discussed Pool 10 before. They’ve got Baylor, the strongest Pot 5 team in the tournament, and the pool is reasonably strong everywhere else. Almost any game in this pool should be worth watching. We’ll highlight RPI vs. Skrewts for now because it’s a late game, and both teams may be desperate for the last win to get them to day two.

RPI vs. Skrewts: 7:00 pm Pitch 7

Pool 11

Our posts have sort of dismissed Pool 11 as the Pool of Death, but day one will still be interesting. The main reason this pool lost its ranking is that it’s top heavy. It’ll be up to Tufts, a team with a strong history, to prove our Elo wrong.

The Warriors vs. Tufts: 10:20 am Pitch 8

Michigan vs. Tufts: 7:00 pm Pitch 5

Pool 12

QC Boston’s presence dominates this pool, and I don’t think anyone is predicting the upset. Each of the remaining teams fits in about the middle of its pot. Kansas has looked strong lately, but the Carolina Heat have gained 241 Elo points this calendar year alone.

Kansas vs. Carolina Heat: 9:40 am Pitch 1

That wraps it up for our Elo preview of USQC 10. Hopefully we provided some new insights into the tournament, and good luck to everyone competing.

US Quidditch Cup 10: A Deathly Preview Part 2

Disclaimer: All analysis comes from Elo ratings exclusively

Our previous post (link) discussed the “Pool of Death” from the simplest angle, answering the question “Which pool has the 5 strongest teams?” Elo’s answer, Pools 11 and 10, pretty much lined up with preconceived notions about the tournament. That made for a rather short post, and only a small amount of insight added.

But is the “Pool of Death” simply the pool with the 5 highest combined Elos? “Death”, in the context of USQC 10, is exiting the tournament. A real Pool of Death would hasten each team’s exit from the bracket, so wouldn’t a more deadly pool have 5 teams that are going to exhaust each other, keep games close, and ruin each other’s seeding for Day 2? In order to figure out which pools would fit this description, we calculated the “closeness” of each pool’s Elo by using the standard deviation of the Elos in the pool. For those who haven’t had a stats class in a while, standard deviation “…is a measure that is used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean…” (from Wikipedia). Here’s that data, sorted by lowest standard deviation to highest:

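| Pool | Std Dev. | Std Dev. Top 4 | Std Dev. Bottom 3 |
|---|---|---|---|
| Pool 1 | ***155.2*** | ***90.4*** | 166.9 |
| Pool 7 | **163.7** | 158.0 | **97.5** |
| Pool 4 | 173.0 | **119.4** | 189.5 |
| Pool 10 | 182.4 | 182.3 | ***52.7*** |
| Pool 9 | 189.8 | 168.9 | 119.8 |
| Pool 5 | 209.8 | 169.7 | 163.3 |
| Pool 3 | 212.4 | 137.6 | 198.0 |
| Pool 2 | 237.8 | 176.9 | 172.0 |
| Pool 6 | 244.1 | 195.4 | 147.4 |
| Pool 12 | 279.6 | 255.3 | 133.5 |
| Pool 8 | 288.4 | 251.9 | 127.8 |
| Pool 11 | 293.7 | 171.4 | 288.0 |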

(The lowest value in each column, indicating a close grouping of Elos, is bolded and italicized; the second lowest is just bolded)
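For anyone who wants to run the same kind of calculation, here is a minimal sketch. It makes two assumptions the post doesn't settle: that the population standard deviation is used (rather than the sample version), and that "Top 4" and "Bottom 3" are taken by rating within the pool. The five ratings are partly made up for illustration.

```python
from statistics import pstdev

def pool_spread(ratings):
    """Return the standard deviation of all teams, the top four, and the bottom three."""
    ordered = sorted(ratings, reverse=True)
    return pstdev(ordered), pstdev(ordered[:4]), pstdev(ordered[-3:])

# Hypothetical pool: the last two ratings are invented for the example.
pool = [1951.2, 1906.4, 1889.0, 1760.0, 1550.0]

overall, top4, bottom3 = pool_spread(pool)
print(f"all: {overall:.1f}  top 4: {top4:.1f}  bottom 3: {bottom3:.1f}")
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead; the ordering of pools would usually stay the same, but the values would be somewhat larger.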

A lower standard deviation indicates that the teams in a pool are more evenly matched than a pool with a higher standard deviation. Pool 1 (Texas, Ball State, UCLA, UNC - Chapel Hill, RIT Dark Marks) immediately jumps out as the most tightly packed. That closeness is even more apparent when looking only at the Pots 1-4 teams. Much like Baylor being the key factor in Pool 10’s high average Elo, UNC’s high Elo in the Pot 4 position is the key factor in the low standard deviation in Pool 1. UNC, at 1906.4, is actually ranked higher than Pot 2 Ball State (1889.0), and is within striking distance of Pot 1 Texas (1951.2). UNC rates as the strongest Pot 4 team by a decent margin, and our model has them as a top 20 team in the tournament. Conversely, Texas is actually the weakest* Pot 1 team, and Ball State is a lower half Pot 2. All that together makes for a tightly packed pool that should produce very intriguing games on day one. It’s conceivable, according to Elo at least, that Texas could be 2-2 going into Day 2. Unlikely, but possible. How likely? We’ll answer that question tomorrow.

It’s interesting to note that Pool 11, the pool with the highest average Elo, actually has the highest standard deviation. Taking into account our previous post and the standard deviation of the Bottom 3 teams in Pool 11, it appears that this pool is very top heavy. If difficult games throughout day one define a “Pool of Death”, Pool 11 isn’t it.

Pool 10, on the other hand, has the 4th lowest standard deviation overall, and the lowest standard deviation for its bottom 3 teams. Considering their high average Elo and the closeness of the teams in the pool, Pool 10 could very well be the best candidate for “Pool of Death”.

This type of analysis matters when trying to accurately predict how a team will perform on days one and two. This tournament takes place on one weekend, in Florida’s 80 degree heat, with a one and done bracket deciding the champion. Seeding and exhaustion matter. There are no easy paths to the championship, but there certainly are preferable ones.

*See disclaimer

Up next: Record probabilities

US Quidditch Cup 10: A Deathly Preview Part 1

Disclaimer: All analysis of a team's strength comes solely from its Elo rating

Two months ago, we released our initial Elo rankings using a modified version of the standard Elo equation (read about that here). Since all the games from this season are finally on USQ's site, we were able to update our standings. Those can be found here. With those updated, we thought it would be insightful and unique to take a look at USQC 10 with a mathematical, Elo based approach. The first narrative we wanted to look at was the "Pool of Death" (PoD). Averaging out the Elo in each pool is quick and easy, and here is a table outlining just that, sorted by average Elo:

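| Pool | Average Elo | Avg. Elo Top 4 | Avg. Elo Bottom 3 |
|---|---|---|---|
| Pool 11 | ***1853.1*** | ***1966.5*** | 1718.0 |
| Pool 10 | **1852.6** | 1893.5 | **1730.5** |
| Pool 12 | 1835.5 | **1912.1** | 1666.5 |
| Pool 1 | 1812.3 | 1872.3 | ***1740.5*** |
| Pool 9 | 1809.4 | 1863.4 | 1700.0 |
| Pool 6 | 1805.7 | 1884.4 | 1656.4 |
| Pool 5 | 1792.3 | 1859.3 | 1675.5 |
| Pool 2 | 1791.3 | 1872.7 | 1651.5 |
| Pool 4 | 1782.6 | 1844.7 | 1714.9 |
| Pool 7 | 1776.4 | 1816.5 | 1670.0 |
| Pool 8 | 1771.4 | 1855.8 | 1578.2 |
| Pool 3 | 1756.8 | 1835.5 | 1645.4 |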

(Bold and italicized values are the highest in their column, just bold values are the second highest)

Each year, commentators quickly identify a "Pool of Death". This year, the consensus seems to be Pool 11 (Cavalry, Warriors, Michigan, Tufts, Illinois State). Our Elo rankings agree, but they also have Pool 10 as a virtual tie in terms of overall strength. Baylor, this season's strongest Pot 5 team, keeps Pool 10's average up. The positive effect of Baylor on Pool 10's average Elo can be seen when removing the bottom team from both of those pools. Pool 11's average jumps significantly, while Pool 10's stays relatively similar. Looking at the bottom three teams from both of those pools also highlights this difference. The average Elo for teams 3-5 in Pool 10 is higher than for Pool 11, highlighting the depth of Pool 10. (Pool 1 actually has the highest average for their bottom teams, but more on their claim to the "Pool of Death" later.)

This year's Pool of Death is certainly not clear, based on the numbers, at least. Both Pools 10 and 11 are full of talented teams and should provide grueling and competitive matches throughout day 1.

Up next: Pool of Death from a different perspective

QuidStats ELO: An Introduction

Since the beginning of serious record-keeping in quidditch, we in the community have looked up at the stars and wondered "Who is the best team in quidditch?". There are some well established ranking systems provided by USQ and The Eighth Man, so you may be puzzled as to why we spent the time developing another one. We think data is underutilized in quidditch and our rankings are here to change that. We think an effective ranking system should emphasize three things:

Predictive power - effectively anticipates game and tournament outcomes

Objectivity - data-centric approach removes bias

Responsiveness - quickly and accurately tracks changes in team performance

None of the current approaches adequately addresses all three of these areas.

USQ Standings reset each season and are designed to emphasize USQ play requirements. They're not designed to be predictive, but rather to rank teams prior to regional and national tournaments.

The Eighth Man's media rankings are less beholden to USQ guidelines when they rank teams, but are also likely to suffer from the potential biases of their rankers. In professional sports, where recap and analysis of all games is possible, it’s still difficult for pundits to remain objective. In quidditch, not being able to see more than a small fraction of the games played exacerbates this lack of objectivity. This is not the fault of any individual ranker, as a single person shouldn't be asked to have detailed information on 100+ teams. It is, however, a known limitation.

The Eighth Man's ELO rating system can satisfy the need for objectivity and breadth, but unmodified ELO formulas are liable to be too conservative to effectively rate the performance of new teams and teams that have undergone significant changes during the offseason.

With these limitations in mind, we set out to build an ELO based ranking system that is resistant to bias, can rank all of USQ, and is dynamic enough to change at the same velocity as quidditch teams themselves do.

If you want to read about our ELO implementation, the details are available here.

For the brief highlights:

Placement matches- The ELO system is inherently conservative, and doesn’t allow a single match or tournament to exert too much influence on a team’s ranking. This stability is generally a valuable characteristic, but can inaccurately gauge the strength of new teams. The USQ sees a fair number of teams enter and drop from the league each year, many of which are composed of veteran players. For new teams, a generic ELO system may not see enough games to accurately rate them before the end of the season. We've added an example of this in the deep dive blog post. Our solution: Teams are considered to be “provisional” until they play at least 5 games. During this time, ELO ratings are much more sensitive to wins and losses, allowing new teams to rapidly climb or fall in the rankings.

USQ Nationals predictions- One of the key steps in our design was to optimize the parameters in our equation to accurately predict matches played at US Nationals 9 as well as regional championships. We used this as a grounding point, as we had two full years' worth of data to allow rankings to settle prior to the tournament. Frankly, we felt that a ranking system that can’t handle Nationals isn't particularly valuable.

To compare the accuracy of different rating systems, we calculated the percent of USQ Nationals 9 match outcomes correctly guessed by USQ Standings, Eighth Man Media Rankings, and QuidStats ELO, assuming that higher ranked teams should win against lower ranked ones. This isn't a perfect use of ELO rankings, which deal in percentages, but works for a rough test. So how did our model do?

| Ranking System | All Games | Top 28 subset¹ |
|---|---|---|
| USQ Standings | 75.5% | 76.9% |
| Eighth Man Media | N/A | 79.5% |
| QuidStats ELO | 80.0% | 82.9% |
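The "higher ranked team should win" test is simple to express in code. The sketch below is an illustration of that idea, not the script used for the table: the function name, the team names, and the results are all hypothetical. Games involving a team without a rating are skipped, which mirrors how a ranking that covers only some teams can only be scored on a subset of games.

```python
def prediction_accuracy(games, ratings):
    """Fraction of games won by the higher-rated team.

    games   -- iterable of (winner, loser) team names
    ratings -- dict mapping team name to its pre-tournament rating
    Games with an unrated team, or two equally rated teams, are skipped.
    """
    correct = total = 0
    for winner, loser in games:
        if winner not in ratings or loser not in ratings:
            continue
        if ratings[winner] == ratings[loser]:
            continue
        total += 1
        if ratings[winner] > ratings[loser]:
            correct += 1
    return correct / total if total else float("nan")

# Hypothetical teams and results, for illustration only.
ratings = {"A": 1900, "B": 1750, "C": 1600}
games = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "D")]

accuracy = prediction_accuracy(games, ratings)
print(f"{accuracy:.1%}")
```

For an ordinal ranking where 1 is best, the same function works if each rank is negated before being passed in as a rating.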

While we're happy with this outcome, we had some concern that our model was over-tuned to the results of USQ Nationals 9 and wouldn't do a good job of predicting matches at smaller tournaments.

Let’s look at accuracy at this fall’s Regionals. As previously mentioned, we've included placement matches in our ranking system, and we saw regional tournaments as a way to test how well our system adapts to early-season changes in team performance. Again, we're pretty happy with these results. Here is the accuracy of our ELO rankings for those tournaments²,³:

| Tournament | Accuracy |
|---|---|
| Northeast Regionals | 76.3% |
| Great Lakes Regionals | 88.9% |
| Midwest Regionals | 81.6% |
| Mid-Atlantic Regionals | 83.1% |

We'll continue to make adjustments to this system as more data becomes available throughout the year. For now, check out our current Top 25 teams and full rankings below:

This graph is interactive! Mouse over each point to see data on individual teams. Use the buttons on the top of the window to change view. Region labels can be clicked on and off to show a subset of teams.

Want to see the full rankings? Here's our current ELO ranking for all teams in the 2016-2017 season:

Current ELO Rankings

Interested in rankings from past seasons? You can check graphs containing all teams here:

2013-2014 Season

2014-2015 Season

2015-2016 Season

2016-2017 Season

Notes:

1. Because The Eighth Man's media rankings only rank 28 teams (top 20 plus others receiving votes), their rankings only predict a subset of the total games. That column shows the accuracy on those games.

2. The Eighth Man's ELO rankings released January 11th, 2017 did not contain data on rankings prior to last season's tournaments, and were not used in analysis.

3. The Eighth Man's media rankings were not included in regional comparison due to the small number of ranked teams playing in each regional tournament.

Questions? Comments? Drop us a line at [email protected]

Overview of our ELO rating system

How ELO ratings work:

Originally designed to evaluate chess players, the ELO rating system is designed to quantify the performance of teams or individuals, which can be used to rank competitors and predict the outcomes of games between them. The US Chess Federation, for example, ranks players as experts when their ELO ranking value is above 2000 points, while a beginner is usually rated around 800 points.

Similarly, we can adapt this rating system to quidditch for purposes of ranking and prediction.

Central to the way the ELO system works is the expectation that past outcomes correlate with future performance; the only way to improve ranking is to establish a track record of strong play. With each game, team ratings are continually updated depending on the outcome of matches and the rankings of their opponents. When teams win games, their point value increases, and when they lose, it decreases. The amount of change will vary depending on the difference between the ELO values of the two teams.

If, for example, a team with a very low rating beats a team with a very high rating, that lower ranked team would experience a large increase in rating as a result of this “upset” win, along with a corresponding decline in the value of the higher ranked team.

Conversely, if the higher rated team won, neither team's rating would change very much, as this is the expected outcome.

Team performance:

The ELO system assumes players/teams have a normally distributed performance about some true performance value. Though a player's actual skill may not change very quickly over time, teams do not consistently play at the same quality on a day-to-day basis.

Let's say we have two teams, one that is a fairly average performer (Team A), with a rating of 1500, and another that is very strong (Team B), with a rating of 2250. When these teams play against each other, the most likely outcome is that Team B will win the match. However, there is some chance that Team B has an off day or Team A plays incredibly well.

Visually, we can think about how teams are expected to perform as two separate distributions:

Here, the height of each curve shows how likely a team is to play at a given performance value. While Team B is expected to play better than Team A most of the time, there's a small chance that Team A beats their opponent, expressed as the portion of the graph where the curves overlap. In this case, there's a roughly 98.7% chance that Team B wins.

The basic ELO formula:

To incorporate this variable performance, ELO rankings have a simple formula:

Rn = Ro + K * (W - We)

Rn = new rating

Ro = old rating

K = K-factor (a constant which determines how impactful each game is)

W = score (1 if the team won, 0 if it lost)

We = expected score (how likely was the team to win? Value between 0 and 1)

We = 1 / (10 ^ (-dr/400) + 1)

dr = the difference in ratings (the team's rating minus its opponent's)

In our system, all teams start with an initial ELO rating of 1500

How it works in practice:

Let's revisit the two teams we had earlier.

In scenario one, Team B (ELO rating 2250) defeats Team A (ELO rating 1500). As a result, Team B's rating will rise and Team A's will drop. To determine the amount of change, we plug their values into the ELO formula:

Many ELO formulas set their K-factor to 20, which we'll stick with here.

The difference between team ratings (dr), from Team A's side = 1500 - 2250 = -750

Team A's expected win percentage = 1 / (10 ^ (750/400) + 1) = .013

Team B's expected win percentage is the complement = 1 - .013 = .987

Team A's new rating = 1500 + 20 * (0 - .013) = 1499.74

Team B's new rating = 2250 + 20 * (1 - .987) = 2250.26

Because this outcome was expected, there is little change in the teams' ratings after the game.

However, in the second scenario, Team A beats Team B. By reversing the value of W in our formula, we find that the change in ELO is more substantial.

Team A's new rating would rise to 1519.74

Team B's new rating would fall to 2230.26
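Both scenarios can be checked with a few lines of code. This is a minimal sketch of the basic formula only (no quidditch-specific tweaks), using K = 20 as in the example; the function names are just for illustration.

```python
def expected_score(rating, opp_rating):
    """We = 1 / (10 ^ (-dr/400) + 1), with dr = rating - opp_rating."""
    dr = rating - opp_rating
    return 1.0 / (10 ** (-dr / 400) + 1)

def update(rating, opp_rating, won, k=20):
    """Rn = Ro + K * (W - We), where W is 1 for a win and 0 for a loss."""
    w = 1.0 if won else 0.0
    return rating + k * (w - expected_score(rating, opp_rating))

team_a, team_b = 1500, 2250

# Scenario one: Team B wins, as expected.
print(round(update(team_a, team_b, won=False), 2))  # Team A
print(round(update(team_b, team_a, won=True), 2))   # Team B

# Scenario two: Team A pulls off the upset.
print(round(update(team_a, team_b, won=True), 2))   # Team A
print(round(update(team_b, team_a, won=False), 2))  # Team B
```

Rounded to two decimals, the four printed values line up with the ratings worked out by hand above. Note that the two teams' changes always cancel out: whatever one team gains, the other loses.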

This outcome suggests that either Team A played incredibly well in this one game, or that the ratings of our teams don't reflect their actual performance well. There's a fairly high likelihood that Team A is actually a better team than its rating would suggest, but the system is quite conservative; it'll take a number of games for A to climb the ladder.

Tweaks to this system:

After reading the above, one might recognize that the basic ELO system has some problems translating to quidditch. As a result, we've made some tweaks:

Changes to K-factor

While a low K-factor is good for creating an ELO system that is fairly conservative, most quidditch teams don't play enough games in a season to make big changes in their rating. Particularly if we want to be able to predict Nationals play at the end of a season, we need to be sure that our K-factor allows new teams to sort out their rating by April. We found that a K-factor of 60 gives us the best results.

Also, quidditch games during a regional tournament tend to be more informative than those from local tournaments or one-off meets: teams don't always bring full rosters to less impactful tournaments, and they sometimes use them to experiment with line-ups. As a result, we weighted the results from regional and national championships higher than other games by increasing the K-factor value of matches played during these tournaments.

Margin multiplier

In chess, games end in either a win or loss with little other information on the quality of play. In quidditch, scores can tell us how competitive games are, which provides much more information than a simple win/loss. In our model, we scale the rating change based on score: blowout wins create more change than a close match.

We called this our “margin multiplier”, which scales with quaffle point differential. Games decided by a quaffle point differential of 30 or less were assumed to be very competitive, and our margin multiplier was set to 1. For games that ended with more than 30 quaffle point differential, the multiplier is set to 1 + log(number of quaffle points scored in excess of 30).

So, a team that wins after outscoring their opponents by 20 has a margin multiplier of 1, but a team that wins by 200 quaffle points has a multiplier of 3.323.
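Here is a minimal sketch of that rule. The text only says "log", so the base-10 logarithm used here is an assumption, as are the function and parameter names.

```python
import math

def margin_multiplier(quaffle_point_diff, threshold=30):
    """Return 1 for close games, else 1 + log10(points beyond the threshold).

    The base-10 logarithm is an assumption; the text does not name the base.
    """
    excess = abs(quaffle_point_diff) - threshold
    if excess <= 0:
        return 1.0
    return 1.0 + math.log10(excess)

for diff in (20, 30, 40, 130, 200):
    print(diff, round(margin_multiplier(diff), 3))
```

With a base-10 log, a 200-point win comes out near 3.23 rather than the 3.323 quoted above, so the original calculation may use a slightly different base or offset. The overall shape is the same either way: the multiplier grows quickly just past 30 points and then flattens, so a 200-point blowout counts for more than a 60-point win, but not several times more.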

Win expectation

One of the critical components of ELO is the assumption that team performance in an individual game varies around their “actual” performance value. As we showed above, this variation is key to determining how likely teams are to win a game, and also factors in to their rating change through the “We” parameter in our ELO formula.

We can change the shape of this variance in performance to make upsets more or less likely, for example. Currently, the shape of our win expectation curve looks like this:

However, we could increase or decrease the variance in team performance to make this curve smoother or sharper, respectively. A sharper curve suggests that teams tend to play games fairly close to their “actual” rating, while a smoother curve suggests that teams vary a lot in their play quality from day-to-day. In quidditch, where outcomes of matches can depend on a number of factors outside of a team's control, a higher variance may be useful.
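One common way to change the shape of a win expectation curve is to change the 400 in the standard formula. Whether that is how the curve would be adjusted in this system isn't stated, so treat the `scale` parameter below as a hypothetical knob: a smaller value gives a sharper curve, a larger value a smoother one.

```python
def win_expectation(dr, scale=400):
    """Logistic win expectation; `scale` sets how quickly it saturates."""
    return 1.0 / (10 ** (-dr / scale) + 1)

# Compare a sharper curve (200), the standard one (400), and a smoother one (800).
for scale in (200, 400, 800):
    row = [round(win_expectation(dr, scale), 3) for dr in (0, 100, 250, 500)]
    print(scale, row)
```

For the same rating gap, the smoother curve gives the underdog a better chance, which is the "higher variance" behavior described above.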

Carryover

While names may stay the same, the makeup of many quidditch teams changes substantially from year to year. To accommodate this change, team ratings regress to the mean each season. In our model, teams gain or lose 30% of the difference between their rating and our initial ELO value of 1500, which moves every team part of the way back toward 1500.
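As a quick illustration of the carryover step, here is a small sketch using the 30% figure and the 1500 starting value from the text (the function and parameter names are made up for the example):

```python
def carry_over(rating, mean=1500, regression=0.30):
    """Move a rating 30% of the way back toward the starting value."""
    return rating - regression * (rating - mean)

print(carry_over(2000))  # a strong team drifts down toward 1500
print(carry_over(1300))  # a weak team drifts up toward 1500
print(carry_over(1500))  # an average team is unchanged
```

A 2000-rated team gives back 150 of its 500 points above the mean, while a 1300-rated team recovers 60 of the 200 points it sits below it.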

The formula in plain language:

New Elo Rating = Old Elo Rating + (variable K-factor x (game outcome - win expectancy) x margin multiplier)
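Put together, the plain-language formula might look something like the sketch below. It is an approximation under stated assumptions, not the production code: the log in the margin multiplier is taken as base 10, `k=60` is the base K-factor mentioned earlier, and the higher K used for regional and national championships is left as a parameter because its value isn't given.

```python
import math

def expected_score(rating, opp_rating):
    """Standard win expectancy: 1 / (10 ^ (-dr/400) + 1)."""
    return 1.0 / (10 ** (-(rating - opp_rating) / 400) + 1)

def margin_multiplier(point_diff, threshold=30):
    """1 for close games, else 1 + log10 of the margin beyond 30 (base assumed)."""
    excess = abs(point_diff) - threshold
    return 1.0 if excess <= 0 else 1.0 + math.log10(excess)

def new_rating(old, opp, won, point_diff, k=60):
    """Old rating + K x (game outcome - win expectancy) x margin multiplier."""
    outcome = 1.0 if won else 0.0
    change = k * (outcome - expected_score(old, opp)) * margin_multiplier(point_diff)
    return old + change

# Two evenly rated teams: a close win versus a 130-point blowout.
print(new_rating(1500, 1500, won=True, point_diff=20))
print(new_rating(1500, 1500, won=True, point_diff=130))
```

Between evenly rated teams the win expectancy is 0.5, so a close win is worth 30 points at K = 60, and the blowout is worth that amount times its margin multiplier.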

Placement Matches

Because ELO is a fairly conservative system, it takes a lot of games before a team can make substantial changes to its ELO rating. While this means that the system is quite accurate for teams which have played for a few seasons, newly established teams may not attain an accurate rating before major tournaments.

To fix this issue, we've added placement matches to our system. For the first 5 games that a team plays, its rating is much more volatile and will swing rapidly with each win or loss. When a team with provisional status plays against an established opponent, the opponent's rating will be used to gauge what a provisional team's status should be. Established teams cannot gain or lose points by playing provisional opponents.

For example, a new team (ELO rank of 1500) might play against 4 teams in its first tournament. These teams have rankings of 1250, 1500, 1750, and 2000. If this team only plays close games (quaffle point differential of less than 30), winning against the first three teams but losing to the fourth, its new ELO ranking under standard rules would be 1586. However, in our placement match system, its new ELO ranking would instead climb to 1721.
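The "standard rules" figure can be roughly reproduced under two assumptions that the post doesn't spell out: that K = 60 applies, and that all four games are scored against the team's pre-tournament rating of 1500 rather than updating after each game. The sketch below follows those assumptions; the function name is hypothetical.

```python
def expected_score(rating, opp_rating):
    """Standard win expectancy: 1 / (10 ^ (-dr/400) + 1)."""
    return 1.0 / (10 ** (-(rating - opp_rating) / 400) + 1)

def tournament_update(rating, results, k=60):
    """Apply every game's change to the pre-tournament rating at once.

    results -- list of (opponent_rating, won) pairs; all games are treated
    as close, so the margin multiplier is 1 and is left out.
    """
    change = sum(
        k * ((1.0 if won else 0.0) - expected_score(rating, opp))
        for opp, won in results
    )
    return rating + change

results = [(1250, True), (1500, True), (1750, True), (2000, False)]
standard = tournament_update(1500, results)
print(round(standard, 1))
```

Under these assumptions the result is about 1586.8, close to the 1586 quoted above. The placement-match figure of 1721 depends on how much more sensitive provisional ratings are, which isn't specified here, so it isn't reproduced.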

This isn't a perfect solution, as the first 5 games represent only a fraction of the total matches a team plays in a season, and as a result, rankings aren't likely to stabilize until later in the year. However, we think that this approach will allow our system to evaluate teams with fairly high accuracy prior to late-season tournaments, and USQ Nationals in particular.

Are snitch catches just a coin toss?

The snitch is perhaps the most iconic aspect of quidditch, when played either by wizards or college-aged nerds. When friends and coworkers ask about the non-magical interpretation of the game, they invariably focus in on how the snitch works, and are often both amused and perplexed by the image of a player running around with a sock stuffed down their shorts. For many, this is where conversation ends, but those who remain interested (and are somewhat familiar with the Harry Potter series) often present a second question, which I find far more interesting:

“Is it still worth 150 points?”

For those of us that have played organized sports throughout our lives, the snitch presents an odd way of determining the end of a match. Most sports have games that are limited by time, often divided into periods, halves, or quarters, while a few others finish when a team or individual reaches a certain score. In quidditch, games do not end at a specific time or score, but rather, through an event—the catching of a snitch by a seeker—that both ends the game and gives one team a large number of points.

Those who ask about the point value of a snitch recognize that if an individual player can have such a tremendous impact on the outcome of a game through a single event, a team sport isn’t particularly competitive. While the snitch serves as a great plot device to grow the hero of Harry Potter, if the same rules applied to muggle quidditch, all members of a team aside from the seeker would be understandably put off by the relative unimportance of their role on the pitch. As a result, the International Quidditch Association rulebook has gradually evolved to reduce the importance of seeker play through a reduction in snitch point value (from 150 to 30) and installation of a minimum game time before the snitch can be caught.

Nonetheless, the role of the snitch remains among the most contentious features of quidditch as it’s played in the muggle world. In 5 years of participating on teams in the Midwest and West Coast, I’ve yet to experience a tournament where snitch catches go largely undisputed; arguments over the validity of catches have become so commonplace at this point that I expect captains to talk to the referees in almost every game. This is not without reason. Given the tension that builds at the end of a quidditch match, I don’t find it surprising that discussions over the validity of snitch catches break out, especially in competitive games. The snitch is a large part of what makes the endgame of quidditch exciting, as teams have to balance their attention between catching the snitch and preventing the other team from racking up a lead big enough to render the snitch catch irrelevant to the game outcome.

However, the snitch catch can also generate intense frustration, much to the game's detriment. Many games end after a series of potential snitch grabs are ruled unsuccessful as a result of highly subjective “illegal” play, or worse, end when a snitch is distracted and fails to notice an opposing seeker. Over time, it’s struck me that the result of snitch play between competitive teams often seems to be quite random. This isn’t to say that some seekers aren’t better athletes than others, but instead that the relationship between seeker ability and successful catches is fairly weak. Maybe it’s a result of a rulebook that isn’t very clear on the rules of seeking, or the nature of trying to grab a sock stuck to a runner’s shorts without knocking them down, but the process of determining a winner in a quidditch game seems much more variable than I would like in a sport that is increasingly focused on competition and the crowning of regional, national, and international champions.

To better understand the impact snitches have on the game of quidditch, I decided to start by asking a simple question:

How different are the actual win percentages of quidditch teams over the course of a season when compared to their win percentage if snitch catches were assigned at random?

If we find that a large number of teams are over/underperforming their expected win percentage under a scenario where the game ends with a coin toss deciding a snitch catch, that might be evidence for teams being able to substantially change the outcome of games by having an effective seeker. To answer this, I made a fairly straightforward framework for estimating how likely a team is to win a game based on point differential at the game’s conclusion, assuming snitch catches are determined randomly.

How I determined expected win percentage assuming random snitch catches

(You can skip to the next section if you think statistics are boring)

Expected win percentages of individual games under a random snitch catch scenario were determined on the basis of quaffle point differential at the point at which the regulation time snitch catch occurs:

If the snitch is caught when quaffle differential is greater than 30, the team in the lead can expect to win 100% of the time, as the snitch cannot decide the winner.

If the snitch is caught when quaffle differential is less than or equal to 20, each team can expect to win 50% of the time, as a snitch catch for either team will determine the winner.

If the snitch is caught when quaffle differential is equal to 30, the team in the lead can expect to win 75% of the time. In this case, there is a 50% chance the leading team will win with a snitch catch and a 50% chance the trailing team will catch the snitch and send the game to overtime. To simplify things, I assumed each team had an equal chance of winning in overtime, so the leading team has a 50% chance to win without OT, a 50% chance to play OT, and a 50% chance to win that OT (50% + (50% x 50%) = 75%).
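The three rules above translate directly into a small function. This is a sketch of the model as described; the values for the trailing team (25% when down 30, 0% when down more) are not stated outright but follow from the leading team's chances.

```python
def expected_win(qpd):
    """Chance of winning if the snitch catch is a coin toss.

    qpd -- the team's quaffle point differential when the snitch is caught
    (positive when leading). Quaffle goals are worth 10 points, so
    differentials are multiples of 10.
    """
    lead = abs(qpd)
    if lead <= 20:
        return 0.5        # in range: whoever catches wins
    if lead == 30:
        leader = 0.75     # win outright, or win a coin-toss overtime
    elif lead > 30:
        leader = 1.0      # out of range: the catch can't change the winner
    else:
        raise ValueError("quaffle differentials are multiples of 10")
    return leader if qpd > 0 else 1.0 - leader

for qpd in (50, 30, 10, -20, -30, -100):
    print(qpd, expected_win(qpd))
```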

This model isn’t perfect, as it doesn’t factor in game duration or any of the nuance that occurs at the end of a game when teams change their strategies to accommodate scoring the quaffle while trying to catch the snitch. However, it does form a simple basis that we can compare team performance against. Here’s how it looks in practice:

Let’s say we have a fictional team, named the Society of Quidditch at the University In Bermuda Southeastern (Squibs for short), who have played 6 games in the 2015-2016 season. The Squibs are a new team and have had mixed success:

The Squibs played 6 games and won 3, so their season win percentage is .5 (50%)

Their quaffle point differential (QPD) in these games varied from -100 to +50. This was input into the model above to calculate their expected win percentage in each game if snitch catches happened at random.

To compare their performance relative to this random catch scenario, total expected wins (.25 + .5 + .5 + 1 + .75 = 3, with the game lost by more than 30 contributing 0) were compared to recorded wins (3).

In this case, the Squibs had as many wins as they would have expected to receive if snitches were caught at random.

By performing this same comparison for all of a team’s games over the course of a season, we can evaluate how much better (or worse) a team is performing than they would have expected if snitches were caught at random.
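As a rough sketch of that season-level comparison, the snippet below sums expected wins over a list of games and compares the total with actual wins. The six quaffle point differentials are invented so that they reproduce the Squibs' expected values (0, .25, .5, .5, 1, .75); the original game-by-game numbers are not given here, and the function names are hypothetical.

```python
def expected_win_prob(qpd):
    # Same coin-flip model as described above (trailing cases mirrored).
    if abs(qpd) > 30:
        return 1.0 if qpd > 0 else 0.0
    if abs(qpd) == 30:
        return 0.75 if qpd > 0 else 0.25
    return 0.5


def season_summary(games):
    """games: list of (quaffle_point_differential, won) tuples.
    Returns (expected_wins, actual_wins, win_pct_over_expectation)."""
    expected = sum(expected_win_prob(qpd) for qpd, _ in games)
    actual = sum(1 for _, won in games if won)
    return expected, actual, (actual - expected) / len(games)


# Hypothetical Squibs season: QPDs chosen to match the expected values above.
squibs = [(-100, False), (-30, False), (10, True),
          (-20, False), (50, True), (30, True)]
print(season_summary(squibs))
```

A positive third value means the team won more often than the coin-flip model expects; for this made-up season it is zero.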

In this analysis, I used game data from all official games played in the 2015-2016 quidditch season. Unofficial games, or those that were not recorded by the US Quidditch Association, were not a part of this dataset.

Results

This graph is interactive! Mouse over each point to see data on individual teams. Use the buttons on the top of the window to change view.

Displayed on the horizontal axis of this graph is the expected winning percentage of a team based on difference in quaffle score at the end of their games, assuming random snitch grabs. On the vertical axis, the actual win percentage of each team is recorded. The black 1:1 line in the middle denotes where expected and actual win percentages are equal. Teams that are above this line are winning more games than they would expect if snitch catches occurred at random, while teams below the line lose more games. The distance of each point from the line denotes the magnitude of this over/underperformance. Points represent individual teams, which are colored by region and grow or shrink in size depending on the number of games that team has played in the 2015-2016 season.

What’s initially striking about this graph is that there is a very clear relationship between a team’s actual and expected win percentage. While these percentages don’t always line up perfectly, teams tend not to deviate much from that center line—there are no teams that win an additional 30% or more games than we would expect in a random snitch grab scenario. In fact, this model predicts a team’s win percentage over a season just based on their quaffle point differential with 94% accuracy. Most of the time, teams have a season-long win percentage that is about the same as if snitch catches were just decided with a coin flip.

There are some notable exceptions, however. For example, teams from Arizona State University and the New York Quidditch Club won 12-13% more games last season than this model expected. Conversely, the University of Rochester Thestrals and Moscow Manticores lost nearly 20% more of their games. This isn’t to say that these teams have good or bad seekers necessarily, just that in games where the quaffle score was close, the snitch was caught at different rates than would be expected at random. These same seekers might have had wildly different catch rates in all other games, but perform better/worse when the snitch catch matters (think of this as being related to the Snitch When It Matters (SWIM) statistic that USQuidditch reports on its standings page).

The story is also a bit more complicated than that, so before anyone might accuse me of disparaging their team, I urge you to please read on.

Looking more closely at this figure, we also see an inflation in the cloud of points centered at roughly 50% win percentage. This is interesting, as it suggests that teams with a middling win percentage seem to be influenced more by their snitch catch rates than teams with high or low win percentages.

There are two factors that play into how teams over/underperform this random expectation. First, teams could be catching a greater or lesser proportion of snitches in close games than would be expected at random. If a team plays three games where the quaffle score was tied at the end of the game, yet wins all three, this team has won 1.5 more games than expected (3 actual wins against 3 x 0.5 = 1.5 expected wins).

Winning a few more games relative to expectation over the course of a season doesn’t necessarily translate to big shifts in win percentage, however, as teams that play a small proportion of close games don’t have many opportunities where the snitch game comes into play. A dominant team with an expected 95% win percentage, for example, may have a great seeking game but can’t deviate much from the expected model because they are almost always beating their opponents by more than 30 points before the snitch is caught. In this case, even if they always caught the snitch in close games, perhaps only 5% of their games had snitch catches that determined the winner, so their seeker's success could not produce a big change in their overall win percentage.
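The ceiling described here can be put into numbers. Under the coin-flip model, each in-range game is worth 0.5 expected wins, so a team that won every one of them would gain at most 0.5 wins per close game. The sketch below ignores games decided at exactly 30 points for simplicity, and the function name is made up.

```python
def max_win_pct_swing(close_game_share: float) -> float:
    """Largest possible gap between actual and expected win percentage,
    given the share of a team's games that were in snitch range.
    Assumes every close game is worth 0.5 expected wins."""
    if not 0.0 <= close_game_share <= 1.0:
        raise ValueError("close_game_share must be between 0 and 1")
    return 0.5 * close_game_share


# Dominant team: only 5% of games in snitch range.
print(max_win_pct_swing(0.05))
# Average team: say half of its games in snitch range (illustrative figure).
print(max_win_pct_swing(0.50))
```

So the dominant team in the example can beat expectation by 2.5 percentage points at most, while a team that plays half its games in range has room for a 25-point swing.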

By considering this potential interaction between a team's win percentage and the number of close games they play, we might be able to explain why there is so much variation in the middle of our graph. Judging from my experience over the past few years, I think we can assume that the distribution of strength between teams is bell-shaped (normally distributed). There are likely to be a few really strong teams, a few really weak teams, and a large number of teams that are close to average. Because teams are playing opponents from all parts of this spectrum over the course of a season, we can expect that high win percentage teams only play against teams of their same quality a few times, as do low win percentage teams, but average teams play a lot of matches against other average teams.

In general, average teams are more likely to encounter opponents of roughly the same strength, producing close games in which the snitch determines the winner. It is for this reason that we might see so much deviation from the random snitch catch expectation among teams with a ~50% win percentage, as their snitch catch success decides a greater proportion of their wins and losses. To put it simply, teams with average quaffle play are likely to benefit the most over the course of a season from having a good seeker. High win percentage teams can benefit as well, but an above-average snitch catch percentage isn’t as likely to translate into a large number of wins over the course of a season. Instead, it may well produce tournament titles, because in later rounds of regional and national championships, good teams are often squaring off against teams with similar ability. If good teams are only playing close games in tournaments, I don’t think it should come as much of a surprise that 2016 US Champions Q.C. Boston still managed to outperform their season-long win expectation by 5% despite their dominant overall record.

So, for current quidditch players, feel free to look over the graph and check out how your team performed this last year—maybe it will spark some discussion on ways to improve for the coming season. But before interpreting this graph and making assumptions on the quality of a team’s seekers, keep in mind that it’s hard to make conclusions without data on a team’s frequency of competitive games played and their performance therein.

We’ll continue this series in our next blog post by focusing in on the two factors that make up a team’s seeker effectiveness:

How well are each team’s seekers performing?

How many close games do teams play?

Questions? Comments? Send an email to [email protected]

Seeker Update

First, there will be an update to Quid Stats in the store soon. It includes seeker stuff. I’ve got a build locally, so I’ll show what I’ve got. Also, taking suggestions on what people might want on seeker stats. I have all the raw data in the world, but for now I’m just posting some basics because I don’t know what is particularly interesting to people.

Notes: “SWIM Games” means the seeker was on the field, at some point, while the game was in snitch range. It doesn’t matter if the game was out of range when the snitch was caught. If a seeker is on the field, and the game is within 30, it counts for SWIM. If a seeker never went on the field while the game was in range, it doesn’t count towards their “SWIM Games” total, even if the game was in-range at some earlier or later point in time. I’m also just going to be highlighting SWIM situations. As soon as I figure out what people want from their seeker stats, I’ll post again. But for now, I’m just using performance in SWIM situations.
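To make that definition concrete, here is a minimal sketch of the check. It assumes each game is stored as two lists of time intervals in seconds: when the seeker was on the pitch, and when the score was within 30. That representation and the function names are assumptions for illustration, not how the app actually stores data.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in seconds of game time


def overlap_seconds(a: Interval, b: Interval) -> int:
    """Length of the overlap between two intervals (0 if they don't meet)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def swim_seconds(on_pitch: List[Interval], in_range: List[Interval]) -> int:
    """Seconds the seeker was on the pitch while the game was in range.
    Assumes intervals within each list do not overlap one another."""
    return sum(overlap_seconds(s, r) for s in on_pitch for r in in_range)


def is_swim_game(on_pitch: List[Interval], in_range: List[Interval]) -> bool:
    return swim_seconds(on_pitch, in_range) > 0


# Game was in range early (0:00-10:00) and again late (18:50-20:00).
in_range = [(0, 600), (1130, 1200)]
print(is_swim_game([(1080, 1200)], in_range))  # seeker stayed on until the end
print(is_swim_game([(1080, 1120)], in_range))  # seeker subbed off before 18:50
```

The second seeker does not get a SWIM game even though the game was in range both before and after their shift.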

Detroit

For Detroit, seeking has been a weak point. Jim Richert has been their best seeker, catching their only SWIM snitch (through the first 6 games). He’s actually sitting at 100% in the SWIM situations. He caught game 3 vs. Rochester, in 0:37. Other than that, the team as a whole is 1/4 in SWIM games. The seeker to profile here is probably Dylan Schepers. He has played in 3/4 of Detroit’s games that have been in snitch range, but the results haven’t been good.

Dylan Schepers
Games Seeked: 4
SWIM Games Seeked: 3
SWIM Catches: 0
SWIM Percent: 0%
Time per game in SWIM situations: 0:59

I chose Dylan because he has the most data, but Detroit has been seeker by committee all summer. I think the stats show they could use a more dedicated and experienced seeker.

Cleveland

Cleveland seeking has been all about Sam Roitblat so far. Chad Brown, Stephen Kersey, and Max McAdoo all tried in his absence in their first series, but came away with 0 SWIM catches in 2 SWIM games (the third was out of snitch range the whole time the snitch was on the pitch).

Sam Roitblat
Games Seeked: 3
SWIM Games Seeked: 3
SWIM Catches: 2
SWIM Percent: 66.66%
Time per game in SWIM situations: 0:32

Cleveland has been much better in SWIM situations with Sam Roitblat. He was only there for the 3 games against Rochester (again, only through their first 6), but each game had SWIM situations, and he went 2/3. The one he didn’t catch, Rochester Game 2, he only had 0:19 while the game was in range. Which, actually, is almost enough time for him. He caught the other two snitches in 0:21 and 0:56.

Indianapolis

Indy has all their games in the books. Their best seeker has been, by percentage, Kyle Isch. He’s 1/1 in SWIM situations. Only took him 0:42 too. But Jason Bowling deserves some highlighting.

Jason Bowling
Games Seeked: 9
SWIM Games Seeked: 6
SWIM Catches: 3
SWIM Percent: 50%
Time per game in SWIM situations: 0:41

Against Cleveland, Jason was shut out. In the two SWIM games (1 & 2), he had about 2 minutes (2:02 and 1:48) in snitch-range to win it or tie it, but failed on both occasions. Against Detroit, it was a completely different story. Games 1 & 3 were in snitch range, and he came away with both catches. He took 1:34 and 2:07, respectively. For the Rochester series, games 1 & 2 were in range, and he grabbed just game 2’s. Overall, 50% is pretty decent, but given the amount of time he’s had, he is clearly a bit behind Sunshine in terms of seeking this summer.

Quid Stats update in store

The app has been updated and pushed to the store. It’s much better to work with video now. Most of the MLQ North video is in-app, so you can watch the videos, take your own stats, or even download the ones I’ve already done and go from there. It’s pretty slick. Downside, it has to delete your old stats, but those are a season old anyways. Look for snitch analysis and beater stats in future. (Don’t worry, those updates won’t be deleting any data).

This is a glorified beta, by the way. Everything works, and I’ve tested it a bunch myself, but I’m sure you’ll find odd things and weird inconveniences. I’ll be sorting those out in due course.

As a side note, this works best on a tablet sized screen, but it can be run on any Android device. I’ve even added some small conveniences for small screens, and I’m open to feedback on that front.

Instructions:

The only part that really needs instructions is the new video features, so here is that rundown:

1. Open the app, and swipe right to see a list of available teams. Select whichever ones you want to download. This list will include the MLQ East eventually, but for now it’s just the north.

2. Swipe back left, and your teams will be there.

3. When selecting one, a menu will pop up. The new video features are all along the bottom row.

a. Watch Videos - Loads a list of the games played by the team so far this summer.

b. Download Stats - Launches a list of games for which you can download stats. Clicking any of them will download any stats I may have taken so far.

c. Video Stats - Launches a screen with all the stats. Sadly, this requires an internet connection. After a short time, the games that team has played will show. Toggle the switches to make everything appear.

4. When you select a video to watch, a YouTube video will load. There will be two ugly white bars. Tapping the one on the left will launch a screen where all the stats are recorded from. Sub players by using the leftmost column, add stats to those on field players with the middle column, and control the clock and snitch with the right column. The top bar shows the actions so far recorded in the game, and their time-stamp. You can micro-adjust the time of the event with the - and + buttons, and you can also click the time-stamp to jump to the time of the event.

5. Long-pressing on the video (when the seek bar isn’t shown) will also get rid of all the overlays on the video. This can help with small screens. Swiping left will jump back 30 seconds. Swiping right will jump forward 30 seconds. Swipe up to add stats (same as the left bar), swipe down to see the stats recorded (same as the top bar). Overall, it isn’t the greatest interface, but it does reclaim some of the screen real estate. You can long-press again to get the overlays back. Most things can be discovered if you play around with it a bit.

MLQ Week 3 (Indy)

Indy. Again. At least I get their regular season stats out of the way after this post. NOTE: I missed two goals (one in game 1, one in game 3) according to the MLQ website scores. There are considerable lapses in the video, and my attention span, so who knows. It may be a goal scored not on camera, or I could have missed one. I’m going to wait for the more official MLQ video to come out. It has the scores at the bottom. I’ll figure out what I missed and where, and update this post later. Sorry about that.

70-170*

100*-60

70-170*

Aggregate

Tough week for the Intensity. They were real short on players. That led them to playing primarily two male beater sets, and putting Danielle Anderson and Erin Moreno at chaser. The injury to Anthony Votaw early may also have contributed. It clearly didn’t work amazingly well, but they took a game off a deeper Rochester squad and still have a nice lead in the division.

As for those beater sets...

Jake Watson and Tyler Walker (+6 / -4) over 14:18 were the only positive pair. Indy ran two male beaters for 55 / 60 minutes, which is a pretty ridiculous percentage. Walker and Matt Pesch (+6 / -10) really shouldered the load. They played together for 21:03, far and away the most of any pair. That’s about a third of the total minutes available. Really shows how short handed Indy was.

The biggest surprise was Sarah Makey (+13 / -19), leading the team in goals, with 6. In weeks 1 and 2, she had 1 goal (although she didn’t play in week 1). She was asked to step up this week in the absence of the two leading goal scorers, Blake Fitzgerald (wasn’t there) and Anthony Votaw (+1 / -5) (mostly not there with an injury), and she did.

New stat: Goals / (Missed shots + turnovers). Conversion percentage, let’s call it. Against the Whiteout.... 19 goals / (18 misses + 29 turnovers). 19 / 47 = 40% conversion percent. Not bad. Against Detroit, that number was 32 / 39 = 82%. Cleveland, 26 / 65 = 40%. Indy was pretty consistent in both the Cleveland games and the Whiteout weeks. It was their defense that suffered against Rochester. They kept it slow relative to their Cleveland games (66 possessions vs. 91 possessions), but couldn’t stop Rochester’s offense.
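For anyone who wants to reproduce the conversion percentage from their own stats, a minimal sketch follows. The function name is made up; the only inputs are the three counts named in the definition. Note that the stat is a ratio of goals to failed possessions, so it can exceed 100% for a very efficient offense.

```python
def conversion_pct(goals: int, missed_shots: int, turnovers: int) -> float:
    """Goals / (missed shots + turnovers), expressed as a percentage."""
    failed = missed_shots + turnovers
    if failed == 0:
        raise ValueError("no missed shots or turnovers recorded")
    return 100 * goals / failed


# Indy vs. the Whiteout: 19 goals, 18 misses, 29 turnovers.
print(round(conversion_pct(19, 18, 29)))
```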

Regular season:

Here are Indy’s raw numbers for the regular season. Tyler Walker (+42 / -42) played a ton. Tons and tons of minutes. One day I’ll calculate out how many minutes. But + / - is an okay proxy for minutes, and he was out there for plenty. Also, Blake Fitzgerald (+31 / -22) finished up the regular season missing week 3, but still led the team in goals. After 9 full games, Walker and Danielle Anderson (+12 / -7) played together for 22:58 and were the best beater pair. The worst pair, that played together for significant minutes, was Walker and Matthew Pesch (+7 / -10). All other beater pairs fell between these two extremes. As for quaffle pairs, Blake Fitzgerald and Jessica Banaszak (+21 / -10) tore through the regular season. ~41 minutes on the pitch together, and still were +11. Interestingly, Banaszak and Mac Randolph (+8 / -15) were the worst pair. I wish the entire Indy squad were at the Rochester games. Having 1/3rd of the season lacking key players really hurts the stats.

MLQ Week 2 (Cleveland)

Some stats on a new team, finally.

60-110*

80-140*

150*-70

Aggregate

Cleveland kept games 1 and 2 in snitch range, and won game 3 pretty handily, and their plus minus shows it. Whereas Detroit had very little positive against the Intensity, Cleveland had some bright spots. Despite 7 turnovers and 9 missed shots, Dan Daugherty (+15 / -13) stayed positive, along with Meredith Taylor (+11 / -9) and David Hoops (+14 / -12).

A surprise against the “chemistry is paramount” argument, Matt Eveland and Julie Fritz (+6 / -7) were only middling over the course of 15:08 of play. In the two losses (game 1 and 2), the pair was only (+2 / -5).

The two male beater set worked for Cleveland. Max Portillo and Max McAdoo (+4 / -1) and Max Portillo and Matt Eveland (+2 / -2) were both good lineups.

One more surprise: the combo of David Hoops, Jeremy Boettner, and Dan Daugherty (+2 / -3) was not a positive one. They played together for 7:26, but still couldn’t generate much traction. Cleveland going forward should probably split this group up. It should help their other lines, and actually be a net positive for them. Or, if they want to run this group, it’ll need some work.

Upgrades

QuidStats has once again been upgraded. Biggest new features include watching the video in app (you can take stats right on top of the video), and sharing the stats that have already been taken. I’m not putting it in the store right now because it’s very far from polished, but if you want the .apk let me know. You can install it on your device, download the stats I’ve already taken, and you can edit them as you please. That also means you can see the advanced stats that never get posted. Like best lineups, worst beater pairs, best quaffle groups, etc... There are tons.

If you’re interested, let me know.

Requires an Android device with a big screen (probably). I haven’t tested on anything but a Nexus 7.

MLQ Week 2 Stats (Indy)

110*-60

140*-80

70-150*

Aggregate

Indianapolis vs. Cleveland saw the Intensity’s first loss. So their stats have finally come back down to Earth a bit. You can see from the aggregate who performed well and who didn’t, but I would like to point out that Tyler Walker (+11 / -6) was the top performer, but played very little in the final game of the series, a game Indy lost. Probably not a coincidence.

Jessica Banaszak (+7 / -8) came back down to the land of the mortals in terms of + / -, which is to be expected. Cleveland was a much better chasing team than the Detroit squad.

In terms of beater play, Danielle Anderson and Tyler Walker (+6 / -1) really held Cleveland’s attack at bay. They were significantly better than any other beating duo for the Intensity.

The real interesting bits are the aggregate of all 6 games. The raw stats for that are...

Blake Fitzgerald has been a monster. He has 15 of the team’s 58 buckets. More than a quarter of all their quaffle points. Factor in 4 assists and his +9, and Blake keeps this team going. His 4:15, or .27, assist to turnover ratio is pretty poor (he feeds the ball into traffic too much), but overall it doesn’t seem to hurt his production much.

Anthony Votaw has quietly snuck in 11 goals. I didn’t really notice how high his production was until after the fact. I know Indy’s regular season is done, but teams might want to prepare a bit for him when it comes time for the championship.

As a team, Indy has 23 assists on 58 goals. That’s an assist on ~40% of their goals. Until some other teams play more than one weekend, we won’t know if that’s high or not. I’m betting it ends up being right in the middle.

I already mentioned Tyler Walker and Danielle Anderson, but together they have been great as a beating pair. Over six games, the two have combined for +11 / -4. Only 4 goals against. Really impressive stuff.

MLQ Week 1 Stats (Indy)

Indianapolis Intensity:

vs. Detroit game 1

110*-60

vs. Detroit game 2

150*-60

vs. Detroit game 3

150*-90

Aggregate

Note: Sorry for the inconsistent crop jobs. I don’t care enough to fix it right now.

Finally finished the first weekend. I had to make some major overhauls to the way I take stats in order to get them, so that was the delay.

The turnovers are up because every possession either ended in a shot, goal, or turnover (Ethan Sturm style). Overall, I think I like it. Just don’t compare these stats to Detroit’s, because I didn’t record Detroit’s stats like that. The rest of the raw stats you can check out for yourself.

Plus / Minus time:

Over the course of three games, there were 3 lineups for Indy that were +2. Due to the large number of lineups and relative lack of data, I don’t know how useful those stats are (yet). So I won’t be publishing those, in favor of publishing hopefully more interesting stats.

First, Indy as a whole were pretty dominant. For individual +/-, Jessica Banaszak +23 / -11 reigned supreme. Indy only used two female chasers on the weekend, and they won the quaffle game in all three contests, so naturally her numbers were going to be high. She did have better numbers than the other female chaser, Ali Markus +12 / -10, so she did clearly play very well. Not that the rest of Indy played poorly. The worst +/- performances belonged to Melinda Staup +9 / -10, Zach Rupp +11 / -12, and CJ Dolby +3 / -4. I say “worst” lightly. -1 is a decent day. This helps illustrate how Indy really ran away with the weekend.

In terms of just quaffle play, chasers Rupp, Votaw, and Banaszak and keeper Fitzgerald went +6 / -3, playing together best as a unit.

Leitch and Moreno +3 / -0, Walker and Moreno +6 / -4, and Anderson + Walker +5 / -3 were the best beating pairs.

Another stat I think is worth pointing out is assist to turnover ratio. Given the high number of turnovers in quidditch, I’m betting <1 ratios will be common (assists / turnovers). We’ll see as the season progresses, but the standouts currently are Matt Brown at 4:3 on the good side, and Blake Fitzgerald at 1:5 on the bad side.
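A small sketch of the assist to turnover calculation, using the two standouts named above. The function name is hypothetical, and returning infinity for a player with no turnovers is just one possible convention.

```python
def assist_turnover_ratio(assists: int, turnovers: int) -> float:
    """Assists divided by turnovers; below 1 means more turnovers than assists."""
    if turnovers == 0:
        return float("inf")  # convention chosen here for a turnover-free player
    return assists / turnovers


# (assists, turnovers) from the week 1 numbers quoted above.
players = {"Matt Brown": (4, 3), "Blake Fitzgerald": (1, 5)}
for name, (assists, turnovers) in players.items():
    print(name, round(assist_turnover_ratio(assists, turnovers), 2))
```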

MLQ Week 1 Stats (Detroit)

Inaugural MLQ weekend, inaugural stats*.

Background: I take these stats using QuidStats. Every stat on this post comes from calculations and queries done in-app.

I use plus/minus as the primary rating system for these stats. If you see “best”, that probably means highest plus minus. I’ll try and define any uncertain and subjective terms. I really like plus / minus because it really helps capture beater play as well as chaser play. A good beater or beater duo can keep a lot of points off the board without ever throwing their bludger. Plus / minus captures that with a low minus score. Similarly, a very aggressive beater might have a lot of pluses and minuses, which captures their style and shows whether their aggression is overall a net positive or net negative.

For those who don’t know, plus/minus is a simple stat. Higher is better, and it can go negative. If a goal goes in for team A, every player on team A (no matter position or contribution) gets a plus 1. Similarly, if team B scores, every player on team A gets a minus 1. Add these pluses and minuses up to get a player’s plus / minus for a given game. I haven’t done the addition in the stats columns; I think you can handle that much. Feel free to let me know what stats you’d like to see, and I’ll do my best to calculate them and update this post accordingly.
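The plus/minus tally described above is easy to sketch in code. This is not the in-app implementation; it assumes each goal is logged as the scoring team plus the set of team A's players on the pitch at that moment, and the player labels are placeholders.

```python
from collections import defaultdict


def plus_minus(goal_events, team):
    """Tally [plus, minus] for each player of `team`.

    goal_events: list of (scoring_team, on_pitch) tuples, where on_pitch is
    the set of `team`'s players on the pitch when the goal was scored.
    """
    totals = defaultdict(lambda: [0, 0])
    for scoring_team, on_pitch in goal_events:
        slot = 0 if scoring_team == team else 1  # 0 = plus, 1 = minus
        for player in on_pitch:
            totals[player][slot] += 1
    return dict(totals)


events = [
    ("A", {"Chaser 1", "Beater 1"}),
    ("B", {"Chaser 1", "Beater 2"}),
    ("A", {"Chaser 1", "Beater 2"}),
]
for player, (plus, minus) in sorted(plus_minus(events, "A").items()):
    print(f"{player}: +{plus} / -{minus}")
```

Keeping plus and minus as separate counts matches the (+x / -y) format used in these posts and leaves the subtraction to the reader.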

Detroit Innovators: vs. Indy game 1

60-110*

vs. Indy game 2

60-150*

vs Indy game 3

90-150*

Aggregate

This is now updated to use all kinds of new things. First, I used the new recording method, so accuracy is definitely up. Also, I added in all of the turnovers. As a reminder, all possessions for the offense end in either a goal, shot, or turnover. So those numbers are way up. All the stats from here on out are from the new recordings.

Lisa Lavelanet and Brandon Ollio (+4 / -9) had a rough day overall, being the worst beating duo for the Innovators. They clearly played a lot of minutes together (15:40), so Detroit may want to make a change to that beating duo in the future.

Ashley Calhoun and Tad Walters (+4 / -3), Lisa Lavelanet and Walters (+2 / -1), Calhoun and Jim Richert (+5 / -4) were the best beating groups for Detroit. Those groups should probably net some more minutes going forward. Switching Lisa’s beating partner from Ollio to Walters, and leaving Calhoun and Richert together could really strengthen Detroit’s first two lines. Interestingly, there were only 4 beating pairs that played more than 5 minutes together. The three positive lines mentioned, and Lisa and Brandon Ollio, who were way negative.
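Pair numbers like these can be built the same way as individual plus/minus, just keyed by pairs instead of players. The sketch below is an illustration under assumptions: each goal is logged with the scoring team and the beaters on the pitch, the beater labels are placeholders, and ranking by plus minus minus is one reasonable reading of "best" and "worst".

```python
from collections import defaultdict
from itertools import combinations


def pair_plus_minus(goal_events, team):
    """Tally [plus, minus] for every pair of `team`'s beaters on the pitch.

    goal_events: list of (scoring_team, beaters_on_pitch) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for scoring_team, beaters in goal_events:
        slot = 0 if scoring_team == team else 1
        for pair in combinations(sorted(beaters), 2):
            totals[pair][slot] += 1
    return dict(totals)


def rank_pairs(totals):
    """Sort pairs from best to worst net plus/minus."""
    return sorted(totals.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True)


events = [
    ("DET", {"Beater X", "Beater Y"}),
    ("DET", {"Beater X", "Beater Y"}),
    ("IND", {"Beater X", "Beater Y"}),
    ("IND", {"Beater X", "Beater Z"}),
    ("IND", {"Beater X", "Beater Z"}),
]
for pair, (plus, minus) in rank_pairs(pair_plus_minus(events, "DET")):
    print(" and ".join(pair), f"(+{plus} / -{minus})")
```

The same function works for quaffle groups if the sets hold chasers and keepers and the group size passed to `combinations` is changed.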

Michigan chemistry showed up strong for Detroit. Over 17:27, Zach Fogel and Dylan Schepers (+8 / -4) were a strong pair. On the other hand, over ~14 minutes, Steven Scherer and Sarah Walsh (+1 / -7) were pretty rough. That’s a pairing Detroit may want to avoid going forward.

According to plain plus / minus the best player for Detroit was Zach Fogel (+13 / -9). As for beaters, I don’t think it’s a surprise that Ashley Calhoun (+9 / -8) had the best differential.

Hello, World!

Test post, please ignore.
