Show and Tell: I Published a Compounding Genetic Algorithm to Hack FanDuel NBA Lineup Optimization
First off - thanks for reading, I do appreciate it. I wrote this article with the intention of it being read by technical, non technical and fantasy basketball agnostic readers. Soooo there’s no code in this article. If you want to see that, I open sourced my solution on NPM. Enjoy :)
In extremely short summation; this is about how I created a compounding genetic algorithm to solve a pairing problem with 17,437,197,465,249,097,536,000 potential pairs with an average of 98.54% accuracy in around one second.
Ok - so let's dive deeper into the problem I wanted to solve.
FanDuel is a popular daily fantasy sports site; I’ll be focusing on the NBA game that FanDuel offers. In the game, the user is given an imaginary budget of $60,000, and asked to choose a lineup from a pool of NBA players. There are certain constraints on the game:
Positional constraints: The user can select 2 point guards, 2 shooting guards, 2 small forwards, 2 power forwards and 1 center. Each player can only play one position and you can’t select a player twice.
Budget: Each player is given a $ cost by FanDuel, and in selecting their lineup, the user cannot spend more than $60,000 in total on their players.
Let’s take a look at the interface to help visualize what the user sees:
In any given night there can be upwards of 300 players in the player pool. Positional and budget constraints aside, when choosing a combination of 9 out of 300 players, there are roughly 17,437,197,465,249,097,536,000 combinations possible, which is 230 times more pairings than grains of sand on the earth. So - if I wanted to write a program that searched for the valid highest pairing - you can imagine that searching for the pairing using an inefficient algorithm would result in an absurdly slow processing time (Much longer than your lifetime).
Enter genetic algorithms.
In short, a genetic algorithm finds a solution based on a natural selection process, mimicking biological evolution. The algorithm repeatedly breeds and mutates a population of possible solutions based on each solution’s fitness. Fitness is pre-defined by the engineer and evaluated by the program she writes. Genetic algorithms can be constrained (we know the ideal outcome) or unconstrained (we don’t know the ideal outcome).
In this case, the genetic algorithm I built would be considered unconstrained. I don’t know the lineup with the highest projected statistical output before processing. This means I must write the program to stop itself after a certain amount of processing time, therefore the program won’t necessarily hit the highest possible pairing by the time it’s terminated.
Now - I like to be as efficient as possible in my work. So, after documenting the problem I wanted to solve and deciding the way I wanted to solve it, I scoured the web in search of a solution that had already been coded. (Also, for the technically inclined, I wanted the solution to run in the browser, which essentially means it must be written in javascript) And as you can probably guess since I’m writing a blog on how I wrote the solution myself, I didn’t find any pre-written solutions.
Genetic Algorithms: My Brief Adaptation
Now, as a heads up for the not so technically inclined, we’re going to dive a little deeper into the algorithm, the theory behind it and some benchmarking from my tests in developing my solution. I enthusiastically invite you to stick around and read on - I think I did a pretty good job of explaining to all audiences ;)
So - The diagram below gives a great high level overview of what the process looks like:
Moving forward, I am going to use this diagram as an outline to describe each portion of my algorithm.
These types of algorithms are called genetic/evolutionary algorithms because, as you can see above, they directly mimic natural selection. The diagram depicts what I like to call an evolution, and within that evolution you see an area labeled Generation cycle, which I will describe as a generation moving forward.
In the diagram you also see a Start --> Randomly generate initial population section. In my algorithm, this consists of randomly spawning 200 lineup pairings. In order to help the process along, this process involves ensuring that the lineups are positionally sound (Have 2 PGs, 2 SGs, 2 SFs, 2 PFs and 1 C). That is the only pre-validation, as the majority of ‘validation’ is handled in the next step; Evaluating all individuals.
Once the initial generation of 200 random lineups is spawned, as you can guess, we must evaluate the fitness of each lineup, which is shown as Evaluating all individuals in the diagram. In evaluating the fitness of a lineup, we look at the following:
Does the lineup meet budget criteria? If not, the lineup is considered unfit and given a fitness of 0.
Does the lineup meet positional criteria? This is a double check, but if it doesn’t match, the lineup is considered unfit and given a fitness of 0.
Do we have multiples of the same player? If so, the lineup is considered unfit and given a fitness of 0.
If the lineup passes the tests above, it is considered fit, and its fitness is the cumulative projected fantasy stats of each player in the lineup.
Now, this is where the magic happens! Are you excited?!
Once each lineup is evaluated and given a fitness, we select the two fittest lineups to breed and mutate into the next generation.
Breeding, in this case, simply means that we’re creating 18 new lineups. Each lineup is a replica of it’s parent lineup, with one player from the other fittest lineup swapped in. For example:
If parent 1 looks like this:
And parent 2 looks like this:
Then some children of breeding these two pairings would look like this:
Now, mutating, in this case, means something entirely different than breeding. The number of mutations isn’t pre-determined by the algorithm, the user of the algorithm can decide how many mutations they want (as it slows the process but helps reach higher fitness levels). Each mutation is a replica of it’s parent lineup, with one player from the lineup randomly selected from the player pool. For example:
If parent 1 looks like this:
Then two mutations of this lineup may look like this:
Now after breeding and mutating our population, we have created a new fitter generation to repeat the process with.
Once the predetermined amount of generations is completed, the fittest lineup is stored, which completes a single evolution. From each evolution, the fittest lineup is stored, and once the user-defined amount of evolutions is finished, the fittest lineup is returned. This is considered a compounding genetic algorithm as it starts each evolution with a clean slate. So, if the user decides on 10 evolutions, then once each evolution is complete, the program will select the fittest lineup from those 10 evolutions and return that lineup.
Now, let’s talk about my experiences with performance, incest and tweaking the algorithm! (Everyone’s favorite subjects!)
Incest is suboptimal - and in quick summation, I’ll show you why :)
In this case, I consider incest to be a reintroduction of the parent lineups back into the next generation breeding pool. I ran 10,000 tests to observe the influence of incest on my algorithm, and here are the results:
Without compounding and with 40 generations, when the parents were reintroduced into the next generation’s breeding pool for 5,000 tests, the algorithm produced and average fitness score of 300.4118
Without compounding and with 40 generations, when the parents were removed from the next generation’s breeding pool, when the test was ran another 5,000 times, the average fitness score was 301.484
Not a crazy performance boost, but a performance boost none the less!
Sibling Incest and the False Plateau
After around 10 generations, I noticed that the two fittest lineups would tend to be the same. We would be breeding the same lineups, which created the same lineups. This essentially relegated all the magic of evolution to mutation. This is not optimal and is an effect I also classified as a symptom of algorithmic incest.
This created what I call a false plateau, and hindered the algorithm from continuing it’s evolutionary adventure. Think of it like this, if you put a blind man at the bottom of a hill and asked him to keep walking up the hill until he hit the top, what if he hit a plateau? Theoretically, he would believe he was at the top, and not want to venture down to far away from that peak.
My algorithm was suffering from this type of issue by breeding the same two lineups. So, to fix this, I ensured that the two fittest lineups could not be the same. They must contain a different set of players. The performance results were beautiful!
Without compounding, with 40 generations, not reintroducing the parents back into the breeding pool and ensuring that the two fittest lineups were different each generation, 5,000 tests yielded an average fitness of 306.672
That’s a huge boost! But wait, there’s more...
Compounding evolution sounds way more complicated than it is. This simply means allowing the algorithm to run once, save it’s top lineup, then run again. Each one of these cycles stores it’s top lineup, hence the name compounding. Then, after X evolutions, the overall top lineup from those X evolutions is returned.
So now - with 40 generations, not reintroducing the parents back into the breeding pool, ensuring that the two fittest lineups were different each generation, and picking the top lineup from 5 evolutions (40 generations 5 separate times) - after running 5,000 tests the results yielded an astounding average fitness of 313.426 all in around 1 second each time!
Boom baby! For reference, with this dataset, the fittest possible lineup had a fitness of 318.069
Wrapping up this article...
I hope you found a little joy in exploring and possibly contributing to or using my project! I do appreciate your time very much. If you’re looking to see this algorithm in action, go ahead and optimize a couple lineups on my website at BasketballRoto.com and if you’re interested in seeing the code, check my GitHub repo or the NPM package. Thanks!