Sidewords: Making a Good Puzzle
Since releasing Sidewords, we’ve had a lot of people ask whether the puzzles are handmade or procedurally generated. The answer is: sort of both. I thought it might be fun to talk about how we actually make puzzles for Sidewords and the tools we built to help us find good ones.
A few weeks ago I posted a thread on Twitter about how we generate puzzles in Sidewords and find their solutions. If you’re into computer science stuff, it’ll give you a bit of context for what I’m about to talk about.
https://twitter.com/OwenGoss/status/897157730431107072
The important takeaways from that thread are:
we have a solver that, given a puzzle, finds all possible solutions to it (if any exist)
it’s now fast enough that we can generate puzzles on a mobile device
So... let’s work backwards from what we need. A puzzle is made up of two words, and the solution uses those words to fill in a space. When we first started working on the puzzle generator, all it did was pick two random words from our word list and then see if it could find a single solution for them. If it did, it would log the words and the solution to a file. Great, we’re done, right? No, we’re only just getting started.
There are many problems with this approach to generating puzzles, but the biggest are:
it generates puzzles with really obscure words
it generates solutions with really obscure words
These are big problems that prevent the game from being at all fun.
The tool that we use to generate puzzles evolved over the course of the project, but this is what it looks like in its current state:
Let’s talk about how we got here...
One of the first things we did was license a word frequency dataset. It contains a list of the 100k most common English words sorted by frequency. It also contains a whole bunch of other useful data about each word. So now we have two word lists in the game:
the list of all words that are valid for the player to enter
the list of 100k most common English words
When the puzzle generator runs, it loads both word lists and builds a large data structure that contains every word in the game along with its frequency data. With it, we can quickly check whether a word the player enters is valid, and we can also quickly look up a given word’s frequency index. This lets us ask questions like “is the word CAT in the top 10,000 most common words?”
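A minimal sketch of that combined lookup, assuming the valid-word list and the licensed frequency list are both plain lists of words, with the frequency list ordered most common first. The class and method names are hypothetical, and a shipping game would likely use a more compact structure than two Python collections:

```python
from typing import Dict, Iterable, Optional


class WordData:
    """Hypothetical combined lookup: valid words plus frequency indices."""

    def __init__(self, valid_words: Iterable[str], words_by_frequency: Iterable[str]):
        # Every word the player is allowed to enter.
        self.valid = {w.upper() for w in valid_words}
        # Map each common word to its 0-based frequency index (0 = most common).
        self.rank: Dict[str, int] = {}
        for i, w in enumerate(words_by_frequency):
            self.rank.setdefault(w.upper(), i)

    def is_valid(self, word: str) -> bool:
        return word.upper() in self.valid

    def frequency_index(self, word: str) -> Optional[int]:
        # None means the word is not in the common-word list at all.
        return self.rank.get(word.upper())

    def in_top(self, word: str, n: int) -> bool:
        # "Is this word in the top n most common words?"
        idx = self.frequency_index(word)
        return idx is not None and idx < n


data = WordData(
    valid_words=["cat", "dog", "the", "syzygy"],
    words_by_frequency=["the", "dog", "cat"],  # toy data, most common first
)
print(data.in_top("CAT", 10_000))     # True
print(data.in_top("syzygy", 10_000))  # False
```

Both checks are a single hash lookup, which is what makes it cheap to ask the frequency question for every candidate word during generation.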
Next we split the frequency data into a bunch of different “chunks”: top 1k words, top 5k words, and so on. These are the different “dict” indices used in the tool; dict 2, for example, represents the top 15k most common English words. As you can see in the tool screenshot above, we can specify which dict to use for picking the words at the sides of the puzzle and which dict the words that solve the puzzle must come from. That way we can make sure the solutions only use common words, and never use obscure words that only champion Scrabble players would know.
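The dict tiers can be sketched as simple cutoffs on the frequency index. The post only says that dict 2 is the top 15k words and mentions 1k and 5k chunks, so treating those as dicts 0 and 1 (and stopping there) is an assumption, as are the function names:

```python
from typing import Dict, List, Set

# Hypothetical tier sizes. Dict 2 = top 15k comes from the post; mapping the
# 1k and 5k chunks to dicts 0 and 1 is an assumption.
DICT_SIZES: List[int] = [1_000, 5_000, 15_000]


def words_in_dict(rank: Dict[str, int], dict_index: int) -> Set[str]:
    """All words whose 0-based frequency index falls inside the given dict."""
    limit = DICT_SIZES[dict_index]
    return {w for w, i in rank.items() if i < limit}


def uses_only_dict(words: List[str], rank: Dict[str, int], dict_index: int) -> bool:
    """True if every word is inside the chosen dict; unranked words fail."""
    limit = DICT_SIZES[dict_index]
    return all(rank.get(w, limit) < limit for w in words)


# Toy frequency indices, not real data.
rank = {"the": 0, "cat": 2_500, "harbor": 12_000, "quay": 40_000}
print(words_in_dict(rank, 0))                       # {'the'}
print(uses_only_dict(["the", "cat"], rank, 1))      # True
print(uses_only_dict(["harbor", "quay"], rank, 2))  # False
```

Keeping the side-word dict and the solution dict as separate parameters is what lets the same check enforce different limits on each.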
OK, now we’re getting somewhere: we can generate puzzles that use good words and have reasonable solutions. So now we’re done, right? No.
In the game, we offer hints to the player if they get stuck on a puzzle. The player can reveal words, one at a time, from the “best” solution that the solver found for the puzzle. But how do you pick the best solution?
I mentioned above that our solver finds all possible solutions (using a subset of the word list) for a given puzzle. This is important: if we stopped after finding a single solution, we might find a bad one. So we find them all, then analyze each one to find its lowest-frequency (least common) word. We deem the “best” solution to be the one whose least common word is the most common. For example, if one solution’s least common word has frequency index 379 and another’s has index 12,978, the first solution is better, because the words it uses are more common. Cool. In the early part of the project, that gave us puzzles that looked like this:
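The selection rule described above is a min-max: take the highest frequency index (the rarest word) in each solution, then keep the solution where that value is lowest. A minimal sketch, assuming a solution can be reduced to its list of words and that frequency indices live in a word-to-index map (both hypothetical representations):

```python
import math
from typing import Dict, List, Optional

Solution = List[str]  # hypothetical: a solution reduced to its words


def weakest_word_index(solution: Solution, rank: Dict[str, int]) -> float:
    """Frequency index of the least common word in a solution (higher = rarer)."""
    # Words missing from the frequency list count as rarer than any ranked word.
    return max((rank.get(w, math.inf) for w in solution), default=math.inf)


def best_solution(solutions: List[Solution], rank: Dict[str, int]) -> Optional[Solution]:
    """Pick the solution whose least common word is the most common one."""
    if not solutions:
        return None
    return min(solutions, key=lambda s: weakest_word_index(s, rank))


# Toy indices echoing the 379 vs. 12,978 example.
rank = {"sun": 379, "set": 120, "gloaming": 12_978, "dusk": 4_000}
print(best_solution([["gloaming", "dusk"], ["sun", "set"]], rank))  # ['sun', 'set']
```

This only works because the solver returns every solution; stopping at the first one would leave nothing to compare.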
What you’re seeing are the puzzle words on the left, the best solution’s words in the middle, and a mathematical representation of the solution that tells the game where each word goes on the board. So the plan was to generate lots of puzzles like this and then pick the ones we liked. But we still didn’t have a good idea of what a good puzzle looked like. So we started playing lots and lots of puzzles. And we started keeping track of things we liked about some puzzles and not others. Through a lot of playing and iteration we found that good puzzles generally had a few things in common:
the words at the sides sounded good together
they had a large number of different solutions
they had solutions that used common words
We also needed a way for the generator code to help us determine difficulty. Again, through a lot of playing and iteration, we eventually found some factors that determine the difficulty of a puzzle:
the size of the puzzle
how obscure the least common word in the best solution is
the frequency with which each pair of letters in the puzzle appears in English words
the number of words that can be made in this puzzle using each letter pair
the number of possible solutions using common words
So we implemented analysis code that checks all of these things in the generator.
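The post doesn’t say how the letter-pair factor is measured. As one hypothetical reading, the sketch below treats “appears in English words” as “occurs as two adjacent letters in a word”, computes the fraction of words containing each pair, and reports the puzzle’s rarest pair. Which pairs to check is left to the caller, since how they are taken from a puzzle isn’t described:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pair_frequencies(words: Iterable[str]) -> Dict[str, float]:
    """Fraction of words that contain each adjacent two-letter sequence."""
    upper_words = [w.upper() for w in words]
    counts: Counter = Counter()
    for w in upper_words:
        # Count each pair at most once per word.
        counts.update({w[i:i + 2] for i in range(len(w) - 1)})
    total = len(upper_words) or 1
    return {pair: n / total for pair, n in counts.items()}


def rarest_pair(pairs: List[str], freqs: Dict[str, float]) -> Tuple[str, float]:
    """The least common of the given pairs; pairs never seen score 0.0."""
    scored = [(p.upper(), freqs.get(p.upper(), 0.0)) for p in pairs]
    return min(scored, key=lambda t: t[1])


freqs = pair_frequencies(["cat", "hat", "that", "dog"])  # toy word list
print(freqs["AT"])                              # 0.75
print(rarest_pair(["at", "og", "qz"], freqs))   # ('QZ', 0.0)
```

A rare pair is a hint that few words can satisfy that part of the puzzle, which is why it plausibly pushes difficulty up.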
The word frequency list we had licensed also included word type information. We realized that we could use it to generate puzzles whose words sounded good together. For example, a puzzle of the form Adjective/Noun or Adverb/Verb is more fun to do than two random adjectives together. So we incorporated that into the generation code too. Then we put everything together, do a little mathemagics, and pop out an overall heuristic score that attempts to capture everything in a single difficulty value.
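To make the idea concrete, here is a sketch of collapsing the factors into one difficulty number, plus a word-type check. Everything here is hypothetical: the feature names, the weights, and the shape of the formula are placeholders rather than the actual Sidewords mathemagics; only the Adjective/Noun and Adverb/Verb patterns come from the post:

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Patterns named in the post; treating them as (left word, right word) is assumed.
GOOD_PATTERNS: Set[Tuple[str, str]] = {("adjective", "noun"), ("adverb", "verb")}


@dataclass
class PuzzleFeatures:
    """Hypothetical per-puzzle inputs, one per difficulty factor listed above."""
    size: int                 # e.g. number of cells to fill
    weakest_word_index: int   # frequency index of the rarest word in the best solution
    rarest_pair_freq: float   # 0..1, how common the puzzle's rarest letter pair is
    min_words_per_pair: int   # fewest candidate words for any letter pair
    common_solutions: int     # solutions that use only common words


def difficulty(f: PuzzleFeatures) -> float:
    """Placeholder weighted sum; higher means harder."""
    return (
        0.5 * f.size
        + f.weakest_word_index / 1_000
        + 5.0 * (1.0 - f.rarest_pair_freq)
        + 10.0 / (1 + f.min_words_per_pair)  # few options per pair -> harder
        + 10.0 / (1 + f.common_solutions)    # few common solutions -> harder
    )


def sounds_good(left_type: str, right_type: str) -> bool:
    return (left_type, right_type) in GOOD_PATTERNS


easy = PuzzleFeatures(12, 400, 0.5, 20, 30)
hard = PuzzleFeatures(20, 9_000, 0.01, 1, 1)
print(difficulty(hard) > difficulty(easy))  # True
print(sounds_good("adjective", "noun"))     # True
```

Any real weighting would have to be tuned the way the post describes: by playing lots of generated puzzles and checking whether the score matches how hard they feel.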
Now the puzzles look like this:
At that point we could generate thousands of puzzles into files and sort them by heuristic score. From there we would read through the puzzles and pick out ones we liked the sound of, put them in the game, and see how they felt. Then we’d organize them into sets and arrange them by difficulty to try to give a nice ebb and flow to the game.
And that’s exactly what we did for all the built-in puzzles. But when we decided to add daily puzzles in v1.1, which are generated on the device, we had to add an extra step.
When we were playing the puzzles ourselves, we could get a feel for how “cohesive” a puzzle felt. Some puzzles had solutions that fit into nicer blocks than others, while some required solutions with words split up across the board. These latter puzzles were much more difficult to do. So when we started generating puzzles on device, we needed a way to measure that too.
So now the algorithm does an additional analysis on each of a puzzle’s solutions and picks the one that feels the most cohesive as the “best” solution. In general, this means the daily puzzles have nicer-feeling solutions if you make use of the hints.
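The post doesn’t describe how cohesion is measured. One hypothetical proxy, sketched below, is to count how many separate pieces each solution word is broken into on the board (a piece being a group of cells connected up, down, left, or right) and prefer the solution with the fewest extra pieces. The board representation, each word mapped to the cells it occupies, is assumed:

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]             # (row, col) on the board
Placement = Dict[str, List[Cell]]  # hypothetical: word -> cells it occupies


def pieces(cells: List[Cell]) -> int:
    """Number of 4-connected groups the cells form (1 = one solid block)."""
    remaining: Set[Cell] = set(cells)
    groups = 0
    while remaining:
        groups += 1
        stack = [remaining.pop()]
        while stack:  # flood-fill one group
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
    return groups


def split_penalty(solution: Placement) -> int:
    """Extra pieces across all words; 0 means every word sits in one block."""
    return sum(pieces(cells) - 1 for cells in solution.values())


def most_cohesive(solutions: List[Placement]) -> Placement:
    return min(solutions, key=split_penalty)


compact = {"SUN": [(0, 0), (0, 1), (0, 2)], "SET": [(1, 0), (1, 1), (1, 2)]}
split = {"SUN": [(0, 0), (0, 1), (1, 2)], "SET": [(1, 0), (1, 1), (0, 2)]}
print(split_penalty(compact), split_penalty(split))  # 0 2
```

A tie-breaker such as the least-common-word rule from earlier could be added to the `min` key if several solutions are equally cohesive.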
There’s a lot that goes into making good puzzles for a puzzle game. Obviously I’ve only scratched the surface of what we did for Sidewords, but I hope this gives you a bit of insight into our process.
Sidewords is available for iOS, Android, and PC.
Owen












