The Math Behind Wordle Guesses
Introduction
In the simple game of Wordle, players have to guess a secret five-letter word in six or fewer turns based on clues about the presence and location of letters revealed by their previous guesses. While somewhat similar games have appeared in the past, everyone who plays Wordle on a particular day has to discover the same secret word, making it easy to share your attempts and discuss the game among your friends. The distinctive structure and presentation of the game inspired the questions in our latest Insights puzzle. The answers are discussed below.
One key to playing a good Wordle game is to choose a strong starting word. Computer analyses embodying information theory techniques suggest that starting words such as “slate” and “crane” enable you (or a computer algorithm, at any rate) to solve Wordles in the fewest turns on average. However, many human solvers feel more comfortable choosing a vowel-rich word such as “adieu,” “audio” or “raise.” This feeling has both an intuitive and a rational basis. First, placed vowels enable you to find a vowel “backbone” that can restrict the number of consonants you need to search for. For instance, if you know the word looks like _AI_E after you play “raise,” there are just a few possible words left: “naïve,” “waive” and “maize.” Second, vowels maximize a quantity that can be called “coverage”: Between just the five vowels and Y, we can get at least one positive letter in every one of the 2,309 answers. To get this kind of perfect coverage with consonants, you would have to try all 20 of them, which would require at least five turns.
Our first puzzle challenged readers to figure out which one of these three vowel-rich words is the best first guess.
Puzzle 1
The following table gives the frequency with which the eight letters in the words “adieu,” “audio” and “raise” occur in each position over the entire Wordle answer list of 2,309 words. Based on this table, determine how many greens and yellows you can expect to get over the entire Wordle answer list for each of the three vowel-rich starting words: “adieu,” “audio” and “raise.” (In Wordle, a letter is shown with a green background if it is in the right place, and a yellow background if it is in the word but is in the wrong place.) What does this tell you about their expected performance as starting words?
Reader Rob Corlett showed how to calculate the number of expected greens and yellows from this table. For “adieu,” A is the correct first letter for 140 words, D is the correct second letter for 20 words and so on. The total number of greens over all possible Wordle answers is the sum of these. So “adieu” gets a total of 140 + 20 + 266 + 318 + 1 = 745 greens. For yellows, we have to start with the number of times the letter occurs at least once in a word (906 for the A in “adieu”) and subtract the times it is green (140) to get the number of yellows (766). Add the numbers for each letter in the word to get the total number of yellows. We can divide these numbers by the total number of answers (2,309) to get the expectation of greens and yellows for a single turn, but since this step is common for all our starter words, we can just work with the totals for comparing the three of them. Since we have chosen these words specifically for finding a vowel backbone, we can also calculate how many of the greens come from vowels. Here are the results.
As you can see, there is no comparison! “Raise” is superior to “adieu” in every measure, giving more greens and yellows and yielding more vowels in their right places, to say nothing of the fact that you are also catching or ruling out two of the most common consonants. “Audio” is a distant third on all these measures. Note that while you can get some information about what letters are absent even if you do not get any yellows or greens, as reader Max Davies pointed out, you definitely get more information when you get one or more yellows and greens. So, “adieu” users, perhaps it’s time to say adieu.
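The counting procedure Corlett describes can be sketched in a few lines of Python. This is only an illustration: the function name green_yellow_totals and the eight-word TOY_ANSWERS list are assumptions made here, so the totals it produces are not the ones from the full 2,309-word list.

```python
# Toy answer list for illustration only; the real analysis uses all 2,309 answers.
TOY_ANSWERS = ["raise", "maize", "waive", "audio", "slate", "crane", "abyss", "sneak"]


def green_yellow_totals(guess, answers):
    """Total greens and yellows for `guess`, summed over `answers`.

    Mirrors the counting in the text: a letter is green when it sits in the
    same position in the answer, and yellow when it occurs somewhere in the
    answer but is not green. Assumes the guess has no repeated letters,
    which holds for "adieu", "audio" and "raise".
    """
    greens = yellows = 0
    for answer in answers:
        for pos, letter in enumerate(guess):
            if answer[pos] == letter:
                greens += 1
            elif letter in answer:
                yellows += 1
    return greens, yellows


for starter in ("adieu", "audio", "raise"):
    g, y = green_yellow_totals(starter, TOY_ANSWERS)
    # Dividing by the list size gives the expectation for a single turn.
    print(starter, g, y, g / len(TOY_ANSWERS), y / len(TOY_ANSWERS))
```

Run against the real answer list, the same loop should reproduce the totals quoted above, such as 745 greens for “adieu.”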
Question 1
This was a question about how much we should value greens relative to yellows: How many yellows are equal to a single green? The clear-cut nature of our results above obviates the need to answer this for the above comparison, but it’s an interesting question. There are two aspects to this valuation. The first is the human aspect: How much weight do you give to the mental effort required to figure out all the different ways that a yellow letter may be placed? There is no denying that hitting a lot of greens makes life easier and gives us more of a dopamine boost. The second is the information theory aspect: To measure it, you would need to go over every starting word for every answer word and compare how many turns it would take to solve the puzzle when the same letters were green compared to when they were yellow in every instance.
While this is a huge task, I did manage to do it for the best possible computer starting word (the obscure word “tarse,” which means a male falcon, whose full optimal solution tree has been posted online by the mathematician Alex Selby). The answer is surprising. The average number of turns required for a computer solution using an answer word that produced only greens in the first turn was 3.34, while the number of turns required when there were only yellow letters was 3.51, a mere 5% increase! Evidently, to a computer algorithm, placing the yellow letters, which seems so intimidating to us humans, can be achieved without too much of a penalty. I would guess that the difference would be greater for a human solver not just in the number of turns required, but also in the mental effort and time required to solve.
Puzzle 2
A) If you get all five yellows on your first turn, what’s the maximum number of turns it can take to find the answer, assuming best play?
As Rob Corlett and Sam Rhoads correctly stated, the theoretical answer is five: A fully yellow combination of letters such as ABCDE could resist discovery for four more turns, as you might have to cycle through BCDEA, CDEAB and DEABC before discovering that the answer was EABCD. In practice, though, such cyclic “words” are not possible precisely because real words have defined vowel and consonant patterns that cannot be stretched arbitrarily. Even words with many anagrams can be solved in no more than three tries, as Rob Corlett demonstrated with “parse.”
B) Is it ever the case that having a letter in a certain position turn yellow is more valuable than seeing it turn green? If so, can you give an example and explain why this should be?
Yes, a letter coming up yellow can, in rare cases, be more valuable than the same letter coming up green, if it’s a letter that rarely appears in the other positions. This often happens with Y, which is overwhelmingly found at the end of a word. Suppose you start with “belly,” and both B and Y come up green. You’re left with many possibilities: “baggy,” “bitty,” “bobby,” “booty,” “bushy,” etc. But if both B and Y come up yellow, there is only one possibility: “abyss.”
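The “belly” example can be checked with a small filter. The sketch below is an illustration under stated assumptions: the helper consistent is hypothetical, the candidate list contains only the words named above rather than the full answer list, and the filter ignores the subtleties that arise when a guess repeats a letter that is also present in the answer.

```python
# Only the words named in the text; the real answer list is much longer.
CANDIDATES = ["baggy", "bitty", "bobby", "booty", "bushy", "abyss"]


def consistent(word, greens, yellows, grays):
    """Check `word` against first-turn clues.

    greens and yellows map a 0-based position to a letter; grays is a string
    of letters known to be absent. Simplified: assumes no gray letter also
    appears among the greens or yellows.
    """
    if any(word[pos] != letter for pos, letter in greens.items()):
        return False
    # A yellow letter must be present, but not in the position that was tried.
    if any(letter not in word or word[pos] == letter
           for pos, letter in yellows.items()):
        return False
    return not any(letter in word for letter in grays)


# "belly": B is at position 0, Y at position 4; E and L come up gray.
both_green = [w for w in CANDIDATES if consistent(w, {0: "b", 4: "y"}, {}, "el")]
both_yellow = [w for w in CANDIDATES if consistent(w, {}, {0: "b", 4: "y"}, "el")]
print(both_green)
print(both_yellow)
```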
Question 2
Does a person with a good vocabulary of obscure Scrabble words have an advantage or disadvantage in playing Wordle?
As a former tournament Scrabble player who spent quite a few hours memorizing obscure words, I think it is both an advantage and a disadvantage. When I first started playing Wordle, I found myself frequently seeing the possibility of and trying to rule out uncommon words that I later realized had almost no chance of being correct. (In golf terminology, which my Wordle group frequently uses, we refer to this as being stymied by an imaginary hazard.) As I described in the puzzle column, Wordle answers are drawn from a list of simple words, the majority of which are known to all native U.S. English speakers. Even words that are somewhat uncommon but not obscure are not on the Wordle answer list. For example, I recently wasted a turn playing “latex,” a fairly common word that turns out not to be a possible Wordle answer. So, like all Wordle players, I’ve had to build a mental model of the kind of word that might be a Wordle answer and to specifically ignore the kinds of rare and obscure words that I would happily use to score more points in Scrabble. On the other hand, the knowledge of these rare words comes in handy in “sweeping consonants,” which you sometimes have to do to avoid spending many turns guessing a bunch of similar words one by one. For example, if you have _RA_E and are looking at a bunch of possible words containing D, G and K, such as “brake,” “drake,” “drape,” “grade” and “grape,” it helps to know and play the word “kedge,” which can guarantee finding the solution in two more turns (to kedge means to move a ship by dropping its anchor at a distance and then pulling on it with a stout rope).
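To see why “kedge” sweeps these consonants so well, one can compute the clue pattern it would produce against each remaining word. The sketch below assumes the usual Wordle convention for repeated letters (greens are assigned first, then yellows only while unmatched copies of the letter remain); the function name feedback and the G/Y/- notation are choices made here for illustration.

```python
from collections import Counter


def feedback(guess, answer):
    """Return one character per guessed letter: G green, Y yellow, - gray."""
    result = ["-"] * len(guess)
    # Answer letters not matched by a green remain available for yellows.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        elif unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


REMAINING = ["brake", "drake", "drape", "grade", "grape"]
patterns = {word: feedback("kedge", word) for word in REMAINING}
for word, pattern in patterns.items():
    print(word, pattern)
```

Each of the five words yields a different pattern, so after playing “kedge” only one candidate from this list remains, which is why the solution is guaranteed within two more turns.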
Getting the same Wordle puzzle as everyone else each day encourages social play. But spoilers abound on the internet, and it is known that some people cheat in reporting their scores. The next puzzle deals with the question of when suspicions of cheating in a Wordle group are warranted based solely on the improbability of a person’s score. (Again, this puzzle is framed in golf scoring terms: A Wordle solution in three turns is termed a birdie, getting it in two turns is an eagle and getting a word on the very first turn is, of course, a hole-in-one.)
Puzzle 3
A traditional scientific criterion for investigating further is that the probability of an outcome occurring by chance falls below a threshold (the alpha value) of 5% or 1%, depending on the goals of the researchers. The result is then deemed to be statistically significant at the 5% or 1% level. Since it is not nice to suspect people of cheating when they are not, let us choose the more conservative 1% level in this investigation.
Suppose you belong to a Wordle group of 10 players who have been sharing results with each other every day for 200 days. Assume that a very good human player can expect to get a birdie every 2.5 games, an eagle every 40 games, and a hole-in-one every 2,000 games (which are reasonable real-world estimates).
A) How many birdies in a row would be significant at the 1% level in your group during this time?
B) How many eagles in a row?
C) How many holes-in-one in a row?
The key here is to realize that you have a population size of 2,000 person-games. So, in order to reach this significance level, you would need to see an event that would happen less often than once in 200,000 person-games solely by chance.
A) Birdie-or-better streaks: The probability of getting a birdie or better in a single game is 2/5 + 1/40 + 1/2,000 = 0.4255, which is 1 in about 2.35 games. Let’s call this number of games B (about 2.35). The lowest power of B that exceeds 200,000 is B^15, which is more than 368,000 (B^14 is about 157,000). So, a birdie-or-better streak of 15 or more for anyone in the group would satisfy this stringent criterion, but one of 14 would not. If you suspected an individual player, you would need to see an event that happens less frequently than once in 20,000 games, which would happen with a birdie-or-better streak of 12. (Note that the actual number of opportunities to have streaks of these lengths is slightly smaller: It is actually 1,850 games for the group and 188 games for the individual player, but that doesn’t make a difference in this case.)
Note that these are the frequencies for expert players, and suspicious streaks for most groups and individuals would be smaller. To apply this criterion in practice you would need to determine the corresponding birdie, eagle and hole-in-one frequencies that you are seeing and also take into account the number of games that have been played in your group.
B) Eagle-or-better streaks: The probability for an eagle or better is 1/40 + 1/2,000 = 0.0255, or about 1 in 39.2. The streak lengths that exceed our significance level are 4 for the group and 3 for a suspected individual.
C) Hole-in-one streaks: The streak length that exceeds our significance level is 2 both for the group and for a suspected individual.
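The streak lengths in A, B and C can be reproduced with a short calculation. The sketch below follows the simplified reasoning used here (a streak of length k counts as suspicious once its chance probability drops below 1% divided by the number of person-games) and ignores the exact number of places a streak could start. The function name suspicious_streak is an assumption made for illustration.

```python
from fractions import Fraction


def suspicious_streak(p_single, person_games, alpha=Fraction(1, 100)):
    """Smallest streak length k with p_single**k < alpha / person_games."""
    threshold = alpha / person_games
    k = 1
    while p_single ** k >= threshold:
        k += 1
    return k


# Per-game frequencies assumed in the puzzle for a very good human player.
HOLE_IN_ONE = Fraction(1, 2000)
EAGLE_OR_BETTER = Fraction(1, 40) + HOLE_IN_ONE
BIRDIE_OR_BETTER = Fraction(2, 5) + EAGLE_OR_BETTER

GROUP_GAMES = 10 * 200      # 10 players over 200 days
INDIVIDUAL_GAMES = 200      # one suspected player over 200 days

for name, p in [("birdie or better", BIRDIE_OR_BETTER),
                ("eagle or better", EAGLE_OR_BETTER),
                ("hole-in-one", HOLE_IN_ONE)]:
    print(name,
          suspicious_streak(p, GROUP_GAMES),
          suspicious_streak(p, INDIVIDUAL_GAMES))
```

Exact fractions are used so that the comparisons near the threshold are not affected by floating-point rounding. To apply the same test to a different group, replace the three frequencies and the game counts with the ones you actually observe.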
There is a caveat to the last two answers: These are rare events, and the sample size is very small, so you have to be careful. Most statisticians would generally wait until they had seen at least five instances of eagles or holes-in-one, not necessarily as part of a streak, before they were comfortable applying a significance test.
Question 3
It’s entirely possible that the frequency of good results in your group is significantly higher than the frequency predicted by chance, without anyone cheating. How would you explain this?
One possible reason for this, as Rob Corlett explains, could be that “the players all keep assiduous records of every result.” As I explained in the prelude to puzzle 4, Wordle answers are not due to be repeated for five years or so under the current setup. So even if no one cheats or knows all the words on the answer list, this information can still help any individual or group gradually perform better.
But there is also another reason: The list may not be well randomized. In playing Wordle over the last several months, I noticed that whenever there was a choice between two or more words, the simpler words were more likely to be correct than the less common words. For example, if you had A, N and E and the choices left were words like “sneak,” “hyena” and “enema,” you could unhesitatingly play the simplest word (“sneak” in this instance) and you’d be correct much more often than you would expect by pure chance. I actually used an English prose word frequency list to check how common the answers I was encountering over two months were compared to an average word in the Wordle answer list. The answers I encountered were about 25% more common than the average word on the Wordle answer list, and more importantly, for the rarest words on the list (the bottom 10%), only a third as many showed up as answers as were supposed to. Eagles happened with a frequency closer to 1/20 rather than the 1/40 based on pure chance. So it seems that the Wordle answer sequence is not well randomized, and either it’s front-loaded with simpler words or we happen to be going through a portion of the list that consists of simpler words.
A significant recent change is that The New York Times appointed a Wordle editor to program the day’s word starting November 7. Since then, the removal of difficult or offensive words from the pre-sequenced list has become more common, including the replacement, behind the scenes, of words like “ombre,” “vomit” and “fanny.” While I understand the need for the Times to sanitize and simplify Wordle words to prevent outrage from the millions of people playing, it makes the game less random and that much more predictable. Even worse is the unfortunate editorial tendency in the past few weeks to pick a word to suit the day, such as “feast” on Thanksgiving Day and “medal” on Veterans Day. This amounts to giving an extra clue about the word even before the game starts, making the puzzle easier and detracting from its rich information theory connection. I do hope this is a temporary aberration because randomness is an essential element of this game. Most people who gave feedback to The New York Times about these editorial choices felt the same way.
Our fourth puzzle was based on the fact that, under its current architecture, Wordle solutions will never repeat until the list runs out after five years or so.
Puzzle 4
Consider a person with a perfect memory of past solutions. To such a person, the answer would be obvious on the last day of Wordle’s 2,309-word list. Can you quickly estimate how many holes-in-one this person would expect to get over the duration of the entire list, without doing the actual calculation? Then if you can, try and do the actual calculation.
Rob Corlett answered this perfectly, logically estimating the answer to be 8.25, and then calculating the answer to be 8.32. Corlett’s key calculations are quoted below. You can check the comment for the excellent estimation technique.
If you have m words and you take a guess then the chances of getting it right are 1/m. If you have 1 word the chances are 1/1, 2 words 1/2, 3 words 1/3, etc. If you add these together you get the expected number of holes-in-one! …
[This] needs us to calculate the sum of the reciprocals of all the numbers from 2309 down to 1. I did this in a spreadsheet and found the total to be 8.32, satisfyingly close to my estimate!
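The spreadsheet sum is easy to reproduce. The sketch below adds the reciprocals directly and, for comparison, evaluates the standard logarithmic approximation to a harmonic sum, ln(n) plus the Euler-Mascheroni constant. That approximation is included here as an assumption for illustration; it is not necessarily the estimation technique Corlett used.

```python
import math

N_ANSWERS = 2309            # length of the Wordle answer list
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def expected_holes_in_one(n):
    """Sum of 1/m for m = 1..n: with m unseen words left, a perfect-memory
    guesser picks the right one with probability 1/m."""
    return sum(1 / m for m in range(1, n + 1))


exact = expected_holes_in_one(N_ANSWERS)
approx = math.log(N_ANSWERS) + EULER_GAMMA
print(f"direct sum: {exact:.2f}")
print(f"ln(n) + gamma: {approx:.2f}")
```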
Our last question asked how to improve Wordle’s randomization of words while keeping its “client-side” design. Before the Wordle editor was appointed, there was no day-to-day randomization of words: The words came from a downloaded pre-sequenced list that wasn’t very well randomized, as I mentioned above. Then Wordle’s solution word was generated on the client’s (user’s) device from the word list depending on the current date, and the entire puzzle was adjudicated on the user’s device as well. The code to do all this is downloaded the very first time a user connects to the website each day. The user does not have to be online thereafter.
Question 4
How would you design Wordle so that it keeps the client-side design, ensuring that everyone gets the same solution word on a given day, but randomizes the answers in a sensible way without requiring a change in the code every day?
There were some good answers to the randomization question. A couple of readers suggested using a pseudorandom number with a predefined seed to create an index into the Wordle answer list. Mumintrollet even wrote a program that randomly shuffles five Wordle answer lists (lasting 32 years), making sure no word repeats within a year. To me, the most appealing procedure came from BlindThemis, who suggested that the random seed used for the randomization procedure should be the last four digits of the number of people who have played the game by a certain time. (Since Wordle can be played anywhere in the world, this would have to be done in time zones over the eastern Pacific, starting from the International Date Line!) The great thing about this is that no one, not even the New York Times Wordle editor, would know what the word was the day before it was used.
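The first suggestion, a pseudorandom index driven by a predefined seed, might look something like the sketch below. Everything named in it is hypothetical: the toy ANSWERS list, the EPOCH start date, the BASE_SEED value and the function word_for are assumptions for illustration, and Python’s random module stands in for whatever generator the game’s own code would use. The list is reshuffled once per pass, so no word repeats within a pass, although a word could still recur soon after a pass ends.

```python
import random
from datetime import date

ANSWERS = ["crane", "slate", "raise", "adieu", "audio"]  # toy stand-in list
EPOCH = date(2022, 1, 1)   # hypothetical first day of the sequence
BASE_SEED = 12345          # hypothetical predefined seed shipped with the code


def word_for(day, answers=ANSWERS, epoch=EPOCH, base_seed=BASE_SEED):
    """Return the answer for `day`, identical on every device.

    The list is shuffled with a seed that depends only on which pass
    through the list we are in, so no daily code change is needed.
    """
    day_index = (day - epoch).days
    cycle, position = divmod(day_index, len(answers))
    order = sorted(answers)                      # fixed starting order
    random.Random(base_seed + cycle).shuffle(order)
    return order[position]


print(word_for(date(2022, 1, 1)), word_for(date(2022, 1, 2)))
```

Because the seed is fixed in advance, anyone who reads the code can still compute future answers, which is the weakness that BlindThemis’s player-count seed is meant to remove.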
None of these mechanisms can be completely done on the client side, as Tim Ross pointed out. The next word would have to be generated by the server, and this word or its index number would have to be downloaded, possibly in encrypted form with the rest of the code. As Ross pointed out, currently the 2,309 answer words are clearly visible in date order in the source code, which any browser can reveal. One approach could be to encrypt the answer word list and save it in alphabetical order rather than in date order.
While the suggested improvements in randomization would help, encryption will make no difference at all, since there will still be multiple spoilers on the internet and multiple ways to cheat.
Thank you to everyone who contributed to this interesting discussion. The Insights prize for this puzzle goes to Rob Corlett. Congratulations! Our next puzzle will appear in February. Until then, happy puzzling and happy holidays!