Unscrambling the Hidden Secrets of Superpermutations
Introduction
Four. Three. One. Five. Two.
Nothing happens. The evil genius Derango is loading a cataclysmic computer virus that, when launched, will lock the entire world out of the internet. In five minutes, no one will be able to access their online bank accounts, networked video games or online TV shows ever again!
Three. One. Five. Two. Four.
The virus keeps loading. The code to stop the program is some sequence of the digits 1, 2, 3, 4, 5. If you enter them in the right order, you’ll disable Derango’s virus and save the day. The problem is that there are 120 possible codes, and each one is five digits long. You don’t have time to enter 600 digits.
Suddenly, inspiration strikes. If you punch in 123451, you’re actually trying two codes: 12345 and 23451. Even better, entering 1234512 will succeed if the code is 12345, 23451 or 34512. You do some quick calculations. Instead of keying in 600 digits to cover all the possibilities, you now only have to enter 153 digits. You have just enough time — and it works! You’ve saved the day. It’s a good thing you read about “superpermutations” in Quanta.
What are these superpermutations that derailed Derango’s plan? You may remember the word “permutation” from math class. A permutation is an arrangement of symbols that uses each symbol exactly once: no repeats and no omissions. Permutations pop up all over the place: Anytime you arrange items in some order, such as books on a shelf, classes in your schedule, or tasks on a to-do list, you are working with permutations.
We’re often interested in counting the total number of permutations of a given set of objects, and there’s a nice method for doing it. Consider a simple set of just three letters {A, B, C}. There aren’t many permutations of these symbols, so it’s not hard to list them all. Let’s start with all the permutations that begin with A. If A is first, what could come second? Either B, which means C comes last, or C, which means B comes last. So there are two permutations that start with A: namely ABC and ACB. Starting with B we get two more permutations: BAC and BCA, and starting with C, we get CAB and CBA.
So there are a total of six permutations of {A, B, C}, and here they are in alphabetical order:
ABC
ACB
BAC
BCA
CAB
CBA
Listing permutations gets harder the more symbols we use, so we need a better way to count them. Notice that a permutation of {A, B, C} can start with any of the three letters, so we have three options for the first symbol. How many options do we have for the second? We can’t use any letter twice, so there are only two options for the second item in our permutation. And once we’ve chosen the second item, there is only one symbol left. For example, if we start with C and then choose A, the last symbol has to be B.
This constructive method gives us a nice way to count the possible permutations. We have three options for the first item, two for the second, and one for the third. According to the “fundamental counting principle,” the total number of possibilities can be found by multiplying the number of possibilities for each choice. That gives us 3 × 2 × 1 = 6.
This method generalizes beautifully. If we are arranging four symbols, we will have four options for our first item, three for the second, two for the third, and one for the fourth. This gives us a total of 4 × 3 × 2 × 1 = 24 permutations of four symbols. Likewise, there are 5 × 4 × 3 × 2 × 1 = 120 permutations of five symbols — the maximum number of codes we had to consider in thwarting Derango’s evil plot. In general, if we are counting permutations of n distinct symbols, there will be
n × (n – 1) × (n – 2) × … × 3 × 2 × 1
permutations. This product of all the integers from n down to 1 is called n factorial, which is denoted n!. Thus, 3! = 6, 4! = 24, and 5! = 120. Factorials grow very quickly, which is why listing all the permutations isn’t necessarily a good strategy for larger sets of items. For example, if you wanted to know how many different ways 10 runners could finish a race, your list would be 10! = 3,628,800 items long!
So, a permutation is a special arrangement of symbols. A “superpermutation,” in a sense, is a special arrangement of permutations: It is a sequence of symbols that contains every possible permutation of those symbols. For example, stringing together all six permutations of {A, B, C} gives us ABCACBBACBCACABCBA, a superpermutation of {A, B, C}.
But, at 18 symbols long, this is not the shortest possible superpermutation of {A, B, C}. We can shorten it without losing any of the permutations it contains. Notice the double B:
ABCACBBACBCACABCBA
Removing one of the Bs means we’ll no longer have CBB or BBA in our sequence, but neither of those are permutations of {A, B, C}, so we don’t need them. Now we have:
ABCACBACBCACABCBA
This shortened sequence still contains every permutation of {A, B, C}, so it’s a valid superpermutation. In fact, you can also remove other unnecessary letters. And when you can make things shorter, the natural mathematical question to ask is, “How short can you make it?” So, how long is the shortest superpermutation of {A, B, C}?
A good approach to a problem like this is to attack it from both sides. For example, we just found a superpermutation that’s 17 symbols long, so the shortest superpermutation cannot be longer than 17 symbols. This is an upper bound. Can we find a lower bound — the minimum number of symbols the superpermutation must contain?
Well, to get at least one permutation, the sequence has to contain at least three symbols, so let’s start with ABC. When we add A to the end of this sequence we get ABCA, which gives us two permutations: ABC and BCA. (Notice that adding B or C as the fourth symbol won’t give us any new permutations, so A is the best choice for keeping our sequence short.)
We then add B as the fifth symbol, since that’s the only way to add another new permutation. This gives us ABCAB. Now we’ve got three permutations: ABC, BCA and CAB. So far, adding one letter gets us one new permutation. If this keeps up, we’ll need to add only three more letters to get all six permutations, which would give us a total of eight letters. We can’t possibly do any better than this, so no superpermutation of {A, B, C} can be shorter than eight symbols. That means 8 is a lower bound.
But is there a superpermutation of {A, B, C} of length 8? It turns out there isn’t. Our initial strategy doesn’t work forever. When we try to continue from ABCAB, the only letter we can add to the end to make another permutation of {A, B, C} is C. But the “new” permutation at the end is ABC, the same permutation we started with! So adding C doesn’t help, nor does adding B or A. We’re stuck: There is no letter we can add to our sequence that will produce a new permutation.
Our strategy worked well until we cycled back around to where we started from. In fact, these three permutations are called cyclic: They consist of the same three symbols in the same relative order, cycling around as if they were on a circle.
ABC BCA CAB
While cycling around this circle doesn’t allow us to keep getting new permutations by adding just one letter, it does help us divide up the set of all permutations. Every permutation of {A, B, C} can be found in these two lists of cyclic permutations:
ABC BCA CAB
ACB CBA BAC
To make a shorter superpermutation of {A, B, C}, we can just collapse these two lists:
ABCAB
ACBAC
and then stick them together:
ABCABACBAC
We know a superpermutation of {A, B, C} can’t be any shorter than eight letters, so this one of length 10 seems very close to the minimum length. But we can do even better. Let’s look at our two cyclic sequences again.
ABCAB
ACBAC
There’s no reason these cycles have to start with A. If we start the second cycle with BAC we get the same three permutations in a slightly different order:
BAC ACB CBA
which collapses to:
BACBA
Now, instead of ACBAC, we use BACBA as our second cycle. And when we stick ABCAB and BACBA together to get ABCABBACBA, we get a duplicate B in the middle, which we can remove. This leaves us with:
ABCABACBA
This superpermutation is only nine symbols long and is indeed the shortest superpermutation of {A, B, C}. Now, we haven’t formally proved that there is no superpermutation of length 8, but there are fewer than 1,000 possible arrangements of eight symbols that contain the correct number of As, Bs and Cs, and a computer can check that none of them contain all six permutations of {A, B, C}.
So the shortest superpermutation of three symbols has length 9. But what if we add another symbol? What is the minimum length of a superpermutation of {A, B, C, D}? Once again, we turn to cyclic permutations, which give us a nice way to turn a superpermutation of three symbols into a superpermutation of four symbols.
Start with one of the permutations of {A, B, C}, say, ABC, and put a D at the end.
ABCD
Now, create a partial superpermutation by cycling through the cyclic permutations of ABCD:
ABCDABC
Notice that each of the cyclic permutations of ABCD — ABCD, BCDA, CDAB, DABC — is present in this partial superpermutation.
Now we’ll do that for all six permutations of {A, B, C}:
ABC ➔ ABCD ➔ ABCDABC
ACB ➔ ACBD ➔ ACBDACB
BAC ➔ BACD ➔ BACDBAC
BCA ➔ BCAD ➔ BCADBCA
CAB ➔ CABD ➔ CABDCAB
CBA ➔ CBAD ➔ CBADCBA
Every permutation of {A, B, C, D} is in one of the six partial superpermutations above. And each of these partial superpermutations starts with a permutation of {A, B, C}, followed by D, followed by a copy of the original permutation of {A, B, C}. For example:
ABCDABC
ABC D ABC
This tells us how to build a short superpermutation of {A, B, C, D} from the shortest superpermutation of {A, B, C}: Look for each permutation of {A, B, C} in the sequence, find the end of it, and then tack on a D followed by a copy of that permutation of {A, B, C}. Here’s the strategy applied first by locating ABC and then by locating BCA.
ABCABACBA
ABC ABACBA
ABC DABC ABACBA
ABCDABCABACBA
ABCDA BCA BACBA
ABCDA BCA DBCA BACBA
ABCDABCADBCABACBA
After we do this for each of the six permutations of {A, B, C}, we end up with this superpermutation of {A, B, C, D}:
ABCDABCADBCABDCABACDBACBDACBADCBA
(You should check to make sure that each of the 4! = 24 permutations of {A, B, C, D} is in there.) This sequence has 33 symbols, and although this is by no means obvious, it is indeed the shortest possible superpermutation of {A, B, C, D}.
One nice feature of this strategy is that it’s easy to count how many symbols we’ve added in making our new superpermutation: We added four symbols at the end of each of the six permutations of {A, B, C}, for a total of 24 new symbols, which makes sense since 9 + 24 = 33. But there’s something special about 24: it is 4!. To create the shortest superpermutation of four symbols, we added 4! new symbols to the shortest superpermutation of three symbols!
That seems too nice to be a coincidence. Let’s work backward to see if the same strategy can be used to construct the shortest superpermutation of {A, B, C} from the shortest superpermutation of {A, B}:
ABA
AB CAB A
ABCABA
ABCA BA CBA
ABCABACBA
The minimal superpermutation of {A, B} is ABA, and our strategy adds 3! = 6 new symbols to get our minimal superpermutation of {A, B, C} of length 9. Going back even further, we see our strategy turns the minimal superpermutation of {A}, of length 1, into a superpermutation of {A, B} of length 1 + 2! = 3.
A
A BA
ABA
An amazing formula emerges: When we construct superpermutations of a set of n symbols in this way, we get sequences of this length:
n! + (n – 1)! + (n – 2)! + … + 3! + 2! +1!
Using this technique we can produce superpermutations of known length for n symbols. But will those superpermutations be the shortest possible? For n = 1, 2, 3, 4 or 5, the answer is yes. And for many years, that’s all anyone really knew about the problem. Since superpermutations grow in size so quickly, it was hard to explore the question further even with the aid of computers.
But in 2014 someone devised a different strategy for constructing even shorter superpermutations when n > 5. Although stringing together cycles as we did above is elegant and predictable, it turns out there are more efficient ways to put the sequences together when there are more symbols to work with. Researchers are now treating the minimal superpermutation problem like a traveling salesman problem, and since the latter problem has no simple solution, the former may not either.
For now, establishing upper and lower bounds may be the best we can hope for. But help sometimes comes from surprising sources. On the controversial online message board 4chan, a post about how to watch a certain TV series in every possible order led an anonymous commenter to discover a new lower bound for the minimal superpermutation problem. It took a while for mathematicians to notice, and to sort through the details, but the result is now recognized as a valid theorem. We just don’t know who should get the credit!
And the science fiction author and amateur mathematician Greg Egan recently posted a new upper bound for minimal superpermutations on his website. But don’t be too surprised: The title of Egan’s 1994 novel, Permutation City, suggests he may know a thing or two about counting arrangements. Maybe a “super” sequel is in the works? If Derango shows up, at least you’ll know the super secret method for how to save the day.
Exercises
1. In the column, we note that the superpermutation ABCACBBACBCACABCBA can be shortened by removing the duplicate B in the middle. This superpermutation can be shortened even more. What other letters can be removed?
2. The shortest superpermutation of {A, B, C} is nine symbols long. In the column, we established an upper bound of 17 for the length of superpermutations of {A, B, C}. For every integer n between 8 and 17, is there a superpermutation of {A, B, C} of length n? And for which n are there superpermutations of {A, B, C} of length n that can’t be shortened by removing redundant letters?
3. Consider the set {A, A, B}. A permutation of this set of symbols is a string of length 3 containing two As and one B — for example: AAB or ABA. What is the shortest possible superpermutation of {A, A, B}? What about {A, A, B, B}?