Landmark Algorithm Breaks 30-Year Impasse
A theoretical computer scientist has presented an algorithm that is being hailed as a breakthrough in mapping the obscure terrain of complexity theory, which explores how hard computational problems are to solve. Last month, László Babai, of the University of Chicago, announced that he had come up with a new algorithm for the “graph isomorphism” problem, one of the most tantalizing mysteries in computer science. The new algorithm appears to be vastly more efficient than the previous best algorithm, which had held the record for more than 30 years. His paper became available today on the scientific preprint site arxiv.org, and he has also submitted it to the Association for Computing Machinery’s 48th Symposium on Theory of Computing.
(January 15, 2017, update: On January 4, Babai retracted his claim that the new algorithm runs in quasi-polynomial time and then five days later announced that he had fixed the error. Read more on the Abstractions blog.)
Complexity Theory Problem Strikes Back
The legendary graph isomorphism problem may be harder than a 2015 result seemed to suggest.
For decades, the graph isomorphism problem has held a special status within complexity theory. While thousands of other computational problems have meekly succumbed to categorization as either hard or easy, graph isomorphism has defied classification. It seems easier than the hard problems, but harder than the easy problems, occupying a sort of no man’s land between these two domains. It is one of the two most famous problems in this strange gray area, said Scott Aaronson, a complexity theorist at the Massachusetts Institute of Technology. Now, he said, “it looks as if one of the two may have fallen.”
Babai’s announcement has electrified the theoretical computer science community. If his work proves correct, it will be “one of the big results of the decade, if not the last several decades,” said Joshua Grochow, a computer scientist at the Santa Fe Institute.
Computer scientists use the word “graph” to refer to a network of nodes with edges connecting some of the nodes. The graph isomorphism question simply asks when two graphs are really the same graph in disguise because there’s a one-to-one correspondence (an “isomorphism”) between their nodes that preserves the ways the nodes are connected. The problem is easy to state, but tricky to solve, since even small graphs can be made to look very different just by moving their nodes around.
Now, Babai has taken what appears to be a major step forward in pinning down the problem’s difficulty level, by setting forth what he asserts is a “quasi-polynomial-time” algorithm to solve it. As Aaronson describes it, the algorithm places the problem within “the greater metropolitan area” of P, the class of problems that can be solved efficiently. While this new work is not the final word on how hard the graph isomorphism problem is, researchers see it as a game changer. “Before his announcement, I don’t think anyone, except maybe for him, thought this result was going to happen in the next 10 years, or perhaps even ever,” Grochow said.
In recent weeks, Babai gave four talks outlining his algorithm. It will take time for his new paper to be thoroughly vetted by other experts, however. Babai has declined to speak to the press, writing in an email: “The integrity of science requires that new results be subjected to thorough review by expert colleagues before the results are publicized in the media.”
Other researchers are cautiously hopeful that his proof will pan out. “Babai has a sterling record,” Aaronson said. “He’s as trustworthy as they come.”
“I think people are pretty optimistic,” said Luca Trevisan, a computer scientist at the University of California, Berkeley, after Babai’s second talk. Assuming the proof is correct, he said, “it’s a landmark result.”
A Blind Taste Test
Given two graphs, one way to check whether they are isomorphic is simply to consider every possible way to match up the nodes in one graph with the nodes in the other. But for graphs with n nodes, the number of different matchings is n factorial (1 * 2 * 3 * … * n), which is so much larger than n that this brute-force approach is hopelessly impractical for all but the smallest graphs. For graphs with just 10 nodes, for instance, there are already more than 3 million possible matchings to check. And for graphs with 100 nodes, the number of possible matchings far exceeds the estimated number of atoms in the observable universe.
Computer scientists generally consider an algorithm to be efficient if its running time can be expressed not as a factorial but as a polynomial, such as n2 or n3; polynomials grow much more slowly than factorials or exponential functions such as 2n. Problems that have a polynomial-time algorithm are said to be in the class P, and over the decades since this class was first proposed, thousands of natural problems in all areas of science and engineering have been shown to belong to it.
Computer scientists think of the problems in P as relatively easy, and they think of the thousands of problems in another category, “NP-complete,” as hard. No one has ever found an efficient algorithm for an NP-complete problem, and most computer scientists believe no one ever will. The question of whether the NP-complete problems are truly harder than the ones in P is the million-dollar P versus NP problem, widely regarded as one of the most important open questions in mathematics.
The graph isomorphism problem is neither known to be in P nor known to be NP-complete; instead, it seems to hover between the two categories. It is one of only a tiny handful of natural problems that occupy this limbo; the only other such problem that’s as well-known as graph isomorphism is the problem of factoring a number into primes. “Lots of people have spent time working on graph isomorphism, because it’s a very natural, simple-to-state problem, but it’s also so mysterious,” Grochow said.
There are good reasons to suspect that graph isomorphism is not NP-complete. For example, it has a strange property that no NP-complete problem has ever been shown to have: It’s possible, in theory, for an all-knowing being (“Merlin”) to convince an ordinary person (“Arthur”) that two graphs are different without giving away any hints about where the differences lie.
This “zero-knowledge” proof is similar to the way Merlin could convince Arthur that Coke and Pepsi have different formulas even if Arthur can’t taste the difference between them. All Merlin would have to do is take repeated blind taste tests; if he can always correctly identify Coke and Pepsi, Arthur must accept that the two drinks are different.
Similarly, if Merlin told Arthur that two graphs are different, Arthur could test this assertion by putting the two graphs behind his back, moving their nodes around so that they looked very different from the way they started, and then showing them to Merlin and asking him which was which. If the two graphs are really isomorphic, there’s no way Merlin could tell. So if Merlin gets these questions right again and again, Arthur will eventually conclude that the two graphs must be different, even if he can’t spot the differences himself.
No one has ever found a blind-taste-test protocol for any NP-complete problem. For that and other reasons, there’s a fairly strong consensus among theoretical computer scientists that graph isomorphism is probably not NP-complete.
For the reverse question — whether graph isomorphism is in P — the evidence is more mixed. On the one hand, there are practical algorithms for graph isomorphism that can’t solve the problem efficiently for every single graph, but that do well on almost any graph you might throw at them, even randomly chosen ones. Computer scientists have to work hard to come up with graphs that trip these algorithms up.
On the other hand, graph isomorphism is what computer scientists call a “universal” problem: Every possible problem about whether two “combinatorial structures” are isomorphic — for example, the question of whether two different Sudoku puzzles are really the same underlying puzzle — can be recast as a graph isomorphism problem. This means that a fast algorithm for graph isomorphism would solve all these problems at once. “Usually when you have that kind of universality, it implies some kind of hardness,” Grochow said.
In 2012, William Gasarch, a computer scientist at the University of Maryland, College Park, informally polled theoretical computer scientists about the graph isomorphism problem and found that 14 people believed it belongs to P, while six believed that it does not. Before Babai’s announcement, many people didn’t expect the problem to be resolved anytime soon. “I kind of thought maybe it was not in P, or maybe it was but we wouldn’t know in my lifetime,” Grochow said.
Paint by Numbers
Babai’s proposed algorithm doesn’t bring graph isomorphism all the way into P, but it comes close. It is quasi-polynomial, he asserts, which means that for a graph with n nodes, the algorithm’s running time is comparable to n raised not to a constant power (as in a polynomial) but to a power that grows very slowly.
The previous best algorithm — which Babai was also involved in creating in 1983 with Eugene Luks, now a professor emeritus at the University of Oregon — ran in “subexponential” time, a running time whose distance from quasi-polynomial time is nearly as big as the gulf between exponential time and polynomial time. Babai, who started working on graph isomorphism in 1977, “has been chipping away at this problem for about 40 years,” Aaronson said.
Babai’s new algorithm starts by taking a small set of nodes in the first graph and virtually “painting” each one a different color. Then it begins to look for an isomorphism by making an initial guess about which nodes in the second graph might correspond to these nodes, and it paints those nodes the same colors as their corresponding nodes in the first graph. The algorithm eventually cycles through all possible guesses.
Once the initial guess has been made, it constrains what other nodes may do: For example, a node that is connected to the blue node in the first graph must correspond to a node that is connected to the blue node in the second graph. To keep track of these constraints, the algorithm introduces new colors: It might paint nodes yellow if they are linked to a blue node and a red node, or green if they are connected to a red node and two yellow nodes, and so on. The algorithm keeps introducing more colors until there are no connectivity features left to capture.
Once the graphs are colored, the algorithm can rule out all matchings that pair nodes of different colors. If the algorithm is lucky, the painting process will divide the graphs into many chunks of different colors, greatly reducing the number of possible isomorphisms the algorithm has to consider. If, instead, most of the nodes end up the same color, Babai has developed a different way to reduce the number of possible isomorphisms, which works except in one case: when the two graphs contain a structure related to a “Johnson graph.” These are graphs that have so much symmetry that the painting process and Babai’s further refinements just don’t give enough information to guide the algorithm.
In the first of several talks on his new algorithm, on November 10, Babai called these Johnson graphs “a source of just unspeakable misery” to computer scientists working on painting schemes for the graph isomorphism problem. But Johnson graphs can be handled fairly easily by other methods, so by showing that these graphs are the only obstacle to his painting scheme, Babai was able to tame them.
Babai’s approach is “a very natural strategy, very clean in some sense,” said Janos Simon, a computer scientist at the University of Chicago. “It’s very likely that it’s the correct one, but all mathematicians are cautious.”
Even though the new algorithm has moved graph isomorphism much closer to P than ever before, Babai speculated in his first talk that the problem may lie just outside its borders, in the suburbs rather than the city center. That would be the most interesting possibility, Trevisan said, since it would make graph isomorphism the first natural problem to have a quasi-polynomial algorithm but no polynomial algorithm. “It would show that the landscape of complexity theory is much richer than we thought,” he said. If this is indeed the case, however, don’t expect a proof anytime soon: Proving it would amount to solving the P versus NP problem, since it would mean that graph isomorphism separates the problems in P from the NP-complete problems.
Many computer scientists believe, instead, that graph isomorphism is now on a glide path that will eventually send it coasting into P. That is the usual trajectory, Trevisan said, once a quasi-polynomial algorithm has been found. Then again, “somehow this problem has surprised people many times,” he said. “Maybe there’s one more surprise coming.”
This article was reprinted on Wired.com.