How Randomness Improves Algorithms
Introduction
Since the very first days of computer science — a field known for its methodical approach to problem-solving — randomness has played an important role. The first program to run on the world’s first general-purpose electronic computer used randomness to simulate nuclear processes. Similar approaches have since been used in astrophysics, climate science and economics. In all these cases, plugging in random numbers at certain steps in the algorithm helps researchers account for uncertainty about the many ways that complex processes can play out.
But adding randomness into an algorithm can also help you calculate the correct answer to unambiguous true-or-false questions. “You just say ‘OK, let me give up, let me not try, let me just pick something at random,’” said Eric Blais, a computer scientist at the University of Waterloo. “For way too many problems, that ends up being a successful approach.”
Let’s say you want to determine whether a given number is prime (divisible only by 1 and itself) or composite (also divisible by other integers). You could simply try to divide it by all possible factors, but for large numbers this “brute force” method and other factoring algorithms are excruciatingly slow. And if the number turns out to be composite, factoring algorithms tell you the values of its divisors — more information than you asked for. If you care only about a number’s “primality,” is there a more efficient algorithm?
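To make the cost concrete, here is a minimal sketch of the brute-force approach (the function name and details are illustrative, not drawn from any particular factoring algorithm): it simply tries every candidate divisor up to the square root of N.

```python
def is_prime_trial_division(n: int) -> bool:
    """Brute-force primality check: try every candidate divisor up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # found a divisor, so n is composite
        d += 1
    return True  # no divisor found, so n is prime
```

The loop is correct, but the number of candidate divisors grows exponentially with the number of digits in N, which is why this approach becomes hopeless for the very large numbers used in practice.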
There is if you use randomness. The basic idea goes back to a result from the 17th-century French mathematician Pierre de Fermat, known as his “little theorem.” Fermat considered two integers — call them N and x. He proved that if N is a prime number, then x^N − x is always a multiple of N, regardless of the value of x. Equivalently, if x^N − x is not a multiple of N, then N can’t be a prime number. But the converse isn’t always true: If x^N − x is a multiple of N, then N is usually but not always prime.
To turn Fermat’s little theorem into a primality test, just take the N that you’re interested in, choose x at random, and plug the two numbers into x^N − x. If the result is not a multiple of N, then you’re done: You know that N is definitely composite. If the result is a multiple of N, then N is probably prime. Now pick another random x and try again. In most cases, after a few dozen tries, you can conclude with near certainty that N is a prime number. “You do this a small number of times,” Blais said, “and somehow now your probability of having an error is less than the probability of an asteroid hitting the Earth between now and when you look at the answer.”
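A minimal sketch of that test in Python might look like the following (the function name and the choice of 40 trials are illustrative). It uses the built-in pow with a modulus, so checking whether x^N − x is a multiple of N amounts to checking whether x^N and x leave the same remainder when divided by N. As written, this basic version can still be fooled by rare composites called Carmichael numbers, which is why practical tests use the refinements mentioned below.

```python
import random

def fermat_is_probably_prime(n: int, trials: int = 40) -> bool:
    """Fermat test: pick random x and check whether x^n - x is a multiple of n."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        x = random.randrange(2, n - 1)  # random base, 2 <= x <= n - 2
        if pow(x, n, n) != x:           # x^n mod n must equal x if n is prime
            return False                # witness found: n is definitely composite
    return True                         # survived every trial: almost certainly prime
```

For example, fermat_is_probably_prime(2**61 - 1) returns True almost instantly, while trial division on that 19-digit number would have to check more than a billion candidate divisors.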
The first primality tests using randomized algorithms (based on refinements to Fermat’s little theorem) ushered in a new era. Problem after problem turned out to be far easier to solve with randomness than with nonrandom, or deterministic, algorithms. The key was to reframe each problem as one that could be quickly solved given an appropriate value for some number x, and then prove that just about any x would do. The solution works even though researchers have no idea how to determine whether any specific choice is a good one. Mathematicians have joked that this unusual challenge is akin to finding hay in a haystack.
But these successes made researchers wonder why randomness should help with problems like primality testing, which are all about finding hidden, non-random patterns. “There’s something a bit paradoxical about it,” said Rahul Santhanam, a computer scientist at the University of Oxford. “Pure randomness is helping you get a handle on the structure that solves the problem.”
In 1994, the computer scientists Noam Nisan and Avi Wigderson helped resolve this confusion by demonstrating that randomness, though useful, probably isn’t necessary. They proved that one of two things must be true: Either all problems that can be efficiently solved using randomness also have fast deterministic algorithms, or many notoriously difficult problems are secretly easy. Computer scientists consider the second possibility very unlikely.
In fact, computer scientists often find it easier to develop a deterministic algorithm by starting with a randomized version and then “de-randomizing” it. “Once I have it, I suddenly see a very obvious way to make it deterministic,” said Eli Upfal, a computer scientist at Brown University. “But if I didn’t think about it in a randomized way as a probabilistic question, I probably would not think of it.”
Nearly 30 years after Nisan and Wigderson’s landmark proof, randomized algorithms remain as popular as ever, because de-randomization can be tricky and deterministic algorithms are often efficient only in principle. It wasn’t until 2002 that three researchers found a way to de-randomize primality testing, and in practice their algorithm is far slower than the best randomized algorithms. For other problems, it’s hard even to know where to begin — the best known algorithm has a chicken-and-egg problem that you can only escape through randomness.
That’s the case for a recent breakthrough in graph theory. Last year, three computer scientists developed a fast algorithm for finding the shortest path through a graph — a web of nodes connected by line segments — that works even when some segments subtract from the total path length rather than add to it. Their algorithm involved transforming the graph into a simpler one by deleting certain segments, solving the problem for the simplified graph, and then accounting for the deleted segments. They could prove that the algorithm would run quickly if no shortest path passed through too many deleted segments — otherwise, the last step would take too long.
But how to decide which segments to delete in the first place? It’s not just hard to find the ideal set of segments deterministically — it’s impossible. The set depends on which paths are shortest, the very problem the three researchers were trying to solve. But even though they couldn’t find the best set of segments to delete, they could prove that most random choices would be pretty good, and that was enough to break the self-referential loop. In the rare cases where the algorithm makes an unlucky choice and gets bogged down at the last step, they could just stop and run it again.
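The “stop and run it again” step is a common pattern in randomized algorithms. The sketch below is not the researchers’ shortest-path algorithm; it only illustrates the generic retry loop, assuming a hypothetical attempt function that makes its own random choices (such as which segments to delete) and reports whether it got bogged down.

```python
import random

def run_with_restarts(attempt, max_tries: int = 100, seed=None):
    """Retry a randomized attempt until one of its random choices works out.

    `attempt` is a hypothetical callable: given a random-number generator, it
    makes its random choices and returns (True, result) on success, or
    (False, None) if it got bogged down. If most random choices are good,
    a handful of tries is almost always enough.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        succeeded, result = attempt(rng)
        if succeeded:
            return result
    raise RuntimeError("every attempt was unlucky, which should be vanishingly rare")
```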
“Randomness is basically a way to ensure that something is true about the optimal solution without knowing the optimal solution,” said Aaron Bernstein, one of the authors of the new algorithm.
Randomness has found countless other uses in computer science, from cryptography to game theory to machine learning. Chances are, it’s here to stay.