How to Build a Big Prime Number

A new algorithm brings together the advantages of randomness and deterministic processes to reliably construct large prime numbers.

Introduction

Prime numbers are tricky things. We learn in school that they’re numbers with no factors other than 1 and themselves, and that mathematicians have known for thousands of years that an infinite number of them exist. Producing one on command doesn’t seem as if it should be difficult.

But it is. Constructing arbitrarily large prime numbers is remarkably complicated. You basically have two computational options, both with drawbacks. You could use randomness and find one by guessing, but the method is inconsistent — you run the risk of generating a different prime every time. Or you could use a more reliable, deterministic algorithm, but at a heavy computational cost.

In May, a team of computer scientists showed that a kind of hybrid approach could also work. They published an algorithm that effectively combines the random and deterministic approaches to output a prime number of a specific length, with a high probability of delivering the same one even if the algorithm is run many times. The algorithm connects randomness and complexity in interesting ways, and it might also be useful for cryptography, where some encoding schemes rely on the construction of big primes.

“They laid out a sequence of attempts, each of them trying to construct a prime number of a different length, and showed that one of the attempts works,” said Roei Tell, a theoretical computer scientist at the Institute for Advanced Study who was not involved with the work. “It’s a construction that outputs a deterministically chosen prime, but allows you to toss coins and make random choices in the process.”

The challenge of making an efficient recipe for primes has deep roots. “We really don’t know that much about how primes are distributed, or about gaps in primes,” said Ofer Grossman, who studies pseudorandom algorithms. And if we don’t know where to find them, there’s no easy way to generate a prime number from scratch.

Rahul Santhanam has worked to create better algorithms to find arbitrarily large prime numbers — a surprisingly difficult task.

Over time, researchers developed the aforementioned approaches. The simplest way is just to guess. If you want a prime with 1,000 digits, for example, you could pick a 1,000-digit number at random and then check it. “If it’s not prime, you just try another one, and another, and so on until you find one,” said Rahul Santhanam, a computer scientist at the University of Oxford and co-author of the new paper. “Because there are many primes, this algorithm will give you some number that’s prime with a high probability, after a relatively small number of iterations.” But using randomness means you’ll likely get a different number every time, he said. That could be a problem if you need consistency — if, say, you’re employing a cryptographic method of security that depends on the availability of large primes.
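To make the guess-and-check recipe concrete, here is a minimal Python sketch (an illustration, not code from the paper; the function names are invented). It pairs random sampling with the standard Miller-Rabin test, whose chance of misidentifying a composite as prime shrinks exponentially with the number of rounds:

```python
import random

def is_probably_prime(n, rounds=40):
    """Miller-Rabin test: a composite n slips through each random
    round with probability at most 1/4."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(digits=1000):
    """Guess random odd numbers of the given length until one passes."""
    while True:
        candidate = random.randrange(10**(digits - 1), 10**digits) | 1
        if is_probably_prime(candidate):
            return candidate
```

Because primes have density roughly 1 in 2.3 × d among d-digit numbers, `random_prime(1000)` typically succeeds after a thousand or so guesses — but two runs will almost certainly return two different primes.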

The other approach is to go with a deterministic algorithm. You could pick a starting point and start testing numbers, sequentially, for primality. Eventually you’re destined to find one, and your algorithm will consistently output the first one you find. But it could take a while: If you’re looking for a prime number with 1,000 digits, even a calculation with 2⁵⁰⁰ steps — which would take much longer than the age of the universe — isn’t enough to guarantee success.
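That scan is just as short to sketch (again, only an illustration; sympy's `isprime` stands in for the primality check, and a fully deterministic polynomial-time test such as AKS could be substituted):

```python
from sympy import isprime  # deterministic below 2**64, strong BPSW beyond

def first_prime_with(digits):
    """Scan upward from the smallest number with the given digit count.

    The search order is fixed, so every run returns the same prime.
    The catch: the strongest proven prime-gap bounds still allow
    exponentially long prime-free stretches at this scale, so nothing
    guarantees the loop stops in polynomial time."""
    n = 10**(digits - 1)
    while not isprime(n):
        n += 1
    return n
```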

In 2009, the mathematician and Fields medalist Terence Tao wanted to do better. He challenged mathematicians to come up with a deterministic algorithm for finding a prime of a given size within a computational time limit.

That time limit is known as polynomial time. An algorithm solves a problem in polynomial time if the number of steps it takes is no more than a polynomial function of n, the size of the input. (A polynomial function includes terms that have variables raised to positive integer powers, like n² or 4n³.) In the context of prime number construction, n refers to the number of digits in the prime you want. Computationally speaking, this doesn’t cost much: Computer scientists describe problems that can be solved by algorithms in polynomial time as easy. A hard problem, by contrast, takes exponential time, which means it requires a number of steps approximated by an exponential function (which includes terms such as 2ⁿ).
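To see how stark that gap is, a quick back-of-the-envelope check (in Python, purely for illustration):

```python
# Compare a polynomial step count with an exponential one for a
# 1,000-digit target, i.e., input size n = 1000.
n = 1000
polynomial_steps = n**2   # 1,000,000 steps: a 7-digit number
exponential_steps = 2**n  # a 302-digit number of steps
print(len(str(polynomial_steps)), len(str(exponential_steps)))  # 7 302
```

Even at a trillion steps per second, the first count finishes in a microsecond, while the second dwarfs the number of atoms in the observable universe.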

For decades, researchers have investigated the connection between randomness and hardness. The prime number construction problem was considered easy if you allowed randomness — and were satisfied with receiving a different number each time — and hard if you insisted on determinism.

No one has managed to meet Tao’s challenge yet, but the new work comes close. It draws heavily on an approach introduced in 2011 by Shafi Goldwasser and Eran Gat, computer scientists at the Massachusetts Institute of Technology. They described “pseudodeterministic” algorithms — mathematical recipes for search problems, like finding big primes, that could harness the benefits of randomness and, with high probability, still produce the same answer every time. Such recipes would exploit the efficiency of random bits internally, while the outcome would be effectively de-randomized, appearing deterministic.
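A toy version conveys the flavor, though it is emphatically not the new paper's algorithm. Reusing the randomized `is_probably_prime` test from the earlier sketch, fix the search order: coins are tossed inside every primality check, yet with overwhelming probability each run returns the identical number, the first prime of the requested length.

```python
def pseudodeterministic_prime(digits, rounds=40):
    """Toy pseudodeterministic search (illustration only).

    All randomness lives inside is_probably_prime (defined above);
    candidates are examined in a fixed order. Each check errs with
    probability at most 4**(-rounds), so distinct runs agree on the
    output except with vanishingly small probability."""
    n = 10**(digits - 1)
    while not is_probably_prime(n, rounds):
        n += 1
    return n
```

As with the deterministic scan, nothing proven bounds how far this loop must travel; the achievement of the new work is a pseudodeterministic construction that provably runs in polynomial time, at least for infinitely many input lengths.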

Researchers have been exploring pseudodeterministic algorithms ever since. In 2017, Santhanam and Igor Oliveira of the University of Warwick (who also contributed to the new work) described a pseudodeterministic approach to constructing primes that used randomness and looked convincingly deterministic, but it worked in “subexponential” time — faster than exponential, but slower than polynomial time. Then in 2021, Tell and Lijie Chen, a computer scientist at the University of California, Berkeley, explored how to use a hard problem to build a pseudorandom number generator (an algorithm that generates a string of numbers indistinguishable from a random output). “[We] found a new connection between hardness and pseudorandomness,” Chen said.

The pieces finally came together in the spring of 2023, during a bootcamp on computational complexity at the Simons Institute for the Theory of Computing at Berkeley, when the researchers began to work together on the problem, weaving together past results. For the new work, Chen said, Hanlin Ren — a computer scientist at Oxford and a co-author — had the initial ideas to combine the Chen-Tell result with the Santhanam-Oliveira approach in a novel way. Then the whole team developed the ideas more fully to produce the new paper.

The resulting pseudodeterministic algorithm, Santhanam said, uses new ways of looking at past work to produce prime numbers in polynomial time. It provably uses randomness to output a prime number of a specific length, and it is more consistent than random guessing and more computationally efficient than deterministic crunching.

The new algorithm is also remarkably simple, Santhanam said, and it can be applied to a wide range of search problems — really, to any dense subset of numbers, like the primes, for which membership can be determined in polynomial time. But it’s not perfect. The algorithm works for infinitely many input lengths, but it doesn’t cover all digit lengths. There may still be some values of n out there for which the algorithm doesn’t deterministically produce a prime.

“It would be cool to get rid of that small caveat,” Grossman said.

The ultimate goal, Santhanam said, is to find an algorithm that doesn’t require randomness at all. But that quest remains open. “Determinism is what we’d like to use,” he said.

But he also pointed out that pseudorandom processes are powerful tools, and projects like constructing primes are just one way of using them to connect ideas from mathematics, computer science, information theory and other areas.

“It’s exciting to try and think where else these brilliant observations will lead,” Tell said.
