**Our current state of knowledge**

It is now known and proven that there are always primes differing by 246 or less no matter how far out you go in the integers. We expect that this is true for every even difference $2, \ 4, \ 6, \ \ldots$, including the twin prime conjecture for primes with difference 2, but the proof does not tell us which specific differences occur. Usually this result is phrased by saying there are infinitely many primes whose difference with another prime is $\le 246$, or that there are infinitely often bounded prime gaps of 246 or less. This result was proved in an online Polymath Project [17] in 2014. Previously Yitang Zhang [20] had proved there are infinitely often bounded gaps between primes in 2014, with the bound 70 million. In a different way than Zhang, James Maynard [16] obtained the bound 600 which appeared in a 2015 paper, and Terry Tao also proved similar results at the same time. All of these results are based in part on the GPY method introduced by Goldston, Pintz, and Yıldırım in 2005 [8]. Prior to GPY, it was considered almost impossible that these results would be proven anytime in the forseeable future. If in 2000 you asked mathematicians which was harder, proving bounded gaps between primes or proving the Riemann Hypothesis, it is possible they would have chosen bounded gaps between primes as harder.

These breakthrough results were based on making improvements in the GPY method, but what was the GPY method based on? That is a complicated story, and I will only describe here some of the early steps that led eventually to GPY. This started in the winter/spring of 1999 when Cem and I were at the Mathematical Science Research Institute (MSRI) in Berkeley. Thus you can see that it took about 6 years to work out a proof that now can be presented in about 10 pages to graduate and advanced undergraduate mathematics students. Maynard’s proof which includes GPY as a special case is 15 – 20 pages. Almost none of the mathematics I will describe below is needed any longer in these proofs. Therefore you do not need to pay any attention to the complicated formulas I will include because they are here only to document the work we did at the time.

**MSRI in 1999**

During the 1995-96 academic school year Cem visited me at San Jose State University and we started to work together on problems related to primes. We finished a paper that year and started a second paper. In both of these papers we worked with a certain truncated divisor sum that approximately detects when an integer is a prime. This turned out to be the seed from which GPY grew. Truncated divisor sums of this type were actually well known and had been used for other purposes for a long time, so there was little reason to suspect their use would lead to anything significant in the theory of primes.

In the winter/spring semester of 1999 I was on sabbatical at MSRI in a program concerning random matrices and applications, and Cem managed at the last minute to joint the program, arriving from Turkey in February. I had already been there for a month (commuting 100 miles round trip each day from San Jose) and a few weeks before he arrived I had decided to give up doing research for the forseeable future. This was for entirely rational reasons. Being at MSRI I was surrounded with many top active mathematicians busy proving exciting results. By contrast, I had just spent 6 months failing to prove anything. I was working on a problem, the first step of which was to prove a certain formula. This was fun and I made good progress and in December I finally proved the formula. I then went to the second step in my problem and discovered in two days that my formula was no help at all and I couldn’t solve the problem. With two young children ages 1 and 3 and a third one due later in 1999, I decided I no longer had time to waste on unproductive research. It was time to grow up and pick a safer academic path. Thus I decided to write a book on primes, motivated by the idea that 2 hours of work generated one page of first-draft book.

When Cem arrived I was busy working on my book, and rather than work on our delayed second paper, we would meet and discuss whatever we were thinking about. Then one Monday when I met Cem he told me he thought he could work out the average for the third power of the truncated divisor sum we had used in our papers. In fact, he had spent all weekend trying to do it. So we started looking into this and we ended up working on it for the rest of the term. At first every time we did the calculation a different way we got a different answer, which we both found more hilarious than annoying because it was only a challenge without any applications or importance. Most times on the hour drive home I would think I found that day’s mistake, but there were so many mistakes that nothing helped for awhile. It took a few months before we finally started to get the same answer each way we did the calculation. By June we felt confident we could actually evaluate this third power average for two different short divisor sums used to detect primes, and all the special cases that arise. But we had not actually written any of the proofs down, and had not worked out most of the details.

The term ended and Cem left Berkeley, and I went back to my normal busy life in San Jose. It is worth reminding the reader on the state of knowledge concerning small gaps between primes in 1999. The average gap between a prime $p$ and the next prime is $\log p$, and the best result known then, due to Maier [15] in 1988, was that infinitely often you could find two primes $p$ and $p’$ within $(.2484\ldots) \log p$ or a little smaller than one quarter of the average spacing. This result was obtained by combining all three of the known methods for finding small gaps between primes. At the time the experts saw little hope of making any substantial progress on this problem, which proves the experts can be spectacularly wrong when it comes to making future predictions. Currently the experts are saying the twin prime conjecture, which we are only 244 away from obtaining, is a very hard problem, but they are no longer saying it is an impossibly hard problem.

**Detecting primes with short divisor sums**

Let $\pi(x)$ denote the number of primes $\le x$. We can write this as

$\pi(x) := \sum_{p\le x}1, \qquad (1) $

where we will always use the letter $p$ to denote a prime number. Thus for example $\pi(10)=4$ and $\pi(100) = 25$. The prime number theorem, proved in 1896, asserts that

$ \pi(x) \sim \frac{x}{\log x},\qquad (2)$

where

$ f(x) \sim g(x) \qquad \text{means} \qquad \lim_{x\to \infty} \frac{f(x)}{g(x)} = 1. $

If you work on prime numbers it soon becomes clear that it is better to count primes with the

von Mangoldt function $\Lambda(n)$, defined to be $\log p$ if $n=p^m$, $m\ge 1$ an integer, and zero otherwise. Thus $\Lambda(n) =0$ unless $n$ is a prime or prime power, and the weight $\log p$ lets us count primes as if they are integers. The prime number theorem now becomes

$\psi(x) := \sum_{n\le x }\Lambda(n) \sim x .\qquad (3)$

While the prime powers are also included here, there are at most a constant times $\sqrt{x}$ squares and higher powers $\le x$ which is insignificant in (3). One can now recover the original prime number theorem from (3) by a standard method and one finds \begin{equation} \pi(x) \sim {\rm li}(x) :=\int_2^x\frac{dt}{\log t}, \qquad (4)\end{equation} and using integration by parts it is easy to show this is equivalent to (2). In fact ${\rm li}(x)$ is a far better approximation of $\pi(x)$ than $x/\log x$.

In a first course in number theory we prove the elementary formula, which you might also try to prove yourself, \begin{equation}\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}.\qquad (5)\end{equation} Here $d|n$ means “$d$ divides $n$” and the sum runs over the divisors $d$ of $n$. Also $\mu(n)$ is the Möbius function defined by $\mu(1)=1$, $\mu(n) =(-1)^r$ if $n$ is the product of $r$ distinct primes, and $\mu(n) =0$ if $n$ has a square prime factor. This formula has the drawback that it can have an enormous number of terms. For example, consider the 100th primorial $p_{100}\#$ which is the product of the first 100 primes. Then $\Lambda(p_{100}\#)=0$ obviously, but it is hopeless to compute this by (5) since there are $2^{100}$ terms in the sum.

In place of (5), consider the truncated smoothed approximations for $\Lambda(n)$ \begin{equation} \Lambda_R(n) := \sum_{\substack{d|n\\ d\le R}}\mu(d) \log \frac{R}{d} \qquad (6)\end{equation} and \begin{equation} \lambda_R(n) := \sum_{r\le R}\frac{\mu(r)^2}{\phi(r)}\sum_{\substack{d|r\\d|n}}d\mu(d).\qquad (7) \end{equation}

Selberg obtained these type of approximations in his work on mollifiers and sieves. The approximation $\lambda_R(n)$ comes from the circle method and is also by the Selberg sieve the truncated divisor sum which gives the best least square approximation to $\Lambda(n)$, see [18]. On the other hand, $\Lambda_R(n)$ usually obtains asymptotically the same results as $\lambda_R(n)$ and often it is easier to deal with.

The original and most powerful method for finds pairs of primes close together prior to GPY was first developed by Hardy and Littlewood [14] in the 1920’s and required the assumption of the Generalized Riemann Hypothesis (GRH). In 1966 Bombieri and Davenport [1] were able to improve the method and remove GRH. They showed this method by itself succeeds in finding primes at or closer than 1/2 the average spacing. The proof is based on the circle method and also the Bombieri-Vinogradov theorem on the distribution of primes in arithmetic progressions. Later, the approximations (6) or (7) were used in [2] and [3] to replace the circle method with a simple proof based on the second moment for the number of primes in a short interval. The argument goes as follows. Let

\[ \psi(x,h) := \psi(x+h) -\psi(x) = \sum_{x<n\le x+h}\Lambda(n),\]

and approximate this with

\[ \psi_R(x,h) := \psi_R(x+h) -\psi_R(x) = \sum_{x<n\le x+h}\Lambda_R(n). \]

Then

\[ \int_x^{2x}(\psi(y,h) -\psi_R(y,h) )^2 dy \ge 0\]

and we obtain the lower bound

\[ \int_x^{2x}\psi(y,h)^2 dy \ge 2\int_x^{2x}\psi(y,h)\psi_R(y,h) dy – \int_x^{2x}\psi_R(y,h)^2 dy .\]

Multiplying out the sums in the two integrals on the right, we are able to evaluate these integrals asymptotically using the correlation relations, for $1\le R\le x^{1/2}$, \begin{equation} \sum_{n\le x} \Lambda_R(n)^2 \sim N\log R, \qquad \sum_{n\le x} \Lambda_R(n)\Lambda_R(n+k) \sim \mathfrak{S}(k) (x-|k|),\qquad (8)\end{equation}

\begin{equation} \sum_{n\le x}\Lambda(n) \Lambda_R(n) \sim N\log R, \qquad \sum_{n\le x} \Lambda_R(n)\Lambda(n+k) \sim \mathfrak{S}(k) (x-|k|),\qquad (9) \end{equation}

where \begin{equation}\mathfrak{S}(k) = \begin{cases} {\displaystyle 2 C_{2} \prod_{\substack{ p \mid k \\ p > 2}} \!\left(\frac{p – 1}{p – 2}\right)} & \mbox{if $k$ is even, $k\neq 0$}, \\ 0 & \mbox{if $k$ is odd}\end{cases}\qquad (10) \end{equation} and \begin{equation}C_{2} = \prod_{p > 2}\! \left( 1 – \frac{1}{(p – 1)^{2}}\right) = 0.66016\ldots. \qquad (11) \end{equation}

The result of a calculation is that, on taking $R=x^{1/2}$, and $h=\lambda \log x$ for any fixed number $\lambda>0$, \begin{equation}\begin{split} \int_x^{2x}(\pi(y+\lambda \log x)-\pi(y))^2\, dy &\sim \frac{1}{\log^2 x}\int_x^{2x}(\psi(y+\lambda \log x) -\psi(y))^2\, dy \\& \ge (\frac12 \lambda +\lambda^2 – \epsilon) x,\end{split}\end{equation} for any $\epsilon >0$. Now suppose there are no pairs of primes in the interval $[x,2x]$ closer than $\lambda \log x$. Then there can only be zero or one prime in the interval $(y,y+\lambda \log x]$ and $(\pi(y+\lambda \log x)-\pi(y))^2 = \pi(y+\lambda \log x)-\pi(y)$.

An easy calculation shows that \begin{equation} \int_x^{2x} (\pi(y+\lambda \log x)-\pi(y)) \, dy \sim \lambda x.\qquad (13)\end{equation}

From the last two equations we conclude $\lambda \ge \frac12 \lambda +\lambda^2 -\epsilon$ from which we conclude $\lambda \le \frac12+\epsilon$, for any $\epsilon >0$. This is the result obtained by Bombieri and Davenport.

We mention in passing that it is conjectured that \begin{equation} \int_x^{2x} (\pi(y+\lambda \log x)-\pi(y))^2 \, dy \sim (\lambda+\lambda^2) x ,\qquad (14)\end{equation} which is the second moment of a Poisson distribution. It has further been conjectured that the primes are distributed around their average spacing in a Poisson distribution, but even this second moment conjecture will be formidable to prove.

**Working by Email Fall 1999 through 2000**

The motivation for Cem and my work from after the MSRI semester through 2000 was the vague notion that the above second moment argument might be improved if we tried to do a third moment argument. Therefore the first step was to work out triple correlation formulas for \begin{equation} \sum_{n\le x } \Lambda_R(n+k_1)\Lambda_R(n+k_2)\Lambda_R(n+k_3)\qquad (15) \end{equation} with $R\le x^{1/3}$. We would also need similar sums where one or two of the $\Lambda_R$ are replaced with $\Lambda$. Clearly, and we were well aware of it at the time, one could never have two $\Lambda$’s in one of these sums, since that involved evaluating a sum over prime pairs which could not be done. Also the positivity needed would not work for the third moment and therefore we would have to go to the 4th moment, with many sums over prime triples which could not be done. Looking back now I find it hard to understand how we could have wanted to work on this let alone be so enthusiastic about doing it. The only explanation I can think of is the line from the movie Animal House, “I think that this situation absolutely requires a really futile and stupid gesture, and we are just the guys to do it.”

The first thing we did in working on these triple correlation short divisor sums was to double the amount of work by deciding to do it for both $\Lambda_R(n)$ and $\lambda_R(n)$. I worked mainly on $\Lambda_R(n)$ which appeared in our first paper [10] in 2003, and Cem worked mainly on $\lambda_R(n)$ which finally appeared in our second paper [11] in 2007, two years after GPY was done. We initially worked in parallel and when one of us got ahead we could use the work on that divisor sum to help figure out the other divisor sum. But by mid-2000 it became clear that $\lambda_R(n)$ was far more complicated, and we started putting more of our joint effort into finishing a first paper on $\Lambda_R(n)$.

Here is how we evaluated (15) in the simplest but also most challenging case when $k_1=k_2=k_3=0$. First, \[ \sum_{n\le x} \Lambda_R(n)^3 = \sum_{d_1,d_2,d_3\le R} \mu(d_1)\log(R/d_1)\mu(d_2)\log(R/d_2)\mu(d_3)\log(R/d_3)\sum_{\substack{n\le x\\ d_1|n\\ d_2|n\\ d_3|n}}1.\]

The divisibility conditions in the last sum are equivalent to $[d_1,d_2,d_3] | n$ where $[a,b]$ is the least common multiple or lcm of the numbers in the brackets. As you can check \[ \sum_{\substack{ n\le x\\ d|n}}1 = \frac{x}{d} +O(1) ,\] where we use the usual “big oh” notation, and therefore we obtain \[ \sum_{n\le x} \Lambda_R(n)^3 = x \sum_{d_1,d_2,d_3\le R} \frac{\mu(d_1)\log(R/d_1)\mu(d_2)\log(R/d_2)\mu(d_3)\log(R/d_3)}{[d_1,d_2,d_3]} +O(R^3).\]

The lcm term $[d_1,d_2,d_3]$ depends on the common prime factors of $d_1,d_2,d_3$. By the Möbius function the only non-zero terms in the sum are when $d_1$, $d_2$ and $d_3$ are squarefree. Eventually Cem and I realized that the best way to not make mistakes is to decompose these variables into the Venn diagram of their mutual overlapping prime factors. Thus \[ \begin{split} &d_1 = a_1a_{12}a_{13}a_{123} \\ & d_2 = a_2a_{12}a_{23}a_{123} \\& d_3 = a_3a_{13}a_{23}a_{123} \end{split} \] where for example $a_{12}$ is the prime factors in both $d_1$ and $d_2$ but not in $d_3$. Thus we now have 7 variables which are mutually relatively prime to each other, and the sum on the right-hand side above is equal to \[ \sum’_{\substack{ a_1a_{12}a_{13}a_{123}\le R \\ a_2a_{12}a_{23}a_{123} \le R \\ a_3a_{13}a_{23}a_{123} \le R} } \frac{\mu(a_1)\mu(a_2)\mu(a_3)\mu^2(a_{12})\mu^2(a_{13})\mu^2(a_{23})\mu(a_{123}) L_1(R)L_2(R)L_3(R)} {a_1a_2a_3a_{12}a_{13}a_{23}a_{123} }\] where the dash in the summation indicates that all seven variables are relatively prime to each other, and $L_i(R) = \log(R/d_i)$, $i=1,2,3$.

The next step is to sum over each of the 7 variables, retaining the conditions on being relatively prime with the other variables as it is done. This is done first with $a_1$, then $a_2$, then $a_3$ since these variables only occur in one inequality. Then $a_{12}$, $a_{13}$, $a_{23}$ are done, where they occur in two inequalities which thus have to be split into two ranges. Finally $a_{123}$ is done. The final result is that this sum is \[ \sim \frac{3}{4} \log^2R. \]

This is what Cem and I spent the rest of 1999 and the first half of 2000 working on. Writing the 66 page first paper took most of the rest of 2000 and some of 2001. The results we found were surprisingly simple. In place of (8) and (9) we proved, for $1\le R\le x^{1/3}$ and $1\le k, k’ \le R^\epsilon $,

\begin{equation}\begin{split} &\sum_{n\le x} \Lambda_R(n)^3 \sim \frac34 x\log^2 R, \qquad \sum_{n\le x} \Lambda_R(n)^2\Lambda_R(n+k) \sim \mathfrak{S}(k)x \log R \\ & \sum_{n\le x}\Lambda_R(n)\Lambda_R(n+k)\Lambda_R(n+k’) \sim \mathfrak{S}(k,k’)x,\end{split}\qquad (16)\end{equation} and \begin{equation}\begin{split} &\sum_{n\le x}\Lambda(n) \Lambda_R(n+k)^2 \sim \mathfrak{S}(k)x\log R, \\& \sum_{n\le x} \Lambda(n)\Lambda_R(n+k)\Lambda_R(n+k’) \sim \mathfrak{S}(k,k’)x,\end{split}\qquad (17) \end{equation} where $\mathfrak{S}(k,k’)$ is the expected singular series for a triple of primes.

We were both very surprised to find such simple formulas. The existence of asymptotic formulas depends completely on the smoothing in the divisor sums, and we were expecting at best to get formulas with complicated constants depending on convergent series.

**Summer of 2000: First Result on Primes**

We were still thinking in terms of moments, but in the summer of 2000 we found that the formulas we had obtained were enough by themselves to obtain results on primes. Surprisingly at the time, we needed all of the formulas we obtained above except for the one we worked the hardest on – the third power of $\Lambda_R(n)$ where the constant $\frac34$ occurs.

Consider \begin{equation}\mathcal{M}(h,\rho) := \sum_{x<n\le 2x} \left(\psi(n,h)-\rho \log x\right) \left(\psi_R(x,h) – C\log x\right)^2, \qquad (18)\end{equation} where $C$ is a constant we can choose to make this expression as large as possible. Multiplying out and using the formulas (16) and (17), we obtain an expression quadratic in $C$, and on completing the square we find the choice of $C$ that maximizes this quadratic. The result is that, taking $h=\lambda \log x$, and $\lambda <\rho$, \begin{equation}\mathcal{M}(h,\rho)\sim \frac14 \frac{\lambda}{\rho -\lambda} \left(\frac14\rho-\left( \rho-\lambda\right)^2\right) x \log^3 x.\qquad (19)\end{equation}

We now assume $\rho >1$. Then from (18) if there is only zero or one prime in $(n, n+h]$ for all $x<n\le 2x$ then $\psi(n,h) \le \log(2x)<\rho \log x$, and thus $\mathcal{M}(h,\rho)< 0$. (Here the prime powers can be discarded without altering the lower bound.) But by (19), $\mathcal{M}(h,\rho)>0$ for $\lambda$ in the range $\rho/2<\lambda <\rho$. Hence there has to be an $n$ with two primes in $(n, n+\lambda \log x]$, for any $\lambda >1/2$ on letting $\rho \to 1^+$. Thus we have found exactly the same result as Bombieri-Davenport in the previous section, although our proof requires a 66 page paper. We did not appreciate it at the time, but (18) has an important difference from the second moment method. In the second moment method we seek an approximation that is as close as possible to the prime counting function, but in (18) we seek a weight function that detects primes in the prime counting function and is directly aimed at finding primes in short intervals. This idea carries directly over into GPY and later work.

At this point Cem and I knew that we were actually able to do something new, but we had no expectation it would lead to anything close to the results later obtained in GPY, let alone Zhang or Maynard and Tao. But we also had a secret weapon we didn’t know about, and I would like to conclude this look back on Cem and my work by describing this.

**Epilogue**

By the start of 2001 we had a plan for finally improving on Bombieri and Davenport’s method. The first step was to obtain correlation formulas for products of not just 3 but $k$ $\Lambda_R(n)$’s with shifts. We were sure all the formulas would follow the same pattern as in (16) and (17) except that we did not know what would replace the number $3/4$ in the case of the third power power. And our method was hopelessly complicated and even computed the 4-correlations would be a formidable task.

That is where the secret weapon intervened. Among your peers and other people working in your field there is a lot of expertise available to help you with your problems, but it is up to you to describe exactly what it is you are trying to do, and reduce it to some very specific questions that do not require someone to read, for example, a 66 page paper. At conferences, workshops, and meetings we would describe what we were doing, and almost every problem we could not solve was solved, sometimes with a very simple one or two sentence suggestion. There were far to many people who were helpful to give everyone the credit they deserve, but I will mention a few examples. First, at an American Institute of Mathematics (AIM) workshop, after computing the correlation sums as above, Peter Sarnak shouted out, “Why not write your sum as a contour integral and compute the residue?” to which I could only reply: We didn’t think of that.” That suggestion in short order allowed us to prove our correlation formulas with one exception. The exception was what we called the correlation coefficients, which are the constants $C(k)$ in the formula \[ \sum_{n\le x} \Lambda_R(n)^k \sim C(k) x\log^{k-1}R .\]

We knew already $C(1)=1$, $C(2)=1$, and $C(3)=3/4$. For this problem computing residues becomes incredibly complicated as more residues are constantly splitting off at each step in the argument. David Farmer using Mathematica found that $C(4)=3/4$, $C(5)=11065/16384 =.6735\ldots$ and $C(6)=11460578803/2^{34}=.66709\ldots$. No further values have ever been computed, but Soundararajan proved that the $C(k)$ grow faster than exponentially to infinity as $k\to \infty$, which ruins any use we may have for them as soon as $k$ gets large. By early 2002 we had used Mathematica and the 6 correlation coefficients above in a 7th correlations argument similar to what we used with (18) to obtained two primes within $.38$ times the average spacing. At this point the argument seemed to be reaching its limit, but we were happy with this progress and determined to push on.

The next development was Cem and I thought we had obtained a new type of prime tuple approximation divisor sum, but Granville and Soundararajan discovered this divisor sum was not smooth enough for asymptotic formulas to exist. We received a certain amount of publicity over this “thrill of victory and agony of defeat” story, but the less said about that here the better. On the positive side, many mathematicians studied this problem, and we received a lot of help reformulating the method, especially from Enrico Bombieri, Andrew Granville, and Kannan Soundararajan. This allowed us to completely solve the remaining problems with our approach started in 1999 and avoid the correlation coefficients. The result is in the paper [12] where we obtained two primes within $1/4$ of the average spacing. The viewpoint changed from moments to approximating tuples of primes. While working on this we started to find better approximations for tuples not made from products of $\Lambda_R(n)$ and the path to GPY followed. We will not touch on that here, but the interested reader can find details in [9], [18], [19].

Prime tuples are a generalization of twin primes where we consider the tuple or vector \begin{equation}(n+h_1, n+h_2, \ldots , n+h_k)\qquad (20)\end{equation} with the shifts $h_i$ given by $\mathcal{H}= \{h_1,h_2,\ldots , h_k\}.$

Letting $n=1,2,3,\ldots \ $ we ask how often all of the components of the tuple are simultaneously prime for $n\le x$, and we denote this number by $\pi(x,\mathcal{H})$. Thus for example twin primes correspond to $\mathcal{H}=\{0,2\}$ with the tuple $(n,n+2)$. On the other hand the tuples $(n,n+1)$ is only made up of primes when $n=2$ since at least one of these numbers is even. Similarly $(n,n+2,n+4)$ is only made up of primes when $n=3$ since at least one of these numbers is divisible by 3. Tuples which do not always have a component divisible by some integer $\ge 2$ are called \emph{admissible}, and for these we expect that infinitely often all their components will simultaneously be primes. This is called the Hardy-Littlewood prime tuple conjecture.

Hardy and Littlewood also made a more precise conjecture [14]. Let $\nu_p(\mathcal{H})$ denote the number of distinct residue classes$\pmod p$ the numbers $h\in \mathcal{H}$ fall into. Let \begin{equation}\mathfrak{S}(\mathcal{H}) = \prod_p\left(1-\frac{1}{p}\right)^{-k}\left(1 -\frac{\nu_p(\mathcal{H})}{p}\right).\qquad (21)\end{equation}

If $\mathfrak{S}(\mathcal{H})\neq 0$ then $\mathcal{H}$ is admissible.

**Prime Tuple Conjecture** If $\mathcal{H}$ is admissible, then \begin{equation} \pi(x,\mathcal{H}) \sim \mathfrak{S}(\mathcal{H}) \text{li}_k(x), \ \ \text{where} \ \ \text{li}_k(x) = \int_2^x \frac{1}{(\log t)^k}\, dt.\qquad (22)\end{equation}

Hardy and Littlewood also formulated the prime tuple conjecture in terms of the von Mangoldt function by defining \begin{equation} \Lambda(n;\mathcal{H}) := \Lambda(n+h_1)\Lambda(n+h_2)\cdots \Lambda(n+h_k), \qquad (23)\end{equation} (so this will be zero if any of the $\Lambda(n+h_i)$ is zero), and then equivalently to (22) conjectured \begin{equation} \sum_{n\le x}\Lambda(n;\mathcal{H}) \sim \mathfrak{S}(\mathcal{H})x. \qquad (24)\end{equation}

We now approximate $\Lambda(n;\mathcal{H})$ by \begin{equation}\Lambda_R(n;\mathcal{H}):= \Lambda_R(n+h_1)\Lambda_R(n+h_2)\cdots \Lambda_R(n+h_k). \qquad (25)\end{equation}

To detect primes we need non-negative approximations and so we square this approximation, and the key formulas we need to compute are \begin{equation}\sum_{n\le x}\Lambda_R(n;\mathcal{H})^2 \quad \text{and} \quad \sum_{n\le x}\Lambda(n+h_0)\Lambda_R(n;\mathcal{H})^2. \qquad (26)\end{equation}

For the second sum, the single factor of $\Lambda(n+h_0)$ really is detecting primes in the tuple, since we find that if $h_0\in \mathcal{H}$ we get a result larger by a factor of $\log R$ than what we get when $h_0 \not \in \mathcal{H}$. The second sum is evaluated by summing $\Lambda(n+h_0)$ through arithmetic progressions modulo products of divisors from $\Lambda_R(n;\mathcal{H})^2$. Notice all the $\Lambda_R$’s are all only second powers, and thus the complicated correlation coefficients no longer occur. To obtain the $1/4$ result still requires considerable effort, especially in the optimization argument.

The main disadvantage of this approximation is that it is formed from many short divisor sums multiplied together; in (16) there are $2k$ of them, which forces $1\le R\le N^{1/2k}$ to control the error term in evaluating the first sum, while for the second sum we require $1\le R\le N^{1/4k- \epsilon}$ This reduces the quality of the approximation. It is natural to think that if we could approximate the number of primes in a tuple by a single divisor sum then we could take the approximation much longer and obtain a better result. As it turns out, sieve methods are based exactly on this idea, and at this point the correct method for finding small gaps between primes finally has started to emerged. Even with this, the approximations require an additional idea, which Janos Pintz discovered and then the three of us worked together to develop the GPY method.

I would like to mention a few people involved in the development of GPY. Andrew Granville and Kannan Soundarajan made a number of significant contributions to the discovery and development of GPY over many years. Yoichi Motohashi substantially simplified the initial proof [7]. Sid Graham, who worked with us as GPY was developed, was mainly responsible for obtaining a Selberg sieve proof of the method, as well as developing GPY for use in studying EK numbers, where an EK number is an integer with exactly K distinct prime factors. One notable result proved in [5] is that every admissible 3-tuple contains two E2 numbers infinitely often. See also [4],[6].

For readers interested in diving into this subject I highly recommend reading Andrew Granville’s survey paper [13] which provides an abundance of insights and technical details on the work of Zhang and of Maynard and Tao.

## References

[1] E. Bombieri and H. Davenport, *Small differences between prime numbers*, Proc. Roy. Soc. Ser. A, **293** (1966), 1–18.

[2] D. A. Goldston, *On Bombieri and Davenport’s theorem concerning small gaps between primes*, Mathematika **39** (1992), 10–17.

[3] D. A. Goldston, *A lower bound for the second moment of primes in short intervals*, Expo Math. **13** (1995), 366–376.

[4] D. A. Goldston, S. W.Graham, J. Pintz, and C.Y. Yıldırım, *Small gaps between primes or almost primes*, Transactions of the AMS, **361** No. 10, (2009), 5285–5330.

[5] D. A. Goldston, S. W. Graham, J. Pintz, and C.Y. Yıldırım, *Small gaps between products of two primes*, Proc. London Math. Soc. (3) **98** (2009), 741–774.

[6] D. A. Goldston, S. W. Graham, J. Pintz, and C.Y. Yıldırım, *Small Gaps Between Almost Primes, the Parity Problem, and Some Conjectures of Erdös on Consecutive Integers*, International Mathematics Research Notices,** 7** (2011), 1439–1450.

[7] D. A. Goldston, Y. Motohashi, J. Pintz, and C.Y. Yıldırım, *Small gaps between primes exist*, Proc. Japan Acad., **82A** (2006), 61–65.

[8] D. A. Goldston, J. Pintz, and C.Y. Yıldırım, *Primes in Tuples I*, Annals of Mathematics (2), **170** (2009), 819–862.

[9] D. A. Goldston, J. Pintz, and C.Y. Yıldırım, *The path to recent progress on small gaps between primes*, Clay Mathematics Proceedings, Volume 7, 2007, 129–139.

[10] D. A. Goldston and C.Y. Yıldırım, *Higher correlations of divisor sums related to primes: I: Triple correlations*, Integers **3** (2003), A5, 66pp. (electronic).

[11] D. A. Goldston and C.Y. Yıldırım, *Higher correlation of short divisor sums II: Triple correlations: Variations of the error term in the prime number theorem*, Proc. London Math. Soc. (3) **95** (2007), 199–247.

[12] D. A. Goldston and C.Y. Yıldırım, \emph{ Higher correlation of short divisor sums III: Gaps between primes}, Proc. London Math. Soc. (3) **95** (2007), 653–686.

[13] A. Granville, *Primes in intervals of bounded length*, Bull. Amer. Math. Soc. (N.S.), **52** (2015), 171–222.

[14] G. H. Hardy and J. E. Littlewood, *Some problems of `Partitio Numerorum’: III On the expression of a number as a sum of primes*}, Acta Math. **44** (1923), 1–70.

[15] H. Maier, *Small differences between prime numbers*, Michigan Math. J., **35** (3) (1988), 323-344.

[16] J. Maynard, *Small gaps between primes*, Annals of Mathematics (2), **181**, 383–413.

[17] D. H. J. Polymath, *Variants of the Selberg sieve, and bounded intervals containing many primes*, Res. Math. Sci. **1** (2014), Art. 12.

[18] A. Selberg, *On an elementary method in the theory of primes,* Norske Vid. Selsk. Forh., Trondhjem **19** (1947) 64–67.

[19] K. Soundararajan, *Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım*}, Bull. Amer. Math. Soc. **44**, Number 1, January 2007, 1–18.

[20] Y. Zhang, *Bounded gaps between primes*, Annals of Mathematics (2), **179** (2014), 1121–1174.

**Our current state of knowledge**

It is now known and proven that there are always primes differing by 246 or less no matter how far out you go in the integers. We expect that this is true for every even difference $2, \ 4, \ 6, \ \ldots$, including the twin prime conjecture for primes with difference 2, but the proof does not tell us which specific differences occur. Usually this result is phrased by saying there are infinitely many primes whose difference with another prime is $\le 246$, or that there are infinitely often bounded prime gaps of 246 or less. This result was proved in an online Polymath Project [17] in 2014. Previously Yitang Zhang [20] had proved there are infinitely often bounded gaps between primes in 2014, with the bound 70 million. In a different way than Zhang, James Maynard [16] obtained the bound 600 which appeared in a 2015 paper, and Terry Tao also proved similar results at the same time. All of these results are based in part on the GPY method introduced by Goldston, Pintz, and Yıldırım in 2005 [8]. Prior to GPY, it was considered almost impossible that these results would be proven anytime in the forseeable future. If in 2000 you asked mathematicians which was harder, proving bounded gaps between primes or proving the Riemann Hypothesis, it is possible they would have chosen bounded gaps between primes as harder.

These breakthrough results were based on making improvements in the GPY method, but what was the GPY method based on? That is a complicated story, and I will only describe here some of the early steps that led eventually to GPY. This started in the winter/spring of 1999 when Cem and I were at the Mathematical Science Research Institute (MSRI) in Berkeley. Thus you can see that it took about 6 years to work out a proof that now can be presented in about 10 pages to graduate and advanced undergraduate mathematics students. Maynard’s proof which includes GPY as a special case is 15 – 20 pages. Almost none of the mathematics I will describe below is needed any longer in these proofs. Therefore you do not need to pay any attention to the complicated formulas I will include because they are here only to document the work we did at the time.

**MSRI in 1999**

During the 1995-96 academic school year Cem visited me at San Jose State University and we started to work together on problems related to primes. We finished a paper that year and started a second paper. In both of these papers we worked with a certain truncated divisor sum that approximately detects when an integer is a prime. This turned out to be the seed from which GPY grew. Truncated divisor sums of this type were actually well known and had been used for other purposes for a long time, so there was little reason to suspect their use would lead to anything significant in the theory of primes.

In the winter/spring semester of 1999 I was on sabbatical at MSRI in a program concerning random matrices and applications, and Cem managed at the last minute to joint the program, arriving from Turkey in February. I had already been there for a month (commuting 100 miles round trip each day from San Jose) and a few weeks before he arrived I had decided to give up doing research for the forseeable future. This was for entirely rational reasons. Being at MSRI I was surrounded with many top active mathematicians busy proving exciting results. By contrast, I had just spent 6 months failing to prove anything. I was working on a problem, the first step of which was to prove a certain formula. This was fun and I made good progress and in December I finally proved the formula. I then went to the second step in my problem and discovered in two days that my formula was no help at all and I couldn’t solve the problem. With two young children ages 1 and 3 and a third one due later in 1999, I decided I no longer had time to waste on unproductive research. It was time to grow up and pick a safer academic path. Thus I decided to write a book on primes, motivated by the idea that 2 hours of work generated one page of first-draft book.

When Cem arrived I was busy working on my book, and rather than work on our delayed second paper, we would meet and discuss whatever we were thinking about. Then one Monday when I met Cem he told me he thought he could work out the average for the third power of the truncated divisor sum we had used in our papers. In fact, he had spent all weekend trying to do it. So we started looking into this and we ended up working on it for the rest of the term. At first every time we did the calculation a different way we got a different answer, which we both found more hilarious than annoying because it was only a challenge without any applications or importance. Most times on the hour drive home I would think I found that day’s mistake, but there were so many mistakes that nothing helped for awhile. It took a few months before we finally started to get the same answer each way we did the calculation. By June we felt confident we could actually evaluate this third power average for two different short divisor sums used to detect primes, and all the special cases that arise. But we had not actually written any of the proofs down, and had not worked out most of the details.

The term ended and Cem left Berkeley, and I went back to my normal busy life in San Jose. It is worth reminding the reader on the state of knowledge concerning small gaps between primes in 1999. The average gap between a prime $p$ and the next prime is $\log p$, and the best result known then, due to Maier [15] in 1988, was that infinitely often you could find two primes $p$ and $p’$ within $(.2484\ldots) \log p$ or a little smaller than one quarter of the average spacing. This result was obtained by combining all three of the known methods for finding small gaps between primes. At the time the experts saw little hope of making any substantial progress on this problem, which proves the experts can be spectacularly wrong when it comes to making future predictions. Currently the experts are saying the twin prime conjecture, which we are only 244 away from obtaining, is a very hard problem, but they are no longer saying it is an impossibly hard problem.

**Detecting primes with short divisor sums**

Let $\pi(x)$ denote the number of primes $\le x$. We can write this as

$\pi(x) := \sum_{p\le x}1, \qquad (1) $

where we will always use the letter $p$ to denote a prime number. Thus for example $\pi(10)=4$ and $\pi(100) = 25$. The prime number theorem, proved in 1896, asserts that

$ \pi(x) \sim \frac{x}{\log x},\qquad (2)$

where

$ f(x) \sim g(x) \qquad \text{means} \qquad \lim_{x\to \infty} \frac{f(x)}{g(x)} = 1. $

If you work on prime numbers it soon becomes clear that it is better to count primes with the

von Mangoldt function $\Lambda(n)$, defined to be $\log p$ if $n=p^m$, $m\ge 1$ an integer, and zero otherwise. Thus $\Lambda(n) =0$ unless $n$ is a prime or prime power, and the weight $\log p$ lets us count primes as if they are integers. The prime number theorem now becomes

$\psi(x) := \sum_{n\le x }\Lambda(n) \sim x .\qquad (3)$

While the prime powers are also included here, there are at most a constant times $\sqrt{x}$ squares and higher powers $\le x$ which is insignificant in (3). One can now recover the original prime number theorem from (3) by a standard method and one finds \begin{equation} \pi(x) \sim {\rm li}(x) :=\int_2^x\frac{dt}{\log t}, \qquad (4)\end{equation} and using integration by parts it is easy to show this is equivalent to (2). In fact ${\rm li}(x)$ is a far better approximation of $\pi(x)$ than $x/\log x$.

In a first course in number theory we prove the elementary formula, which you might also try to prove yourself, \begin{equation}\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}.\qquad (5)\end{equation} Here $d|n$ means “$d$ divides $n$” and the sum runs over the divisors $d$ of $n$. Also $\mu(n)$ is the Möbius function defined by $\mu(1)=1$, $\mu(n) =(-1)^r$ if $n$ is the product of $r$ distinct primes, and $\mu(n) =0$ if $n$ has a square prime factor. This formula has the drawback that it can have an enormous number of terms. For example, consider the 100th primorial $p_{100}\#$ which is the product of the first 100 primes. Then $\Lambda(p_{100}\#)=0$ obviously, but it is hopeless to compute this by (5) since there are $2^{100}$ terms in the sum.

In place of (5), consider the truncated smoothed approximations for $\Lambda(n)$ \begin{equation} \Lambda_R(n) := \sum_{\substack{d|n\\ d\le R}}\mu(d) \log \frac{R}{d} \qquad (6)\end{equation} and \begin{equation} \lambda_R(n) := \sum_{r\le R}\frac{\mu(r)^2}{\phi(r)}\sum_{\substack{d|r\\d|n}}d\mu(d).\qquad (7) \end{equation}

Selberg obtained these type of approximations in his work on mollifiers and sieves. The approximation $\lambda_R(n)$ comes from the circle method and is also by the Selberg sieve the truncated divisor sum which gives the best least square approximation to $\Lambda(n)$, see [18]. On the other hand, $\Lambda_R(n)$ usually obtains asymptotically the same results as $\lambda_R(n)$ and often it is easier to deal with.

The original and most powerful method for finds pairs of primes close together prior to GPY was first developed by Hardy and Littlewood [14] in the 1920’s and required the assumption of the Generalized Riemann Hypothesis (GRH). In 1966 Bombieri and Davenport [1] were able to improve the method and remove GRH. They showed this method by itself succeeds in finding primes at or closer than 1/2 the average spacing. The proof is based on the circle method and also the Bombieri-Vinogradov theorem on the distribution of primes in arithmetic progressions. Later, the approximations (6) or (7) were used in [2] and [3] to replace the circle method with a simple proof based on the second moment for the number of primes in a short interval. The argument goes as follows. Let

\[ \psi(x,h) := \psi(x+h) -\psi(x) = \sum_{x<n\le x+h}\Lambda(n),\]

and approximate this with

\[ \psi_R(x,h) := \psi_R(x+h) -\psi_R(x) = \sum_{x<n\le x+h}\Lambda_R(n). \]

Then

\[ \int_x^{2x}(\psi(y,h) -\psi_R(y,h) )^2 dy \ge 0\]

and we obtain the lower bound

\[ \int_x^{2x}\psi(y,h)^2 dy \ge 2\int_x^{2x}\psi(y,h)\psi_R(y,h) dy – \int_x^{2x}\psi_R(y,h)^2 dy .\]

Multiplying out the sums in the two integrals on the right, we are able to evaluate these integrals asymptotically using the correlation relations, for $1\le R\le x^{1/2}$, \begin{equation} \sum_{n\le x} \Lambda_R(n)^2 \sim N\log R, \qquad \sum_{n\le x} \Lambda_R(n)\Lambda_R(n+k) \sim \mathfrak{S}(k) (x-|k|),\qquad (8)\end{equation}

\begin{equation} \sum_{n\le x}\Lambda(n) \Lambda_R(n) \sim N\log R, \qquad \sum_{n\le x} \Lambda_R(n)\Lambda(n+k) \sim \mathfrak{S}(k) (x-|k|),\qquad (9) \end{equation}

where \begin{equation}\mathfrak{S}(k) = \begin{cases} {\displaystyle 2 C_{2} \prod_{\substack{ p \mid k \\ p > 2}} \!\left(\frac{p – 1}{p – 2}\right)} & \mbox{if $k$ is even, $k\neq 0$}, \\ 0 & \mbox{if $k$ is odd}\end{cases}\qquad (10) \end{equation} and \begin{equation}C_{2} = \prod_{p > 2}\! \left( 1 – \frac{1}{(p – 1)^{2}}\right) = 0.66016\ldots. \qquad (11) \end{equation}

The result of a calculation is that, on taking $R=x^{1/2}$, and $h=\lambda \log x$ for any fixed number $\lambda>0$, \begin{equation}\begin{split} \int_x^{2x}(\pi(y+\lambda \log x)-\pi(y))^2\, dy &\sim \frac{1}{\log^2 x}\int_x^{2x}(\psi(y+\lambda \log x) -\psi(y))^2\, dy \\& \ge (\frac12 \lambda +\lambda^2 – \epsilon) x,\end{split}\end{equation} for any $\epsilon >0$. Now suppose there are no pairs of primes in the interval $[x,2x]$ closer than $\lambda \log x$. Then there can only be zero or one prime in the interval $(y,y+\lambda \log x]$ and $(\pi(y+\lambda \log x)-\pi(y))^2 = \pi(y+\lambda \log x)-\pi(y)$.

An easy calculation shows that \begin{equation} \int_x^{2x} (\pi(y+\lambda \log x)-\pi(y)) \, dy \sim \lambda x.\qquad (13)\end{equation}

From the last two equations we conclude $\lambda \ge \frac12 \lambda +\lambda^2 -\epsilon$ from which we conclude $\lambda \le \frac12+\epsilon$, for any $\epsilon >0$. This is the result obtained by Bombieri and Davenport.

We mention in passing that it is conjectured that \begin{equation} \int_x^{2x} (\pi(y+\lambda \log x)-\pi(y))^2 \, dy \sim (\lambda+\lambda^2) x ,\qquad (14)\end{equation} which is the second moment of a Poisson distribution. It has further been conjectured that the primes are distributed around their average spacing in a Poisson distribution, but even this second moment conjecture will be formidable to prove.

**Working by Email Fall 1999 through 2000**

The motivation for Cem and my work from after the MSRI semester through 2000 was the vague notion that the above second moment argument might be improved if we tried to do a third moment argument. Therefore the first step was to work out triple correlation formulas for \begin{equation} \sum_{n\le x } \Lambda_R(n+k_1)\Lambda_R(n+k_2)\Lambda_R(n+k_3)\qquad (15) \end{equation} with $R\le x^{1/3}$. We would also need similar sums where one or two of the $\Lambda_R$ are replaced with $\Lambda$. Clearly, and we were well aware of it at the time, one could never have two $\Lambda$’s in one of these sums, since that involved evaluating a sum over prime pairs which could not be done. Also the positivity needed would not work for the third moment and therefore we would have to go to the 4th moment, with many sums over prime triples which could not be done. Looking back now I find it hard to understand how we could have wanted to work on this let alone be so enthusiastic about doing it. The only explanation I can think of is the line from the movie Animal House, “I think that this situation absolutely requires a really futile and stupid gesture, and we are just the guys to do it.”

The first thing we did in working on these triple correlation short divisor sums was to double the amount of work by deciding to do it for both $\Lambda_R(n)$ and $\lambda_R(n)$. I worked mainly on $\Lambda_R(n)$ which appeared in our first paper [10] in 2003, and Cem worked mainly on $\lambda_R(n)$ which finally appeared in our second paper [11] in 2007, two years after GPY was done. We initially worked in parallel and when one of us got ahead we could use the work on that divisor sum to help figure out the other divisor sum. But by mid-2000 it became clear that $\lambda_R(n)$ was far more complicated, and we started putting more of our joint effort into finishing a first paper on $\Lambda_R(n)$.

Here is how we evaluated (15) in the simplest but also most challenging case when $k_1=k_2=k_3=0$. First, \[ \sum_{n\le x} \Lambda_R(n)^3 = \sum_{d_1,d_2,d_3\le R} \mu(d_1)\log(R/d_1)\mu(d_2)\log(R/d_2)\mu(d_3)\log(R/d_3)\sum_{\substack{n\le x\\ d_1|n\\ d_2|n\\ d_3|n}}1.\]

The divisibility conditions in the last sum are equivalent to $[d_1,d_2,d_3] | n$ where $[a,b]$ is the least common multiple or lcm of the numbers in the brackets. As you can check \[ \sum_{\substack{ n\le x\\ d|n}}1 = \frac{x}{d} +O(1) ,\] where we use the usual “big oh” notation, and therefore we obtain \[ \sum_{n\le x} \Lambda_R(n)^3 = x \sum_{d_1,d_2,d_3\le R} \frac{\mu(d_1)\log(R/d_1)\mu(d_2)\log(R/d_2)\mu(d_3)\log(R/d_3)}{[d_1,d_2,d_3]} +O(R^3).\]

The lcm term $[d_1,d_2,d_3]$ depends on the common prime factors of $d_1,d_2,d_3$. By the Möbius function the only non-zero terms in the sum are when $d_1$, $d_2$ and $d_3$ are squarefree. Eventually Cem and I realized that the best way to not make mistakes is to decompose these variables into the Venn diagram of their mutual overlapping prime factors. Thus \[ \begin{split} &d_1 = a_1a_{12}a_{13}a_{123} \\ & d_2 = a_2a_{12}a_{23}a_{123} \\& d_3 = a_3a_{13}a_{23}a_{123} \end{split} \] where for example $a_{12}$ is the prime factors in both $d_1$ and $d_2$ but not in $d_3$. Thus we now have 7 variables which are mutually relatively prime to each other, and the sum on the right-hand side above is equal to \[ \sum’_{\substack{ a_1a_{12}a_{13}a_{123}\le R \\ a_2a_{12}a_{23}a_{123} \le R \\ a_3a_{13}a_{23}a_{123} \le R} } \frac{\mu(a_1)\mu(a_2)\mu(a_3)\mu^2(a_{12})\mu^2(a_{13})\mu^2(a_{23})\mu(a_{123}) L_1(R)L_2(R)L_3(R)} {a_1a_2a_3a_{12}a_{13}a_{23}a_{123} }\] where the dash in the summation indicates that all seven variables are relatively prime to each other, and $L_i(R) = \log(R/d_i)$, $i=1,2,3$.

The next step is to sum over each of the 7 variables, retaining the conditions on being relatively prime with the other variables as it is done. This is done first with $a_1$, then $a_2$, then $a_3$ since these variables only occur in one inequality. Then $a_{12}$, $a_{13}$, $a_{23}$ are done, where they occur in two inequalities which thus have to be split into two ranges. Finally $a_{123}$ is done. The final result is that this sum is \[ \sim \frac{3}{4} \log^2R. \]

This is what Cem and I spent the rest of 1999 and the first half of 2000 working on. Writing the 66 page first paper took most of the rest of 2000 and some of 2001. The results we found were surprisingly simple. In place of (8) and (9) we proved, for $1\le R\le x^{1/3}$ and $1\le k, k’ \le R^\epsilon $,

\begin{equation}\begin{split} &\sum_{n\le x} \Lambda_R(n)^3 \sim \frac34 x\log^2 R, \qquad \sum_{n\le x} \Lambda_R(n)^2\Lambda_R(n+k) \sim \mathfrak{S}(k)x \log R \\ & \sum_{n\le x}\Lambda_R(n)\Lambda_R(n+k)\Lambda_R(n+k’) \sim \mathfrak{S}(k,k’)x,\end{split}\qquad (16)\end{equation} and \begin{equation}\begin{split} &\sum_{n\le x}\Lambda(n) \Lambda_R(n+k)^2 \sim \mathfrak{S}(k)x\log R, \\& \sum_{n\le x} \Lambda(n)\Lambda_R(n+k)\Lambda_R(n+k’) \sim \mathfrak{S}(k,k’)x,\end{split}\qquad (17) \end{equation} where $\mathfrak{S}(k,k’)$ is the expected singular series for a triple of primes.

We were both very surprised to find such simple formulas. The existence of asymptotic formulas depends completely on the smoothing in the divisor sums, and we were expecting at best to get formulas with complicated constants depending on convergent series.

**Summer of 2000: First Result on Primes**

We were still thinking in terms of moments, but in the summer of 2000 we found that the formulas we had obtained were enough by themselves to obtain results on primes. Surprisingly at the time, we needed all of the formulas we obtained above except for the one we worked the hardest on – the third power of $\Lambda_R(n)$ where the constant $\frac34$ occurs.

Consider \begin{equation}\mathcal{M}(h,\rho) := \sum_{x<n\le 2x} \left(\psi(n,h)-\rho \log x\right) \left(\psi_R(x,h) – C\log x\right)^2, \qquad (18)\end{equation} where $C$ is a constant we can choose to make this expression as large as possible. Multiplying out and using the formulas (16) and (17), we obtain an expression quadratic in $C$, and on completing the square we find the choice of $C$ that maximizes this quadratic. The result is that, taking $h=\lambda \log x$, and $\lambda <\rho$, \begin{equation}\mathcal{M}(h,\rho)\sim \frac14 \frac{\lambda}{\rho -\lambda} \left(\frac14\rho-\left( \rho-\lambda\right)^2\right) x \log^3 x.\qquad (19)\end{equation}

We now assume $\rho >1$. Then from (18) if there is only zero or one prime in $(n, n+h]$ for all $x<n\le 2x$ then $\psi(n,h) \le \log(2x)<\rho \log x$, and thus $\mathcal{M}(h,\rho)< 0$. (Here the prime powers can be discarded without altering the lower bound.) But by (19), $\mathcal{M}(h,\rho)>0$ for $\lambda$ in the range $\rho/2<\lambda <\rho$. Hence there has to be an $n$ with two primes in $(n, n+\lambda \log x]$, for any $\lambda >1/2$ on letting $\rho \to 1^+$. Thus we have found exactly the same result as Bombieri-Davenport in the previous section, although our proof requires a 66 page paper. We did not appreciate it at the time, but (18) has an important difference from the second moment method. In the second moment method we seek an approximation that is as close as possible to the prime counting function, but in (18) we seek a weight function that detects primes in the prime counting function and is directly aimed at finding primes in short intervals. This idea carries directly over into GPY and later work.

At this point Cem and I knew that we were actually able to do something new, but we had no expectation it would lead to anything close to the results later obtained in GPY, let alone Zhang or Maynard and Tao. But we also had a secret weapon we didn’t know about, and I would like to conclude this look back on Cem and my work by describing this.

**Epilogue**

By the start of 2001 we had a plan for finally improving on Bombieri and Davenport’s method. The first step was to obtain correlation formulas for products of not just 3 but $k$ $\Lambda_R(n)$’s with shifts. We were sure all the formulas would follow the same pattern as in (16) and (17) except that we did not know what would replace the number $3/4$ in the case of the third power power. And our method was hopelessly complicated and even computed the 4-correlations would be a formidable task.

That is where the secret weapon intervened. Among your peers and other people working in your field there is a lot of expertise available to help you with your problems, but it is up to you to describe exactly what it is you are trying to do, and reduce it to some very specific questions that do not require someone to read, for example, a 66 page paper. At conferences, workshops, and meetings we would describe what we were doing, and almost every problem we could not solve was solved, sometimes with a very simple one or two sentence suggestion. There were far to many people who were helpful to give everyone the credit they deserve, but I will mention a few examples. First, at an American Institute of Mathematics (AIM) workshop, after computing the correlation sums as above, Peter Sarnak shouted out, “Why not write your sum as a contour integral and compute the residue?” to which I could only reply: We didn’t think of that.” That suggestion in short order allowed us to prove our correlation formulas with one exception. The exception was what we called the correlation coefficients, which are the constants $C(k)$ in the formula \[ \sum_{n\le x} \Lambda_R(n)^k \sim C(k) x\log^{k-1}R .\]

We knew already $C(1)=1$, $C(2)=1$, and $C(3)=3/4$. For this problem computing residues becomes incredibly complicated as more residues are constantly splitting off at each step in the argument. David Farmer using Mathematica found that $C(4)=3/4$, $C(5)=11065/16384 =.6735\ldots$ and $C(6)=11460578803/2^{34}=.66709\ldots$. No further values have ever been computed, but Soundararajan proved that the $C(k)$ grow faster than exponentially to infinity as $k\to \infty$, which ruins any use we may have for them as soon as $k$ gets large. By early 2002 we had used Mathematica and the 6 correlation coefficients above in a 7th correlations argument similar to what we used with (18) to obtained two primes within $.38$ times the average spacing. At this point the argument seemed to be reaching its limit, but we were happy with this progress and determined to push on.

The next development was Cem and I thought we had obtained a new type of prime tuple approximation divisor sum, but Granville and Soundararajan discovered this divisor sum was not smooth enough for asymptotic formulas to exist. We received a certain amount of publicity over this “thrill of victory and agony of defeat” story, but the less said about that here the better. On the positive side, many mathematicians studied this problem, and we received a lot of help reformulating the method, especially from Enrico Bombieri, Andrew Granville, and Kannan Soundararajan. This allowed us to completely solve the remaining problems with our approach started in 1999 and avoid the correlation coefficients. The result is in the paper [12] where we obtained two primes within $1/4$ of the average spacing. The viewpoint changed from moments to approximating tuples of primes. While working on this we started to find better approximations for tuples not made from products of $\Lambda_R(n)$ and the path to GPY followed. We will not touch on that here, but the interested reader can find details in [9], [18], [19].

Prime tuples are a generalization of twin primes where we consider the tuple or vector \begin{equation}(n+h_1, n+h_2, \ldots , n+h_k)\qquad (20)\end{equation} with the shifts $h_i$ given by $\mathcal{H}= \{h_1,h_2,\ldots , h_k\}.$

Letting $n=1,2,3,\ldots \ $ we ask how often all of the components of the tuple are simultaneously prime for $n\le x$, and we denote this number by $\pi(x,\mathcal{H})$. Thus for example twin primes correspond to $\mathcal{H}=\{0,2\}$ with the tuple $(n,n+2)$. On the other hand the tuples $(n,n+1)$ is only made up of primes when $n=2$ since at least one of these numbers is even. Similarly $(n,n+2,n+4)$ is only made up of primes when $n=3$ since at least one of these numbers is divisible by 3. Tuples which do not always have a component divisible by some integer $\ge 2$ are called \emph{admissible}, and for these we expect that infinitely often all their components will simultaneously be primes. This is called the Hardy-Littlewood prime tuple conjecture.

Hardy and Littlewood also made a more precise conjecture [14]. Let $\nu_p(\mathcal{H})$ denote the number of distinct residue classes$\pmod p$ the numbers $h\in \mathcal{H}$ fall into. Let \begin{equation}\mathfrak{S}(\mathcal{H}) = \prod_p\left(1-\frac{1}{p}\right)^{-k}\left(1 -\frac{\nu_p(\mathcal{H})}{p}\right).\qquad (21)\end{equation}

If $\mathfrak{S}(\mathcal{H})\neq 0$ then $\mathcal{H}$ is admissible.

**Prime Tuple Conjecture** If $\mathcal{H}$ is admissible, then \begin{equation} \pi(x,\mathcal{H}) \sim \mathfrak{S}(\mathcal{H}) \text{li}_k(x), \ \ \text{where} \ \ \text{li}_k(x) = \int_2^x \frac{1}{(\log t)^k}\, dt.\qquad (22)\end{equation}

Hardy and Littlewood also formulated the prime tuple conjecture in terms of the von Mangoldt function by defining \begin{equation} \Lambda(n;\mathcal{H}) := \Lambda(n+h_1)\Lambda(n+h_2)\cdots \Lambda(n+h_k), \qquad (23)\end{equation} (so this will be zero if any of the $\Lambda(n+h_i)$ is zero), and then equivalently to (22) conjectured \begin{equation} \sum_{n\le x}\Lambda(n;\mathcal{H}) \sim \mathfrak{S}(\mathcal{H})x. \qquad (24)\end{equation}

We now approximate $\Lambda(n;\mathcal{H})$ by \begin{equation}\Lambda_R(n;\mathcal{H}):= \Lambda_R(n+h_1)\Lambda_R(n+h_2)\cdots \Lambda_R(n+h_k). \qquad (25)\end{equation}

To detect primes we need non-negative approximations and so we square this approximation, and the key formulas we need to compute are \begin{equation}\sum_{n\le x}\Lambda_R(n;\mathcal{H})^2 \quad \text{and} \quad \sum_{n\le x}\Lambda(n+h_0)\Lambda_R(n;\mathcal{H})^2. \qquad (26)\end{equation}

For the second sum, the single factor of $\Lambda(n+h_0)$ really is detecting primes in the tuple, since we find that if $h_0\in \mathcal{H}$ we get a result larger by a factor of $\log R$ than what we get when $h_0 \not \in \mathcal{H}$. The second sum is evaluated by summing $\Lambda(n+h_0)$ through arithmetic progressions modulo products of divisors from $\Lambda_R(n;\mathcal{H})^2$. Notice all the $\Lambda_R$’s are all only second powers, and thus the complicated correlation coefficients no longer occur. To obtain the $1/4$ result still requires considerable effort, especially in the optimization argument.

The main disadvantage of this approximation is that it is formed from many short divisor sums multiplied together; in (16) there are $2k$ of them, which forces $1\le R\le N^{1/2k}$ to control the error term in evaluating the first sum, while for the second sum we require $1\le R\le N^{1/4k- \epsilon}$ This reduces the quality of the approximation. It is natural to think that if we could approximate the number of primes in a tuple by a single divisor sum then we could take the approximation much longer and obtain a better result. As it turns out, sieve methods are based exactly on this idea, and at this point the correct method for finding small gaps between primes finally has started to emerged. Even with this, the approximations require an additional idea, which Janos Pintz discovered and then the three of us worked together to develop the GPY method.

I would like to mention a few people involved in the development of GPY. Andrew Granville and Kannan Soundarajan made a number of significant contributions to the discovery and development of GPY over many years. Yoichi Motohashi substantially simplified the initial proof [7]. Sid Graham, who worked with us as GPY was developed, was mainly responsible for obtaining a Selberg sieve proof of the method, as well as developing GPY for use in studying EK numbers, where an EK number is an integer with exactly K distinct prime factors. One notable result proved in [5] is that every admissible 3-tuple contains two E2 numbers infinitely often. See also [4],[6].

For readers interested in diving into this subject I highly recommend reading Andrew Granville’s survey paper [13] which provides an abundance of insights and technical details on the work of Zhang and of Maynard and Tao.

## References

[1] E. Bombieri and H. Davenport, *Small differences between prime numbers*, Proc. Roy. Soc. Ser. A, **293** (1966), 1–18.

[2] D. A. Goldston, *On Bombieri and Davenport’s theorem concerning small gaps between primes*, Mathematika **39** (1992), 10–17.

[3] D. A. Goldston, *A lower bound for the second moment of primes in short intervals*, Expo Math. **13** (1995), 366–376.

[4] D. A. Goldston, S. W.Graham, J. Pintz, and C.Y. Yıldırım, *Small gaps between primes or almost primes*, Transactions of the AMS, **361** No. 10, (2009), 5285–5330.

[5] D. A. Goldston, S. W. Graham, J. Pintz, and C.Y. Yıldırım, *Small gaps between products of two primes*, Proc. London Math. Soc. (3) **98** (2009), 741–774.

[6] D. A. Goldston, S. W. Graham, J. Pintz, and C.Y. Yıldırım, *Small Gaps Between Almost Primes, the Parity Problem, and Some Conjectures of Erdös on Consecutive Integers*, International Mathematics Research Notices,** 7** (2011), 1439–1450.

[7] D. A. Goldston, Y. Motohashi, J. Pintz, and C.Y. Yıldırım, *Small gaps between primes exist*, Proc. Japan Acad., **82A** (2006), 61–65.

[8] D. A. Goldston, J. Pintz, and C.Y. Yıldırım, *Primes in Tuples I*, Annals of Mathematics (2), **170** (2009), 819–862.

[9] D. A. Goldston, J. Pintz, and C.Y. Yıldırım, *The path to recent progress on small gaps between primes*, Clay Mathematics Proceedings, Volume 7, 2007, 129–139.

[10] D. A. Goldston and C.Y. Yıldırım, *Higher correlations of divisor sums related to primes: I: Triple correlations*, Integers **3** (2003), A5, 66pp. (electronic).

[11] D. A. Goldston and C.Y. Yıldırım, *Higher correlation of short divisor sums II: Triple correlations: Variations of the error term in the prime number theorem*, Proc. London Math. Soc. (3) **95** (2007), 199–247.

[12] D. A. Goldston and C.Y. Yıldırım, \emph{ Higher correlation of short divisor sums III: Gaps between primes}, Proc. London Math. Soc. (3) **95** (2007), 653–686.

[13] A. Granville, *Primes in intervals of bounded length*, Bull. Amer. Math. Soc. (N.S.), **52** (2015), 171–222.

[14] G. H. Hardy and J. E. Littlewood, *Some problems of `Partitio Numerorum’: III On the expression of a number as a sum of primes*}, Acta Math. **44** (1923), 1–70.

[15] H. Maier, *Small differences between prime numbers*, Michigan Math. J., **35** (3) (1988), 323-344.

[16] J. Maynard, *Small gaps between primes*, Annals of Mathematics (2), **181**, 383–413.

[17] D. H. J. Polymath, *Variants of the Selberg sieve, and bounded intervals containing many primes*, Res. Math. Sci. **1** (2014), Art. 12.

[18] A. Selberg, *On an elementary method in the theory of primes,* Norske Vid. Selsk. Forh., Trondhjem **19** (1947) 64–67.

[19] K. Soundararajan, *Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım*}, Bull. Amer. Math. Soc. **44**, Number 1, January 2007, 1–18.

[20] Y. Zhang, *Bounded gaps between primes*, Annals of Mathematics (2), **179** (2014), 1121–1174.