Dense prime clusters

Dense prime clusters
Thomas R. Nicely

http://www.trnicely.net
Current e-mail address

Last updated 0400 GMT 7 June 2009. Originally posted 23 May 2008.

Introduction
Analysis and computations
Theoretical and observed frequencies
Results obtained by other researchers
Bibliography

INTRODUCTION

Dense prime clusters are special instances of prime k-tuples in which a relatively large number of primes appears in a relatively small interval, in a pattern which is conjectured to recur infinitely often. The simplest example is the set of twin-prime pairs {(3,5), (5,7), (11,13), (17,19), ...,(p,p+2),...}; Silva has counted 808675888577435 such pairs below 1e18, and their number is believed to be infinite (the twin primes conjecture), but this has not yet been proven.

A more typical and complex example is one called to my attention on 6 May 2008 by Enrico Federighi, namely, the set of prime decuples (10-tuples) of the form (p, p+2, p+6, p+8, p+18, p+20, p+30, p+32, p+36, p+38), consisting of prime quadruplets differing by 30 (a rare occurrence in itself), with a twin-prime pair midway between, and no other intervening prime. This would constitute ten primes among 39 consecutive integers. The constellation is alternately characterized by D La Pierre Ballard as five twin-prime pairs among thirty-nine integers (and thus he labels it as 5TP39), and as the quintuplet twins (by Jens Kruse Andersen).

It is not immediately evident that such a dense cluster of primes exists at all. A quick inspection of a table of primes to 1000 reveals no such pattern; indeed, the only quadruplets (needed to initiate such a sequence) below 1000 are (5, 7, 11, 13), (11, 13, 17, 19), (101, 103, 107, 109), (191, 193, 197, 199), and (821, 823, 827, 829), and none of these differ by 30, as is necessary (but not sufficient). Thus, the first question is whether or not such a cluster can exist at all.

Very strong evidence for the existence of such a pattern can be obtained using the arguments presented in (for example) Riesel. Consider the set of offsets {0, 2, 6, 8, 18, 20, 30, 32, 36, 38} that define the cluster. Observe the following facts:

All the offsets are even.
Division of the offsets by 3 never leaves remainder 1.
Division of the offsets by 5 never leaves remainder 4.
Division of the offsets by 7 never leaves remainder 5.

Then Federighi's cluster satisfies the definition of an admissible constellation; that is, the set of offsets from the initiating prime are all even, and the set of remainders obtained upon division by each odd prime q <= 10 (the number of elements in the cluster) does not form a complete residue class modulo q. See Riesel or an introductory number theory text for an explanation of modulo and residue classes; basically this just says that at least one possible remainder is left unfilled for each q.

Finally, we cite the famous prime k-tuples conjecture, resulting from a 1922 study by Hardy and Littlewood: Any admissible constellation Q occurs infinitely often, with all its members prime, and the asymptotic number N(Q, x) of such occurrences <= x is

[1]    N(Q, x) ~ C(Q)*x/(ln(x))^k
[2]    N(Q, x) ~ C(Q)*int(1/(ln(t))^k, t, 2, x)

where C(Q) is determined by Q (but independent of x) and k is the number of elements in one instance of the constellation. The integral form is the one more often quoted, as in Riesel and Odlyzko et al.; the lower limit of integration is sometimes taken as zero (e.g., by Harley), which then requires the use of the Cauchy principal value at the singularity t=1. The integral form is a consequence of the non-integral form by virtue of the fact that the limit of the quotient of the two expressions is 1 as x tends to infinity, implying asymptotic equality. The integral form appears to be preferred primarily because it produces significantly more accurate approximations for all but very small or extremely large values of x.

This conjecture remains unproven, even in the simplest case (twin-prime pairs), but no counterexample has been found to date.

Thus the prime k-tuples conjecture implies that Federighi's cluster not only exists, but that it recurs for arbitrarily large values of x, and the number of instances <= x is asymptotically Cx/(ln x)^10 for some positive constant C.

ANALYSIS AND COMPUTATIONS

There remains the matter of finding the first actual instances of the cluster. This is clearly a task for a computer search, which I have undertaken, using computer code written in C and optimized for the task at hand. A brute force procedure would be to simply generate the primes sequentially, then, for each prime p, to test each of p+2, p+6, p+8, p+18, p+20, p+30, p+32, p+36, and p+38 for primality. As soon as one of these is composite, proceed to the next p.

Clearly, this procedure can be optimized. We first note that the quadruplet (p, p+2, p+6, p+8) can only be prime if p % 11 mod 30; that is, p leaves remainder 11 when divided by 30. Here, due to the limitations of HTML, I use the % symbol as the "congruence" symbol (as in C), normally indicated by the triple equal sign (identity or equivalence symbol). Note also that there is one quadruplet which is an exception to this rule, namely (5, 7, 11, 13), but it can be disregarded, as p+20 would be composite. The truth of the assertion can be verified by noting that for any other residue (mod 30), 0..10 or 12..29, at least one of the elements p, p+2, p+6, or p+8, will be divisible by either 2, 3, or 5. For instance, if p % 13 mod 30, then p+8 % 21 mod 30 ==> p+8 = 30n + 21 = 3(10n+7), for some positive integer n, and thus p+8 is divisible by 3 and therefore not prime. Thus p % 11 mod 30 is a necessary condition.

Considering also divisibility by 7, we note that if p % 11 mod 30, then p must be congruent to either 11, 41, 71, 101, 131, 161, or 191 (mod 210). But if p % 11 mod 210, then p+38 is divisible by 7 (note that we are now examining all the elements of Federighi's cluster, not just the initial quadruplet); if p % 41 mod 210, then p+8 is divisible by 7; if p % 71 mod 210, then p+6 is divisible by 7; if p % 101 mod 210, then p+18 is divisible by 7; if p % 131 mod 210, then p+2 is divisible by 7; and if p % 161 mod 210, then p itself is divisible by 7. Thus p must be congruent to 191 mod 210. Note also that this implies that the other intervening odd numbers (p+4, p+10, p+12, p+14, p+16, p+22, p+24, p+26, p+28, p+34) must be divisible by either 2, 3, 5, or 7, and so there will automatically be no other "rogue" primes within Federighi's clusters.

Finally we consider divisibility by 11. Since p % 191 mod 210, then p % (191, 410, 611, 821, 1031, 1241, 1451, 1661, 1871, 2081, or 2291) (mod 2310). But p % 191 mod 2310 ==> p+18 is divisible by 11; p % 410 mod 2310 ==> p+8 is divisible by 11; p % 611 mod 2310 ==> p+38 is divisible by 11; p % 1031 mod 2310 ==> p+36 is divisible by 11; p % 1241 mod 2310 ==> p+2 is divisible by 11; p % 1661 mod 2310 ==> p is divisible by 11; p % 1871 mod 2310 ==> p+32 is divisible by 11; p % 2081 mod 2310 ==> p+20 is divisible by 11; and p % 2291 mod 2310 ==> p+8 is divisible by 11. Thus it is necessary that p be congruent to either 821 or 1451 (mod 2310).

This process of exclusion could be continued, considering divisibility by 13 and remainders modulo 30030, but one eventually reaches a point of diminishing returns, and we choose to stop here.

We now know that, rather than testing each and every prime as a candidate for p in Federighi's cluster, we only have to test primes of the form p = 2310n + 821 and p = 2310n + 1451. Thus the core of the search program will appear as follows (in pseudocode):

    p = 821;
    while(1)
      {
      if(prime(p   ) && prime(p+ 2) && prime(p+ 6) && prime(p+ 8) &&
         prime(p+18) && prime(p+20) &&
         prime(p+30) && prime(p+32) && prime(p+36) && prime(p+38))
        {
        print(p);
        }
      p += 630;  /* to get p = 2310n + 1451 */
      if(prime(p   ) && prime(p+ 2) && prime(p+ 6) && prime(p+ 8) &&
         prime(p+18) && prime(p+20) &&
         prime(p+30) && prime(p+32) && prime(p+36) && prime(p+38))
        {
        print(p);
        }
      p += 1680;  /* to get p = 2310(n + 1) + 821 */
      if(p > UB)break;  /* UB will be part of program input */
      }
    end;

Note that most compilers will generate code which immediately terminates the conditionals as soon as one of the elements is found false (composite), so nested ifs are unnecessary. The code will also need a function/procedure/subroutine prime(n), which returns 1 if n is prime and 0 if n is composite. This might be a standard library routine (as with the GMP library) or a custom routine written by the user.

I have provided a zipfile, rico1.zip, containing one of the codes, rico1.c, that I used in my own search. It is written in GNU C; compilation requires the library support files trn.c and trn.h found in trn.zip, as well as linkage with the GMP library. Also included is an executable for the Windows command line environment.

The results produced by the code are as follows. Shown are all fifty-four solutions found for the initial element of Federighi clusters in p < 10^13.

===================
 p =   39713433671
 p =   66419473031
 p =   71525244611
 p =  286371985811
 p =  480612532451
 p =  535181743301
 p =  789972743471
 p = 1195575264641
 p = 1219449947921
 p = 1256522812841
 p = 1292207447351
 p = 1351477467251
 p = 1450982599271
 p = 1460592638171
 p = 1515361442261
 p = 1592346154541
 p = 1608037625861
 p = 1974772380401
 p = 1998453796481
 p = 2342693983541
 p = 2407899149861
 p = 2846256286631
 p = 2875683396791
 p = 3447368334461
 p = 3764105215691
 p = 3885646849751
 p = 3990788336831
 p = 4281968424491
 p = 4434754318661
 p = 4588646146751
 p = 5281739556851
 p = 5298100228241
 p = 5372177740841
 p = 5516252325341
 p = 5550991305911
 p = 5552057436221
 p = 5928960648431
 p = 5958488542211
 p = 6177779528741
 p = 6454468231031
 p = 6672992365181
 p = 6743600189531
 p = 7244460837371
 p = 7328923752971
 p = 7332453444521
 p = 7425589431311
 p = 7562342553971
 p = 7735261558691
 p = 8327881892381
 p = 8590488666671
 p = 8965822881311
 p = 9134998909241
 p = 9436299883061
 p = 9737406228461
===================

The run required about two days CPU time on an average machine ($600 Dell). Performance could be improved significantly by using sieving to generate large blocks of primes rather than testing them sequentially. The code is well-suited for distributed processing.

THEORETICAL AND OBSERVED FREQUENCIES

As noted, the frequency of occurrence of Federighi's cluster is implied by the prime k-tuples conjecture to be

[3]    N(Q, x) ~ C(Q)*x/(ln(x))^10

of which the integral form is

[4]    N(Q, x) ~ C(Q)*int(1/(ln(t))^10, t, 2, x)

The constant C(Q) may be expressed as follows:

[5]    C(Q) = A(k, Q)*c_k

where c_k is the kth Hardy-Littlewood constant,

[6]    c_k = prod((p^(k-1))*(p-k)/(p-1)^k, p; p prime, p > k)

and A(k, Q) is a constant dependent on both the number of elements k in each member and the specific constellation Q. Actually, if k=2, A(2, Q)=2 for all admissible prime pairs; for k > 2, A will vary, for any given k, depending on the specific constellation Q. As noted by Federighi, this means that the sometimes cited constants D=9/2*c_3 and E=27/2*c_4 are in fact not the value of C(Q) for every admissible set of triplets and (respectively) quadruplets.

It is possible to express A(k, Q) as follows:

[7]    A(k, Q) = prod((p^(k-1))R(Q, p)/(p-1)^k, p; p prime, 2 <= p <= k)
                 * prod(R(Q, p)/(p-k)), p; p prime, k < p <= m)

where R(Q, p) is the number of free (unoccupied) residue classes (mod p) for any specific element of Q, and R=p-k for all primes p > m; note that m will be no larger than s/2, where s is the "span" of the constellation (the difference between the largest and smallest elements of any particular instance). If the second product is empty, its default value is one (one for products, zero for sums).

For Federighi's constellation, k=10, s=38, m=19 (provisionally), and the values of R can be computed by observing the residues obtained when dividing the set of offsets {0, 2, 6, 8, 18, 20, 30, 32, 36, 38} by each prime p from 2 through 19 inclusive. The results are as follows.

    R(Q,  2) =  1  (remainder 1 is free)
    R(Q,  3) =  1  (remainder 1 is free)
    R(Q,  5) =  1  (remainder 4 is free)
    R(Q,  7) =  1  (remainder 5 is free)
    R(Q, 11) =  2  (remainders 1 and 4 are free)
    R(Q, 13) =  4  (remainders 1, 3, 9, and 11 are free)
    R(Q, 17) =  8  (remainders 5, 7, 9, 10, 11, 12, 14, and 16 are free)
    R(Q, 19) = 10  (remainders 3, 4, 5, 7, 9, 10, 12, 14, 15, and 16 are free)

For all p > 19, R(Q, p)=p-k=p-10, as none of the ten elements can produce the same remainder. Thus

    A = 2^9*1/1^10 * 3^9*1/2^10 * 5^9*1/4^10 * 7^9*1/6^10
        * 2/(11 - 10) * 4/(13 - 10) * 8/(17 - 10) * 10/(19 - 10)
      = 41426.615042175...

and so C(Q) = (41426.615042175)*c_10. The value of c_10 has been calculated by Harley:

    c_10 = 0.0418040508121816571...

and this yields C(Q) = 1731.8003202, so that the prime k-tuples conjecture implies

[8]    N(Q, x) ~  1731.8003202*int(1/(ln(t))^10, t, 2, x)

for Federighi's constellation.

Now the question is how well [8] approximates the observed frequency of the constellation. There is an immediate and obvious problem. Calculation reveals that N(Q, 10) = 11422.4755..., a nonsense estimate for the number of occurrences in 2 <= x <= 10. The source of this absurdity is the choice of the lower limit of integration t=2, very close to a singularity (at t=1) of the integrand. While this does not affect the validity of [8] as an asymptotic result (which simply requires that the relative error approach zero for extremely large x), it renders the formula awkward to use for estimating the frequency of occurrences for small x (e.g., below 10^20)---although in fact it will give a quite good estimate for the number of occurrences between large values x_1 and x_2 by taking N(Q, x_2) - N(Q, x_1).

This situation can be corrected by choosing a different (greater) lower limit of integration in [8]. Whatever lower limit is chosen (assuming singularities are excluded), the new formula will still be asymptotically equivalent to the classical Hardy-Littlewood formulas, [1], [2], [3], and [4]. Therefore we can be completely pragmatic and simply look for a lower limit L such that N(Q, F)=1 when F=39713433671, the known first occurrence of the constellation (obviously this procedure must be modified when the first occurrence is not yet known, or when F is very small). Trial and error reveals that such a value is very nearly L=7e9 (or more precisely, L=7,066,204,543.61---but 7e9 will do; indeed, even using L=100 for the lower limit will produce reasonable results). Thus we have the improved estimate

[9]    N2(Q, x) = 0                                            (x <  7e9)
       N2(Q, x) ~ 1731.8003202*int(1/(ln(t))^10, t, 7e9, x)    (x >= 7e9)

As observed, this is still asymptotically equivalent to the classical Hardy-Littlewood approximation resulting from the prime k-tuples conjecture---but it should be far more accurate for small x. In fact, further computation yields

    N2(Q, 39713433671) = 1.003192461
    N2(Q, 10^13)       = 45.79976
    N2(Q, 10^17)       = 27905.0327

The actual count for 10^13 is 54; for 10^17 is 28049 (according to D La Pierre Ballard).

An equivalent correction is produced by using the formula

[10]     N3(Q, x) ~ 1731.8003202*int(1/(ln(t + 15342200696))^10, t, 0, x)

which emphatically solves the singularity problem and produces

    N3(Q, 39713433671) = 1.00000000001
    N3(Q, 10^13)       = 45.51187
    N3(Q, 10^17)       = 27904.70196

If the latter procedure is applied to the twins, triplets ({p,p+2,p+6} and {p, p+4,p+6}), and quadruplets {p,p+2,p+6,p+8}, the values a=6, a=10, and a=6 (respectively) for the constant in ln(t+a) will yield formulas asymptotically equivalent to the standard ones, with improved results for small x; a least-squares analysis might result in superior values for a in each of these cases.

RESULTS OBTAINED BY OTHER RESEARCHERS

On 19 June 2008 I received a message from Jens Kruse Andersen indicating that he and others have made a more detailed study of this constellation. With his permission, I am reproducing (with minor changes, such as removing specific e-mail addresses) his message below.

From: Jens Kruse Andersen
To: "Thomas R. Nicely"
Subject: Dense prime clusters
NTS: 2008.06.19.1630.59
Date: Jun 19, 2008 4:30 PM GMT

I just saw http://www.trnicely.net/dense/dense1.html today.
D La Pierre Ballard calls the constellation 5TP39.
The links at http://www.teapro.com/ include all occurrences below 10^18.
Below is a mail I wrote to D La Pierre Ballard last year.

-- 
Jens Kruse Andersen

>----- Original Message ----- 
>From: "Jens Kruse Andersen"
>To: D La Pierre Ballard
>Sent: Tuesday, August 21, 2007 8:31 PM
>Subject: Re: Twin prime clusters
>
> I first saw your site Saturday and adapted my tuplet finder.
> The program is in C and uses the GMP library. It has been
> adapted for a lot of different prime patterns over years and become
> pretty messy. Many of the previous results by the program are linked
> at http://hjem.get2net.dk/jka/math/myrecords.htm (some results there are
> by other programs).
>
> 31# is the primorial 2*3*5*...*31 = 200560490130.
> Patterns modulo 31# where none of the 10 numbers have a factor <= 31 are
> identified and sieved to 1000. The GMP library then performs prp (probable
> prime) tests on cases where the 10 numbers have no factor below 1000.
> Prp's are not proved prime by the program.
> Only around 10^9 prp's were computed in total. That is a tiny fraction of
> the 2,623,557,157,654,233 primes below 10^17.
>
> The attached version computed the 5TP39 below 10^17 in 13 hours on one
> core of a 2.4 GHz Core 2 Duo running Windows Vista 32-bit (so 32-bit
> code was used). The executable may not run on some older cpu's, but I
> could recompile it for some of them. Some of the screen output is not
> adjusted to the purpose.
>
> Numbers are not tested in increasing order. This means patterns without
> small factors can be sieved faster. The found 5TP39 (I call them
> quintuplet twins) are saved in PrimeForm/GW input format.
>
> Some post processing (by other routines) converted to decimal expansions,
> sorted the numbers, and made primality proofs with PARI/GP (only took
> minutes). If you want to search further with the program, for example to
> 10^18, then I can adapt it and help with post processing.
>
> My computation almost agrees with http://www.teapro.com/fix5tp39.html up
> to 18e15. I found four more:
>
> 1,762,637,672,979,941
> 1,763,453,659,543,781
> 1,766,720,906,216,471
> 3,999,838,558,615,811
>
> I have not compared above 18e15.
>
> Some other results that may be of interest follows.
> My results were found with other versions of the program.
> The C source is modified by me and recompiled for every different prime
> pattern or size. The program is not published.
>
> -----
>
> I found a 62-digit 5TP39 with initial prime
> p = 3121873*139#+4109577991661.
> Decimal expansion of p:
> 31264454983046045293625252164403393316710411262794905229969231
> The expected time was 14 hours. It took 23.
>
> For comparison, the largest known case of 10 "simultaneous primes" at
> http://hjem.get2net.dk/jka/math/simultprime.htm is 104 digits for a
> 10-tuplet, also by the program. By the way, I only searched a 10-tuplet
> and screwed up by forgetting to check whether it could be extended to an
> 11-tuplet. By pure luck it could, and Norman Luhn got shared credit for
> discovering that.
>
> -----
>
> In http://tech.groups.yahoo.com/group/primenumbers/message/18318 I wrote:
>
> Quadruplets above 10 start at one of 210n + 11, 101, 191.
> In the densest admissable constellation of 4 quadruplets,
> the quadruplets start at p + 0, 30, 120, 210,
> or the mirror pattern p + 0, 90, 180, 210.
>
> The smallest "quadruple quadruplet":
> 300000224101777931 + 0,2,6,8; 90,92,96,98; 180,182,186,188;
> 210,212,216,218
> There are 8 other primes between the quadruplets.
>
> The next is 10 times larger, and the smallest with the other pattern:
> 3051450534439926131 + 0,2,6,8; 30,32,36,38; 120,122,126,128;
> 210,212,216,218
> The first 3 of those quadruplets have no primes betweem them,
> but there are 4 primes before the last quadruplet.
>
> -----
>
> In
> http://groups.google.ms/group/alt.math.recreational/browse_thread/thread/92e59d3b02c0659c
> I wrote:
>
>> the first case of two prime quadruplets as closely together
>> as admissible: 1006301 + {0, 2, 6, 8; 30, 32, 36, 38}
>> I have computed the first case of two of those as closely together as
>> admissible:
>> 11281963036964038421 + {0,2,6,8; 30,32,36,38; 420,422,426,428;
>> 450,452,456,458}
>> The second case starts at 12114914563464663491.
>
> I have discovered that these were published by Jörg Waldvogel and
> Peter Leikauf in a report dated February 2007:
> http://www.sam.math.ethz.ch/~waldvoge/Projects/clprimes05.pdf
>
> Their pdf file also contains the first cases of 9 twin prime pairs
> as closely together as admissible, starting with:
> 35902987875008630158997 +
> 0, 2, 12, 14, 30, 32, 42, 44, 54, 56, 60, 62, 84, 86, 90, 92, 102, 104
>
> -----
>
> In http://primes.utm.edu/curios/page.php?number_id=7079 I wrote:
>
> The first case of 8 twin prime pairs as closely together as admissible
> is 17625750738756291797 + n,
> for n = 0, 2, 12, 14, 24, 26, 30, 32, 42, 44, 54, 56, 72, 74, 84, 86.
>
> -----
>
> In case you wonder, the program is not fast enough to find two 5TP39 or 10
> twin prime pairs as closely together as admissible.
>
> -- 
> Jens Kruse Andersen

BIBLIOGRAPHY

Andersen, Jens Kruse. http://hjem.get2net.dk/jka/math.
Ballard, D La Pierre. 5TP39, five twin primes in a group of thirty-nine numbers.
Engelsma, Thomas J. K-tuple permissible patterns.
Hardy, G. H., and J. E. Littlewood. Some problems of 'Partitio Numerorum'; III: On the expression of a number as a sum of primes. Acta Math. 44:1-70 (1922). See also Collected papers of G. H. Hardy, Clarendon Press, Oxford (1966) I:561-630.
Harley, Robert Joseph. Some estimates due to Richard Brent applied to the "high jumpers" problem. December, 1994. See also http://pauillac.inria.fr/~harley/wnt.html.
Odlyzko, Andrew, Michael Rubinstein, and Marek Wolf. Jumping Champions. Experimental Math. 8:2:107-118 (1999).

Riesel, Hans. Prime numbers and computer methods for factorization (2nd edn., 1994, Birkhäuser (Boston)), pp. 60-73.

Silva, Tomás Oliveira e. Admissible prime constellations .

Waldvogel, Jörg, and Peter Leikauf. Finding clusters of primes, I, progress report 2003-2005. January, 2003.
Waldvogel, Jörg, and Peter Leikauf. Finding clusters of primes, II, progress report 2005-2008. December, 2008.
Waldvogel, Jörg, and Peter Leikauf. Parallelization of low-communication processes. 28 September 2006.