Deniz Ee 2
Introduction
I have been interested in prime numbers since I learned about them in 6th grade, when I was 12.
My tutor taught me the prime factorization of composite numbers, and I always thought I could find
a formula for the prime numbers. My mathematics was not yet advanced enough for me to know that no
simple formula of this kind is known. In high school I encountered great mathematicians such as
Taylor and Euler, and I am still trying to understand how their approaches can be used to search for
a formula for the primes.
In this extended essay, I aim to explore the possibility of finding a formula to predict prime
numbers. The focus is on using various mathematical techniques, including polynomial regression,
linear regression, and exponential functions, to create models that can estimate the sequence of
prime numbers.
Prime numbers are natural numbers greater than 1 that are divisible only by 1 and themselves, and
they are fundamental in mathematics as the building blocks of every natural number through prime
factorization. Despite their simple definition, predicting where they occur among the natural
numbers remains one of the most challenging problems in number theory.
To begin, I considered the first 25 prime numbers: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
53, 59, 61, 67, 71, 73, 79, 83, 89, and 97. First, I will fit a linear model to these primes using
linear regression:
$$y = mx + c$$
where y represents the prime numbers and x represents their position in the sequence.
x y
1 2
2 3
3 5
4 7
5 11
6 13
7 17
8 19
9 23
10 29
11 31
12 37
13 41
14 43
15 47
16 53
17 59
18 61
19 67
20 71
21 73
22 79
23 83
24 89
25 97
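The table can be reproduced with a short Python sketch. It uses the sieve of Eratosthenes, which is my own choice of method rather than one used in this essay; the function name first_primes is hypothetical, and the sieve limit is simply doubled until enough primes have been found.

```python
def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using a sieve of Eratosthenes.

    The sieve limit is doubled until it contains at least `count` primes.
    """
    limit = 16
    while True:
        is_prime = [True] * (limit + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(limit ** 0.5) + 1):
            if is_prime[p]:
                # Cross out every multiple of p, starting at p * p.
                for multiple in range(p * p, limit + 1, p):
                    is_prime[multiple] = False
        primes = [n for n, flag in enumerate(is_prime) if flag]
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


# Print the table: position x and the x-th prime y.
for position, prime in enumerate(first_primes(25), start=1):
    print(position, prime)
```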
The scatterplot looks roughly linear, but if I try to sketch a straight line, it cannot pass through all
the points on the plot.
Instead, I will choose the line y = mx + c that minimizes the sum of the squared differences between
the observed primes and the values predicted by the model. Setting the partial derivatives of this
error with respect to c and m equal to zero, I derived the values of m and c. The calculations were:
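$$E(m, c) = \sum_{i=1}^{25}\left(y_i - (mx_i + c)\right)^2$$
$$\frac{\partial E}{\partial c} = 2\sum_{i=1}^{25}\left(y_i - (mx_i + c)\right)\cdot(-1) = 0$$
$$-2\left(\sum_{i=1}^{25} y_i - m\sum_{i=1}^{25} x_i - 25c\right) = 0$$
$$-50\left(\frac{\sum_{i=1}^{25} y_i}{25} - m\cdot\frac{\sum_{i=1}^{25} x_i}{25} - c\right) = 0$$
$$\mu_y - m\mu_x - c = 0$$
$$c = \mu_y - m\mu_x$$
Since $\sum x_i = 325$ and $\sum y_i = 1060$, the means are $\mu_x = 13$ and $\mu_y = 42.4$, so
$$c = 42.4 - 13m$$
$$\frac{\partial E}{\partial m} = 2\sum_{i=1}^{25}\left(y_i - (mx_i + c)\right)\cdot(-x_i) = 0$$
$$-2\left(\sum_{i=1}^{25} x_iy_i - m\sum_{i=1}^{25} x_i^2 - c\sum_{i=1}^{25} x_i\right) = 0$$
With $\sum x_iy_i = 18960$, $\sum x_i^2 = 5525$ and $c = 42.4 - 13m$:
$$-2\left(18960 - 5525m - 325(42.4 - 13m)\right) = -2(5180 - 1300m) = 0$$
$$5180 - 1300m = 0$$
$$5180 = 1300m$$
$$m = \frac{5180}{1300} \approx 3.985 \approx 4$$
$$c = 42.4 - 13(3.985) \approx -9.4$$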
After these calculations and rounding, the best-fit line is approximately y = 4x − 10. It predicts the
26th prime as y = 4(26) − 10 = 94, while the actual 26th prime is 101.
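As a check on the arithmetic, the following Python sketch recomputes the four sums and solves the two equations from the partial derivatives exactly, using Python's fractions module so that no rounding occurs. The variable names are my own, and the primes are typed in from the table.

```python
from fractions import Fraction

# The first 25 primes from the table; x is each prime's position.
ys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
      53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
xs = list(range(1, len(ys) + 1))
n = len(xs)

sum_x = sum(xs)                              # 325
sum_y = sum(ys)                              # 1060
sum_xx = sum(x * x for x in xs)              # 5525
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 18960

# Solving dE/dc = 0 and dE/dm = 0 simultaneously gives these closed forms.
m = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x ** 2)
c = Fraction(sum_y, n) - m * Fraction(sum_x, n)  # c = mu_y - m * mu_x

print(f"m = {float(m):.4f}, c = {float(c):.4f}")
print(f"predicted 26th prime: {float(m * 26 + c):.1f}")
```

Exact arithmetic gives m = 259/65 ≈ 3.985 and c = −47/5 = −9.4, so the rounded line y = 4x − 10 and its prediction of about 94 agree with the exact least-squares line.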
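Given the inaccuracy of the linear model, I next attempted a quadratic regression:
$$y = ax^2 + bx + c$$
Writing this equation for each of the 25 data points gives the matrix equation $Y = X\beta$, where $Y$
is the column vector of the 25 primes and $X$ is the $25 \times 3$ matrix whose $i$-th row is
$(x_i^2,\ x_i,\ 1)$.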
However, I have more equations (25) than unknowns (3), so the system is overdetermined and
inconsistent: no parabola passes through all 25 points. Instead, I can use linear least squares, which
minimizes
$$L(X, \beta) = \lVert Y - X\beta \rVert^2$$
where $\beta$ is the vector of coefficients of these equations and $\lVert\cdot\rVert$ denotes the norm
(distance to the origin) of a vector. We are therefore still minimizing the sum of the squared
differences between the observed primes and those predicted by the model.
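Setting the derivative of $L$ with respect to $\beta$ equal to zero gives the normal equations:
$$\frac{\partial L}{\partial \beta} = -2X^TY + 2X^TX\beta = 0$$
$$2X^TX\beta = 2X^TY$$
$$X^TX\beta = X^TY$$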
As with the linear model, I derived the coefficients by minimizing the error, which gave
$$y = 0.0555927165x^2 + 2.539204757x - 2.895652174$$
For the 26th prime this predicts
$$y = 0.0555927165 \cdot 26^2 + 2.539204757 \cdot 26 - 2.895652174 \approx 100.7,$$
which rounds to 101, the actual value, and is much closer than the linear model's 94. To see whether
accuracy improves further, I also explored cubic and quartic models. The quartic model predicted the
26th prime as 102, which is also close to the actual value. These results suggest that polynomials of
degree two or higher capture the curvature in the distribution of prime numbers more effectively
than a straight line.
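The normal equations XᵀXβ = Xᵀy can be solved in a few lines of Python. This is a sketch under my own assumptions: the helper names (solve, polyfit_normal_equations, evaluate) are hypothetical, the arithmetic uses exact fractions to avoid rounding, and the coefficients are returned in ascending order (constant term first), the reverse of how the quadratic is written in the text. The loop fits the quadratic, cubic and quartic models and evaluates each at x = 26; the cubic and quartic outputs may differ from the values reported in this essay if a different fitting tool was used.

```python
from fractions import Fraction

# The first 25 primes from the table, indexed by position x = 1, ..., 25.
ys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
      53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
xs = list(range(1, len(ys) + 1))


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(matrix)
    # Augmented matrix [A | b] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(size):
        # Swap a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot equals 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Subtract multiples of the pivot row to clear this column elsewhere.
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def polyfit_normal_equations(degree):
    """Least-squares fit from the normal equations X^T X beta = X^T Y.

    Returns [beta_0, beta_1, ..., beta_degree], constant term first.
    """
    X = [[x ** k for k in range(degree + 1)] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(degree + 1)]
           for i in range(degree + 1)]
    XtY = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(degree + 1)]
    return solve(XtX, XtY)


def evaluate(coeffs, x):
    """Evaluate beta_0 + beta_1 x + beta_2 x^2 + ... at the given x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


for degree in (2, 3, 4):
    coeffs = polyfit_normal_equations(degree)
    rounded = [round(float(c), 10) for c in coeffs]
    print(f"degree {degree}: coefficients {rounded}, "
          f"x = 26 -> {float(evaluate(coeffs, 26)):.2f}")
```

For degree 2 this reproduces the three quadratic coefficients quoted in the text, listed in reverse order.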
To push polynomial regression to its limit, I considered a polynomial in the spirit of a Taylor series,
which writes a sufficiently well-behaved function as a polynomial of infinite degree. I have 25 data
points, so I can write 25 equations; a polynomial of degree 24 has exactly 25 coefficients, and the
system can be solved with Gauss-Jordan elimination. The model takes the form:
$$y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6 + a_7x^7 + a_8x^8 + a_9x^9 + a_{10}x^{10} + \dots + a_{24}x^{24}$$
To solve for the coefficients, I set up a system of linear equations based on the values of y (prime
numbers) and x (their positions). This required solving a 25 × 25 matrix equation, which, because of
its size, I tackled using Gaussian elimination.
Unfortunately, my solver indicated the system was inconsistent, suggesting either an error in the
setup or limitations in the computational approach.
$$y(1) = a_0 + a_1 + a_2 + a_3 + \dots + a_{24} = 2$$
$$y(2) = a_0 + 2a_1 + 4a_2 + 8a_3 + \dots + 2^{24}a_{24} = 3$$
$$y(3) = a_0 + 3a_1 + 9a_2 + 27a_3 + \dots + 3^{24}a_{24} = 5$$
$$y(4) = a_0 + 4a_1 + 16a_2 + 64a_3 + \dots + 4^{24}a_{24} = 7$$
$$y(5) = a_0 + 5a_1 + 25a_2 + 125a_3 + \dots + 5^{24}a_{24} = 11$$
$$y(6) = a_0 + 6a_1 + 36a_2 + 216a_3 + \dots + 6^{24}a_{24} = 13$$
$$y(7) = a_0 + 7a_1 + 49a_2 + 343a_3 + \dots + 7^{24}a_{24} = 17$$
$$y(8) = a_0 + 8a_1 + 64a_2 + 512a_3 + \dots + 8^{24}a_{24} = 19$$
$$y(9) = a_0 + 9a_1 + 81a_2 + 729a_3 + \dots + 9^{24}a_{24} = 23$$
$$\vdots$$
$$y(25) = a_0 + 25a_1 + 625a_2 + 25^3a_3 + \dots + 25^{24}a_{24} = 97$$
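In matrix form:
$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 25 & \cdots & 25^{24} \end{pmatrix}\begin{pmatrix} a_0 \\ \vdots \\ a_{24} \end{pmatrix} = \begin{pmatrix} 2 \\ \vdots \\ 97 \end{pmatrix}$$
Then we can use the augmented matrix to apply Gaussian elimination:
$$\left(\begin{array}{cccc|c} 1 & 1 & \cdots & 1 & 2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 25 & \cdots & 25^{24} & 97 \end{array}\right)$$
and use row operations to solve the system.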
y=?
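(I have not found a proper solution yet: my Excel solver reported that the system is inconsistent. This
should not be possible, because the system has 25 equations in 25 unknowns and the 25 x-values are
all different, so the coefficient matrix, a Vandermonde matrix, is invertible and the system has exactly
one solution. The likely problem is that the powers grow very quickly, up to $25^{24} \approx 3.6 \times 10^{33}$,
so rounding errors in the solver's floating-point arithmetic overwhelm the calculation.)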
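One way to test whether a solution exists is to repeat the Gauss-Jordan elimination with exact rational arithmetic, which avoids the rounding that floating-point software performs on very large entries. The Python sketch below does this with the standard fractions module; the helper names are hypothetical, and this is not the Excel procedure used in the essay. If the elimination completes, the 25 × 25 system has a unique solution, and the resulting polynomial passes through all 25 primes.

```python
from fractions import Fraction

# The first 25 primes from the table, indexed by position x = 1, ..., 25.
ys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
      53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
xs = list(range(1, len(ys) + 1))
DEGREE = len(xs) - 1  # 24, so there are 25 unknowns a_0, ..., a_24


def gauss_jordan(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(size):
        # Bring a row with a non-zero pivot into place (raises if singular).
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_24 x^24 exactly."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


# Coefficient matrix of the system: row i is [1, x_i, x_i^2, ..., x_i^24].
vandermonde = [[x ** k for k in range(DEGREE + 1)] for x in xs]
coeffs = gauss_jordan(vandermonde, ys)

print("largest entry:", f"{max(max(row) for row in vandermonde):.3e}")
print("fits all 25 primes:", all(evaluate(coeffs, x) == y for x, y in zip(xs, ys)))
print("value at x = 26:", float(evaluate(coeffs, 26)))
```

The value printed for x = 26 is only an extrapolation of the degree-24 curve, so nothing guarantees that it is close to 101.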
I then tried an exponential model:
$$y = a \cdot e^{bx}$$
Using this model, the predicted 26th prime was 175, far from the actual value of 101, indicating that
exponential growth does not accurately describe the progression of prime numbers.
Additionally, I tried a logarithmic model:
$$y = a\ln(x) + b$$
This model predicted the 26th prime as 72, again highlighting a significant margin of error.
However, the logarithmic function is integral to understanding the distribution of primes,
particularly in the context of probability distributions.
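The essay does not say how the exponential and logarithmic models were fitted. A common approach, assumed in the Python sketch below, is to turn each into a straight-line fit: for y = a·e^(bx), regress ln y on x (which minimizes squared errors in ln y rather than in y); for y = a ln(x) + b, regress y on ln x. Because this is an assumption, the predictions for x = 26 may differ from the 175 and 72 reported above. The helper linear_fit is hypothetical.

```python
import math

# The first 25 primes from the table, indexed by position x = 1, ..., 25.
ys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
      53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
xs = list(range(1, len(ys) + 1))


def linear_fit(us, vs):
    """Ordinary least squares for v = slope * u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    return slope, mean_v - slope * mean_u


# Exponential model y = a * e^(b x): ln y = b x + ln a is a straight line in x.
b, ln_a = linear_fit(xs, [math.log(y) for y in ys])
a_exp = math.exp(ln_a)
exp_prediction = a_exp * math.exp(b * 26)

# Logarithmic model y = a ln(x) + b: a straight line in ln x.
a_log, b_log = linear_fit([math.log(x) for x in xs], ys)
log_prediction = a_log * math.log(26) + b_log

print(f"exponential: y = {a_exp:.3f} e^({b:.4f} x), x = 26 -> {exp_prediction:.1f}")
print(f"logarithmic: y = {a_log:.3f} ln(x) + {b_log:.3f}, x = 26 -> {log_prediction:.1f}")
```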
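A well-known result in number theory is the prime number theorem, which approximates the
number of primes less than or equal to a given number x, written $\pi(x)$, as:
$$\pi(x) \approx \frac{x}{\ln x}$$
Heuristically, a large number near x is prime with "probability" about $1/\ln x$, so
$$f(x) = \frac{1}{\ln x}$$
acts as a density of primes for x > 2 (it is not a true probability density function, because its integral
does not converge to 1). To accumulate this density from 2 up to x, I can use a definite integral:
$$\pi(x) \approx \int_2^x \frac{1}{\ln t}\,dt$$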
Let $\ln t = u$; then $t = e^u$ and $dt = e^u\,du$, so
$$\int_2^x \frac{1}{\ln t}\,dt = \int_{\ln 2}^{\ln x} \frac{1}{u}\cdot e^u\,du$$
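To see how well the integral approximates $\pi(x)$, the Python sketch below counts primes directly and compares the count with x/ln x and with the integral of 1/ln t from 2 to x, evaluated numerically with Simpson's rule (a numerical method I chose; it is not part of the essay). The function names are hypothetical.

```python
import math


def prime_count(x: int) -> int:
    """pi(x): the number of primes <= x, by trial division (fine for small x)."""
    return sum(1 for n in range(2, x + 1)
               if all(n % d for d in range(2, math.isqrt(n) + 1)))


def integral_1_over_ln(x: float, steps: int = 10_000) -> float:
    """Approximate the integral of 1/ln(t) from 2 to x with Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)  # end points, weight 1
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # interior points alternate 4, 2, 4, ...
        total += weight / math.log(2 + i * h)
    return total * h / 3


for x in (100, 1000, 10_000):
    print(x, prime_count(x), round(x / math.log(x), 1), round(integral_1_over_ln(x), 1))
```

For x = 100 the count is 25, x/ln x is about 21.7 and the integral is about 29.1.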
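Using the substitution and a Taylor series expansion, I write the exponential function $e^u$ as a
polynomial of infinite degree:
$$e^u = a_0 + a_1u + a_2u^2 + a_3u^3 + a_4u^4 + \dots + a_nu^n + \dots$$
Substituting $u = 0$ gives $a_0 = 1$. Differentiating both sides with respect to $u$:
$$e^u = a_1 + 2a_2u + 3a_3u^2 + 4a_4u^3 + 5a_5u^4 + 6a_6u^5 + 7a_7u^6 + 8a_8u^7 + 9a_9u^8 + \dots + na_nu^{n-1} + \dots$$
Substituting $u = 0$ gives $a_1 = 1$ too. Differentiating both sides with respect to $u$ again:
$$e^u = 2a_2 + 3\cdot2\,a_3u + 4\cdot3\,a_4u^2 + 5\cdot4\,a_5u^3 + 6\cdot5\,a_6u^4 + 7\cdot6\,a_7u^5 + 8\cdot7\,a_8u^6 + \dots + n(n-1)a_nu^{n-2} + \dots$$
Substituting $u = 0$ gives $a_2 = \frac{1}{2}$. Differentiating both sides with respect to $u$ again:
$$e^u = 3\cdot2\,a_3 + 4\cdot3\cdot2\,a_4u + 5\cdot4\cdot3\,a_5u^2 + 6\cdot5\cdot4\,a_6u^3 + 7\cdot6\cdot5\,a_7u^4 + 8\cdot7\cdot6\,a_8u^5 + \dots + n(n-1)(n-2)a_nu^{n-3} + \dots$$
Substituting $u = 0$ gives $a_3 = \frac{1}{3\cdot2}$.
$$\vdots$$
Continuing in the same way gives $a_k = \frac{1}{k!}$, so
$$e^u = \sum_{k=0}^{\infty}\frac{u^k}{k!} = 1 + \sum_{k=1}^{\infty}\frac{u^k}{k!}$$
Dividing by $u$ and integrating term by term:
$$\int\frac{e^u}{u}\,du = \int\left(\frac{1}{u} + \sum_{k=1}^{\infty}\frac{u^{k-1}}{k!}\right)du = \ln|u| + \sum_{k=1}^{\infty}\frac{u^k}{k\cdot k!} + C$$
Applying the limits $u = \ln 2$ and $u = \ln x$:
$$\int_2^x\frac{dt}{\ln t} = \left[\ln|u| + \sum_{k=1}^{\infty}\frac{u^k}{k\cdot k!}\right]_{\ln 2}^{\ln x}$$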
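The final series can be evaluated directly. In the Python sketch below, antiderivative sums ln|u| + Σ u^k/(k·k!) up to a fixed number of terms (60, my own choice, far more than needed for the values of u used here), and li_offset applies the limits u = ln 2 and u = ln x. Both names are hypothetical.

```python
import math


def antiderivative(u: float, terms: int = 60) -> float:
    """ln|u| + sum_{k=1}^{terms} u^k / (k * k!): a series antiderivative of e^u / u."""
    total = math.log(abs(u))
    term = 1.0  # after step k this holds u^k / k!
    for k in range(1, terms + 1):
        term *= u / k
        total += term / k  # adds u^k / (k * k!)
    return total


def li_offset(x: float) -> float:
    """Integral of 1/ln(t) from 2 to x, using the limits u = ln 2 and u = ln x."""
    return antiderivative(math.log(x)) - antiderivative(math.log(2))


for x in (100, 1000, 10_000):
    print(x, round(li_offset(x), 2))
```

For x = 100 this gives about 29.08, compared with the 25 primes below 100.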
The regression models never produced exact prime numbers: only after rounding their outputs to the
nearest whole number did the predictions match actual primes, so they are not formulas at all. The
series expansion above instead approximates the distribution of primes, that is, how many primes lie
below x, and does so more accurately.
Despite the extensive exploration of various mathematical models, finding a precise formula for
predicting prime numbers remains elusive. Polynomial regressions, especially higher-order ones,
showed potential by approximating the 26th prime more closely than linear or exponential models.
The insights from the probability distribution and cumulative functions highlight the complexity
and irregularity in the distribution of primes.
Ultimately, the quest for a definitive prime number formula continues to challenge mathematicians.
This essay underscores the importance of combining theoretical approaches with computational
techniques to advance our understanding of prime number distributions. Future work may involve
refining these models or exploring entirely new mathematical frameworks to achieve more
accurate predictions.