Sieve of Eratosthenes: Time Complexity using Inclusion-Exclusion Principle

Authored by: Rohan Thomas

Sieve of Eratosthenes is an ancient algorithm introduced by Greek mathematician Eratosthenes in the third century BC, used to find prime numbers up to a given number. In this material we will discuss this algorithm and determine the exact number of eliminations made.

πŸ’‘
Problem: Given nn ∈\in N\mathbb{N}. List all the primes up to nn

Algorithm:

  1. List all integers from 2 to nnο»Ώ
  1. Start with the smallest prime, set p=2 p = 2ο»Ώ
  1. Eliminate all the proper multiples of ppο»Ώ which are less than nnο»Ώ such as 2p,3p,4p,...2p, 3p, 4p, . . .ο»Ώ
  1. Assign the value of ppο»Ώ to the smallest number next greater than ppο»Ώ which have been not eliminated, remove the multiples of that number
  1. Repeat the steps until p≀n p \leq \sqrt{n} ο»Ώ

Let’s consider the following example of listing all the primes less than n=15n = 15ο»Ώ

  1. Generate a list of numbers from 2 to 15
    2 3 4 5 6 789101112131415

  1. Starting with the smallest prime p=2 p = 2ο»Ώ, eliminate all multiples of 2
    2 3579111315

  1. Assign the value of ppο»Ώ to the next prime, which is 3, (since 3≀n 3 ≀ \sqrt{n}ο»Ώ). Eliminate
    all the multiples of 3 to obtain.
    2 3571113

4. When p=5p = 5ο»Ώ , the condition p≀np ≀ \sqrt{n}ο»Ώ is false as 5>n5 > \sqrt{n}ο»Ώ. Thus we exit the process, and now all the numbers in the list are prime numbers up to 15.

Time Complexity

The goal is to compute the number of eliminations took place for a given input n. For simplicity let’s just assume that every elimination has some cost associated with it (this includes computing the product, comparisons, etc.).


Let’s first see how many eliminations have been taken place for some input n. Recall we continue the procedure until p<np < \sqrt{n}ο»Ώ, so, the primes involved in the procedure are P={2,3,5,…,pm,…,pe}\mathbb{P}=\{2,3,5,\ldots,p_m,\ldots ,p_e\}ο»Ώ

When p=2:p = 2:ο»Ώ

⌊n2βŒ‹βˆ’1eliminationsΒ made\begin{align}\left\lfloor\frac{n}{2}\right\rfloor - 1 \quad \text{eliminations made} \end{align}

p=3p = 3ο»Ώ: since we already removed all the even multiples, we have:

⌊n3βŒ‹βˆ’1βˆ’βŒŠn2βˆ—3βŒ‹\begin{align} \left\lfloor\frac{n}{3}\right\rfloor - 1 - \left\lfloor\frac{n}{2*3}\right\rfloor \end{align}

p=5p=5ο»Ώ: there are ⌊n/5βŒ‹\lfloor{n/5}\rfloorο»Ώ multiples of 5, from this we need to remove the multiples of 5 which are either divisible by 2 or 3

⌊n5βŒ‹βˆ’(⌊n2βˆ—5βŒ‹+⌊n3βˆ—5βŒ‹βˆ’βŒŠn2βˆ—5βˆ—3βŒ‹)βˆ’1\begin{align}\left\lfloor\frac{n}{5}\right\rfloor - \left(\left\lfloor\frac{n}{2*5}\right\rfloor + \left\lfloor\frac{n}{3*5}\right\rfloor - \left\lfloor\frac{n}{2*5*3}\right\rfloor \right) - 1\end{align}

p=7:p = 7: ο»Ώ

⌊n7βŒ‹βˆ’(⌊n2βˆ—7βŒ‹+⌊n3βˆ—7βŒ‹+⌊n5βˆ—7βŒ‹βˆ’βŒŠn2βˆ—7βˆ—3βŒ‹βˆ’βŒŠn2βˆ—5βˆ—7βŒ‹βˆ’βŒŠn3βˆ—5βˆ—7βŒ‹+⌊n2βˆ—3βˆ—5βˆ—7βŒ‹)βˆ’1\begin{align} &\left\lfloor\frac{n}{7}\right\rfloor - \left(\left\lfloor\frac{n}{2*7}\right\rfloor + \left\lfloor\frac{n}{3*7}\right\rfloor + \left\lfloor\frac{n}{5*7}\right\rfloor \right. \\ &\left. - \left\lfloor\frac{n}{2*7*3}\right\rfloor - \left\lfloor\frac{n}{2*5*7}\right\rfloor - \left\lfloor\frac{n}{3*5*7}\right\rfloor + \left\lfloor\frac{n}{2*3*5*7}\right\rfloor \right)- 1 \end{align}

We can easily note the connections to inclusion-exclusionΒ principle\textbf{inclusion-exclusion principle}ο»Ώ. Let's generalize this to any arbitrary step involving pmp_mο»Ώ.

p=pmp=p_mο»Ώ

Consider the following sets, Am1A_{m_1}ο»Ώ= {x:xx:xο»Ώ is a multiple of pm p_mο»Ώ and 2}, Am2A_{m_{2}}ο»Ώ={x:xx: xο»Ώ is a multiple of pmp_mο»Ώ and 3}, Am4A_{m_{4}}ο»Ώ={x:xx:x ο»Ώ is a multiple of pmp_mο»Ώ and 5}, β‹―\cdotsο»Ώ , AmnA_{m_n}ο»Ώ={x:xx: xο»Ώ is a multiple of pmp_mο»Ώ and pmβˆ’1p_{m-1}ο»Ώ}.

notice there are ⌊npmβŒ‹\lfloor\frac{n}{p_m}\rfloorο»Ώ multiples of pnp_nο»Ώ, but we need remove multiples of pmp_mο»Ώ which are divisible by either 2,3,5,…,2, 3, 5,\ldots,ο»Ώpmβˆ’1. p_{m-1}.ο»Ώ This is exactly same as,

⌊npmβŒ‹βˆ’βˆ£β‹ƒi=1nAmiβˆ£βˆ’1=⌊npmβŒ‹βˆ’(βˆ‘i=1n∣Amiβˆ£βˆ’βˆ‘i<j∣Ami∩Amj∣+βˆ‘i<j<k∣Ami∩Amj∩Amkβˆ£βˆ’β‹―Β +(βˆ’1)nβˆ’1∣Am1βˆ©β‹―βˆ©Amn∣)βˆ’1\begin{align} &\left\lfloor\frac{n}{p_m}\right\rfloor -\left| \bigcup_{i=1}^{n}A_{m_i} \right| -1 \\ &=\left\lfloor\frac{n}{p_m}\right\rfloor -\left( \sum_{i=1}^n\left|A_{m_i}\right| -\sum_{i < j}\left|A_{m_i}\cap A_{m_j}\right| \right. \\ &+ \left. \sum_{i<j<k}\left|A_{m_i}\cap A_{m_j}\cap A_{m_k}\right|-\cdots\ +(-1)^{n-1} \left|A_{m_1}\cap\cdots\cap A_{m_n}\right|\right) -1\end{align}

Thus, for a given nnο»Ώ, we can write the time complexity as

T(n)=C(⌊n2βŒ‹+⌊n3βŒ‹+β‹―+⌊npeβŒ‹βˆ’βˆ‘p∈Pβˆ£β‹ƒi=1nAmiβˆ£βˆ’1) T(n) = C \left( \left\lfloor\frac{n}{2}\right\rfloor + \left\lfloor\frac{n}{3}\right\rfloor + \cdots + \left\lfloor\frac{n}{p_e}\right\rfloor -\sum_{p \in \mathbb{P}}\left| \bigcup_{i=1}^{n}A_{m_i} \right| -1 \right)

where C is the cost per elimination

Here we obtained a more theoretical tighter run time than O(nlog⁑log⁑n) O(n\log \log n) counting the exact number of eliminations made. But in fact T(n)∈O(nlog⁑log⁑n)T(n) \in O(n\log \log n).