I am going to skip ahead to Project Euler problem number 7.
Find the 10001st prime.
The solution to this problem involves creating a prime number generator which will come in very handy when solving some of the earlier problems.
I have been writing procedural code for something like 25 years now in assembler, FORTRAN, C, C++, Java and C#. I think that my first attempt at a solution shows this background.
I decided to use the
Sieve of Eratosthenes as my way of finding primes.
The first problem I faced was "how big is the 10001
st prime?" Poking around in Wikipedia led me to a page on the
prime counting function. There's a lot of interesting ideas there but for me the most useful one is that if p
n is the n
th prime then:
for large n.
With that tidbit of information solving this problem is pretty straightforward. The idea is that we create a bit array big enough that we are guaranteed that it will contain the 10001st prime, apply the Sieve algorithm to it, then scan for the one we are interested in:
let prime m =
let count n =
let f = float n
int (f*(log f + log (log f)))
let flags = new System.Collections.BitArray(count m, true)
let filter n =
match flags.[n] with
| true -> for i in n*n..n..flags.Count do flags.[i] <- false
| false -> ()
let root n =
let f = float n
int (sqrt f)
seq {2..root flags.Count} |> Seq.iter filter
seq {
for i in 2..flags.Count-1 do if flags.[i] then yield i
} |> Seq.nth (m-1)
prime 10001 |> printfn "%A"
Note that the filter function applies the "square optimization": we really only need to mark multiples of the current prime as composite starting with its square; smaller multiples of it will be removed when marking multiples of the smaller primes. This lets us avoid marking some numbers as composite multiple times.
This code solves the problem just fine and does so quite quickly (it took about a third of a second on my computer) but I just don't like it very much. It's functional but not very functional.
During my research on the Sieve I came across a paper by Melissa O'Neill called
The Genuine Sieve of Eratosthenes. In it she talks about implementing the sieve algorithm in a functional language (Haskell). The idea, as presented in section 3, is to mark numbers as composite "just in time". Rather than use a fixed size bit array to mark which numbers are prime and which are composite it uses a map to keep track of the next multiple of a particular prime.
let rec sieve n table =
let rec update n p table =
let n = n+p
match Map.tryFind n table with
| None -> Map.add n p table
| Some(prime) -> update n p table
seq {
match Map.tryFind n table with
| None ->
yield n
yield! sieve (n+1L) (table |> Map.add (n*n) n)
| Some(prime) ->
yield! sieve (n+1L) (Map.remove n table |> update n prime)
}
We seed this routine with the first prime and an empty map. Each time we request the next element in the sequence the function checks to see if the current number is in the table (and hence is composite). If not, the number is returned and then the map is updated with the first composite number that is not a multiple of a previously discovered prime (i.e. the square of the current number). If the current number is in the table then it is composite so we update the table and then check the next number. Once a number has been discovered to be composite we don't need it in the table any more (the sequence delivers up primes one at a time with no backtracking allowed) so we remove it and add the next multiple of the underlying prime.
This algorithm has a couple of nice features:
- The map only contains one entry for each prime we discover.
- With a minor change to use BigInts instead of long ints we could generate primes until your computer runs out of memory.
At this point a small helper function will get us the 10001stprime:
let primeprime n =
sieve 2L Map.empty |> Seq.nth (n-1)
primeprime 10001 |> printfn "%A"
Update: This routine runs a bit slower than the first one. It took about half a second to compute the answer. I believe this is because of the immutable map structure used. Every time an item is added or removed a revised copy of the map is created. After 10,000 primes it's doing a lot of work just in bookkeeping. I imagine you could improve performance by using a mutable data structure like System.Collections.Generic.Dictionary. I haven't cared enough about performance to make this change.