Scattered Thoughts

Monday, March 29, 2010

As previously noted, the call number for "The Oxford Companion to Musical Instruments" by Anthony Baines is 784.1903. Why is that? How do you come up with a Dewey classification number for a book? This one is actually pretty straightforward. It is built by combining 784.19 Instruments with the standard notation 030 Dictionary. Since no classification number is allowed to end in a 0 we end up with 784.1903. Of course this one was easy since I already knew the answer.

Wednesday, March 17, 2010

How do you organize your books so they can be found when you need them? For a "small" library almost anything goes: colour, size, alphabetically by author, alphabetically by title, etc. I have a modest number of scores and books about music acquired gradually over the last 20 years. For the longest time they were sorted alphabetically by author or composer or editor. Recently I looked through this collection and found forgotten treasures. Scores for recorders were filed next to scores for cello. Vocal music was next to string quartet music. My memory of what I owned was so degraded that I had no recollection of the pieces or where I'd bought them. Music that would be fun to play was lost in my filing system!

At this point it became clear that I needed to reorder my collection so it would be easier to find items of interest. Realistically my collection is so small that I could just come up with my own cataloging scheme and be done with it but, being a good programmer, I didn't want to reinvent the wheel. Why not stand on the shoulders of giants? This led me to the web.

A search quickly led me to a blog post by one of the librarians at the San Francisco Public Library about using the Dewey Decimal System to categorize scores. This seemed like a good way to go. If I could figure out the proper class number for each book or score, then sort them numerically, related items would be right beside each other. Also, I could use the SFPL's online catalog to my advantage. If they have the same books I have then I can just use their call number. No thinking required!

If only life was that simple.

Since I had to start somewhere I chose "The Oxford Companion to Musical Instruments" by Anthony Baines (ISBN 0-19-311334-1). Many books include the Dewey class number on their copyright page and this is one of them. According to the OUP, this book should be filed under 784.1903. As a check, just to be sure that my technique was viable, I looked it up on the SFPL web site. To my surprise they gave the call number as 781.91. Why the discrepancy? Which number is right?

After hitting this roadblock my enthusiasm (and progress) slowed somewhat. Finally, in desperation, I wrote to the librarians at the SFPL via their Ask a Librarian web page. I said,

I wish to use the Dewey Decimal System to arrange my personal library of books about music and musical scores. Your page of Musical Scores Call Numbers (http://sfpl.org/librarylocations/main/art/scoresearch.htm) is a great resource. However, I have noticed discrepancies between the numbers you assign books and the numbers others assign them. For example "The Oxford Companion to Musical Instruments" by Anthony Baines is given the DD# 784.91 by the publisher. In your library it is 781.19. Which number is correct and why? Any information you can provide would be much appreciated.

(Note that I jumbled up the numbers. Sheesh.)

One of the librarians there very quickly (and kindly) responded,

A radical revision was made to the music numbers of the Dewey classification in the 20th ed. (mid-1980's). It was decided at that time that SFPL would not start applying the new numbers, but would continue to use the (19th ed.) old schedule - in fact, a "customized" version of the old schedule.

Aha!

I imagine this was a difficult decision to make. Suppose your library has 20,000 music books and scores categorized using the the old schedule. How much would it cost to re-catalog all your books? Say it takes 15 minutes to figure out a new number. (This is obviously a wild guess. In future blog posts I will discuss how I compute a Dewey class number for a book and you will see that 15 minutes is probably an underestimate when you're striving for accuracy.) That means a librarian could classify four books in an hour. So it would take 5000 hours to reclassify all your books. In one year, if you take two weeks holiday, you work 40x50 = 2000 hours. That means it could take a single librarian two and a half years to reclassify all those books. You can start to see why they decided to stick with the old schedule. (According to the blog post referenced earlier, the SFPL may have as many as 45,000 actual items to catalog.)

Of course, on the flip side, now whenever a new music book or score is acquired you cannot use the publisher's determination of the Dewey class number. For each new book you need to return to first principles to figure out where it belongs. Suppose over time your library doubles in size from 20,000 to 40,000 books. You have now spent as much money re-categorizing new books using the old schedule as it would have cost to switch all your old books over in the first place.

When money is tight this is a tough call to make. How do you balance a huge upfront cost against an long-term, ongoing one? I suspect the latter is easier to hide in the budget.

Although I would like to use the SFPL call numbers as a labour-saving tool, it is pretty clear to me that I should use the new schedule for my music library. As luck would have it, my local library has the 20th edition of the "Dewey Decimal Classification and Relative Index" reference manuals. Now all I have to do is learn how class numbers are calculated.

Friday, October 30, 2009

The tenth problem in Project Euler revisits the prime numbers:

Calculate the sum of all the primes below two million.

Using the Sieve of Eratosthenes sequence developed earlier we can write the solution like this:

sieve 2L Map.empty |>

    Seq.takeWhile (fun n -> n < 2000000L) |>

    Seq.sum |>

printfn "sum = %A"

Arguably there is some inefficiency in generating a subsequence just consisting of the primes less than two million and then summing that sequence. I think the clarity of the resulting code makes up for the slight inefficiency.

Thursday, October 29, 2009

Today I want to present my solution to problem nine in the Project Euler series:

Find the only Pythagorean triplet, {a, b, c}, for which a + b+ c = 1000.

A little bit of research on Wikipedia led to the page on Pythgorean triples. Using Euclid's formula we can generate a sequence of tuples which can be manipulated to find the desired answer:

let triples =

    seq {

        for n in 1..100 do

            for m in n+1..100 do

                yield (2*m*n, m*m-n*n, m*m+n*n)

}

Note that my selection of 100 as the endpoint for each of the loops is pretty arbitrary. I just needed a number big enough that the triple whose members summed to 1000 would be included in the sequence. Then to find the right one we can write:

triples |>

    Seq.find (fun (a, b, c) -> a+b+c = 1000) |>

    (fun (a, b, c) -> a*b*c) |>

printfn "product = %A"

This sequence is lazily evaluated until we find the one we're looking for.

Friday, October 23, 2009

Yet another Project Euler post follows. This time I am looking at problem number eight:

Discover the largest product of five consecutive digits in the 1000-digit number.

The first thing to do was get the string of digits into a usable form. I copied them from the Project Euler web site, pasted them into a document and then did some manual labour to make them into a long string:

let chars =



 "73167176531330624919225119674426574742355349194934"+

 "96983520312774506326239578318016984801869478851843"+

 "85861560789112949495459501737958331952853208805511"+

 "12540698747158523863050715693290963295227443043557"+

 "66896648950445244523161731856403098711121722383113"+

 "62229893423380308135336276614282806444486645238749"+

 "30358907296290491560440772390713810515859307960866"+

 "70172427121883998797908792274921901699720888093776"+

 "65727333001053367881220235421809751254540594752243"+

 "52584907711670556013604839586446706324415722155397"+

 "53697817977846174064955149290862569321978468622482"+

 "83972241375657056057490261407972968652414535100474"+

 "82166370484403199890008895243450658541227588666881"+

 "16427171479924442928230863465674813919123162824586"+

 "17866458359124566529476545682848912883142607690042"+

 "24219022671055626321111109370544217506941658960408"+

 "07198403850962455444362981230987879927244284909188"+

 "84580156166097919133875499200524063689912560717606"+

 "05886116467109405077541002256983155200055935729725"+

 "71636269561882670428252483600823257530420752963450"

I know this is kind of inefficient (each + operation will create a new immutable string object containing the contents of the previous operation with the new part appended) but it gets the job done. Now, in F#, a string is automatically a sequence so we can just process it like this:

chars |>

    Seq.map (fun c -> int c - int '0') |>

    Seq.windowed 5 |>

    Seq.map (fun a -> Array.fold (*) 1 a) |>

    Seq.max |>

printfn "max product = %A"

The idea here is that we convert the sequence of characters into a sequence of digits and then create a new sequence from it which consists of each set of five consecutive digits. Then we compute the product of each of these five digit sets and find the maximum.

This was one solution that I reworked extensively on revisiting it. My first attempt was not very idiomatic. I think this one is much better.

Thursday, October 22, 2009

I will write more about music notation later as I explore LilyPond. In this post I want to explore my answer to problem #6 of Project Euler:

What is the difference between the sum of the squares and the square of the sums?

I first defined a helper function:

let square i = i*i

Then the sum of the squares can be expressed as:

let sum_of_squares = {1..100} |> Seq.map square |> Seq.sum

And the square of the sum is:

let square_of_sum = {1..100} |> Seq.sum |> square

So that the desired difference is:

square_of_sum - sum_of_squares |> printfn "diff = %A"

And there's really nothing much more to say.

Yesterday the Craftzine blog printed these words from Sarah Lohman's blog about historic cuisine:

Why bother deciphering a recipe over 150 years old?

You can take a collection of words and measurements written long ago, and turn it into a physical object. You can create something that looks, smells, and tastes just like it did hundreds of years ago. And that's the next best thing to time travel: it's a window to the past that lets you understand a little bit about another way of life.

I play recorders with a group of like-minded people. We mostly play music of the Renaissance (on Baroque recorders...). Do I feel a connection to the musicians who played this music five hundred years ago? To the composers who wrote the music? To the audience who listened to it? Or danced to it? Well, not really, but I like the idea.

Adieu mes amours by Josquin des Prez in the Odhecaton.

Wednesday, October 21, 2009

Project Euler Problem 5:

What is the smallest number divisible by each of the numbers 1 to 20?

There are several ways to solve this problem. I chose to compute the Least Common Multiple of two numbers using their Greatest Common Divisor:

Using Euclid's algorithm, the F# code for gcd is:

let rec gcd a b =

    match a with

    | a when a = 0L -> b

    | a when a < b -> gcd (b-a) a

    | a -> gcd (a-b) a

And lcm is just as easy:

let lcm a b = (a*b)/gcd a b

Then we can write the code that solves the problem:

seq {1L..20L} |> Seq.fold lcm 1L |> printfn "lcm = %A"

The fold operation calculates a running lcm. At each step the result of the previous step is passed to the lcm function with the current element in the sequence of integers.

The downside of posting my solutions to the Project Euler problems is that I end up second guessing myself; "is that really the best way to solve this?" Then I have to spend some time trying alternative ideas just to convince myself that my first idea really was good enough. Often it is. Sometimes I tweak my solution to make some part of it more reusable. Reduce, recycle, reuse:

On to Project Euler problem number four:

Find the largest palindrome made from the product of two 3-digit numbers.

I created a number of helper functions. The first takes a number and returns a sequence containing the number's digits:

 let rec digits n =

    seq {

        if n > 0 then

            yield n%10

            yield! digits (n/10)

    }

The next one takes a number a reverses it (that is, 1234 would become 4321):

let reverse n =

    digits n |> Seq.fold (fun acc k -> 10*acc + k) 0

With these helpers we can easily check whether a number is a palindrome:

let palindrome n = 

    n = reverse n

So now we can create a sequence containing all the palindromes that result from multiplying two three-digit numbers together:

let palindromes = 

    seq { 

        for i in 1..999 do 

            for j in 1..999 do 

                if palindrome (i*j) then yield i*j 

    }

Finally we can find the largest palindrome:

palindromes |> Seq.max |> printfn "max = %A"

I think that's a pretty good solution.

Growing up I had a set of Pentominoes and a rectangular tray in which they could be placed. I may even still have it somewhere. It's got to be more than 35 years old. Hmmm... I guess it would be pretty miraculous to find it tucked away in a box after four or five major moves (including one from Canada to the U.S.).

Recently a friend gave me a puzzle set with pieces that include some pentominoes and some other things. It seems to have a lot of solutions. Here's one of them:

Tuesday, October 20, 2009

Problem 3 of the Euler Project is:

Find the largest prime factor of a composite number.

As previously mentioned, having a lazy prime generator is kind of nice when it's time to solve this problem. The idea I settled on is to create a prime factorization of the target number. From that list I can then select the largest prime factor.

Here's my prime factorization code:

let rec factors n =

    let root (n:int64) =

        int64 (sqrt (float n))

    let rec exp e p n =

        match n%p with

        | 0L -> exp (e+1L) p (n/p)

        | _  -> e

    let rec pow e p =

        match e with

        | 0L -> 1L

        | e  -> p*pow (e-1L) p   

    let primes = sieve 2L Map.empty

    seq {

        if n > 1L then

            let divisor = primes |> Seq.takeWhile (fun k -> k <= root n) |> Seq.tryFind (fun k -> n%k = 0L)

            match divisor with

            | None      -> yield (n, 1L)

            | Some(k)   ->

                let e = exp 0L k n

                yield (k, e)

                yield! factors (n/pow e k)

}

I think this looks more complicated than it actually is because of the embedded helper functions: the root function computes the square root of an integer, exp computes the number of times a prime number appears in the prime factorization of a composite number and pow raises an integer to a power. It's not clear to me whether it would be better to have these as standalone functions or as embedded ones.

The result of passing a number into factors is a sequence of tuples where the first entry is the prime factor and the second is the number of times that factor goes into the composite number. This function will come in useful in my solutions to later Project Euler problems.

With this factorization function the solution to the problem is:

factors 600851475143L |> Seq.map (fun k -> fst k) |> Seq.max |> printfn "%A"

The mapping step basically throws away the exponent and allows us just to focus on the prime factor.

There is a some inefficiency in this code. Every time factors is called recursively it starts checking prime factors from the very first prime. Yet we know that we've already checked some of them previously. I briefly messed around with fixing this by always pulling from the same sequence of primes rather than creating a new one every time. Then I decided it was more trouble than it was worth.

Friday, October 16, 2009

Overexposed.

I am going to skip ahead to Project Euler problem number 7.
Find the 10001^st prime.

The solution to this problem involves creating a prime number generator which will come in very handy when solving some of the earlier problems.

I have been writing procedural code for something like 25 years now in assembler, FORTRAN, C, C++, Java and C#. I think that my first attempt at a solution shows this background.

I decided to use the Sieve of Eratosthenes as my way of finding primes.

The first problem I faced was "how big is the 10001^st prime?" Poking around in Wikipedia led me to a page on the prime counting function. There's a lot of interesting ideas there but for me the most useful one is that if p_n is the n^th prime then:

for large n.

With that tidbit of information solving this problem is pretty straightforward. The idea is that we create a bit array big enough that we are guaranteed that it will contain the 10001st prime, apply the Sieve algorithm to it, then scan for the one we are interested in:

let prime m =

    let count n =

        let f = float n

        int (f*(log f + log (log f)))

    let flags = new System.Collections.BitArray(count m, true)

    let filter n = 

        match flags.[n] with

        | true  -> for i in n*n..n..flags.Count do flags.[i] <- false

        | false -> ()

    let root n =

        let f = float n

        int (sqrt f)   

    seq {2..root flags.Count} |> Seq.iter filter

    seq {

        for i in 2..flags.Count-1 do if flags.[i] then yield i

    } |> Seq.nth (m-1)



prime 10001 |> printfn "%A"

Note that the filter function applies the "square optimization": we really only need to mark multiples of the current prime as composite starting with its square; smaller multiples of it will be removed when marking multiples of the smaller primes. This lets us avoid marking some numbers as composite multiple times.

This code solves the problem just fine and does so quite quickly (it took about a third of a second on my computer) but I just don't like it very much. It's functional but not very functional.

During my research on the Sieve I came across a paper by Melissa O'Neill called The Genuine Sieve of Eratosthenes. In it she talks about implementing the sieve algorithm in a functional language (Haskell). The idea, as presented in section 3, is to mark numbers as composite "just in time". Rather than use a fixed size bit array to mark which numbers are prime and which are composite it uses a map to keep track of the next multiple of a particular prime.

let rec sieve n table =

    let rec update n p table =

        let n = n+p

        match Map.tryFind n table with

            | None          -> Map.add n p table

            | Some(prime)   -> update n p table

    seq {

        match Map.tryFind n table with

        | None              -> 

            yield n

            yield! sieve (n+1L) (table |> Map.add (n*n) n)

        | Some(prime)       ->

            yield! sieve (n+1L) (Map.remove n table |> update n prime)

    }

We seed this routine with the first prime and an empty map. Each time we request the next element in the sequence the function checks to see if the current number is in the table (and hence is composite). If not, the number is returned and then the map is updated with the first composite number that is not a multiple of a previously discovered prime (i.e. the square of the current number). If the current number is in the table then it is composite so we update the table and then check the next number. Once a number has been discovered to be composite we don't need it in the table any more (the sequence delivers up primes one at a time with no backtracking allowed) so we remove it and add the next multiple of the underlying prime.

This algorithm has a couple of nice features:

The map only contains one entry for each prime we discover.
With a minor change to use BigInts instead of long ints we could generate primes until your computer runs out of memory.

At this point a small helper function will get us the 10001^stprime:

let primeprime n =

    sieve 2L Map.empty |> Seq.nth (n-1) 



primeprime 10001 |> printfn "%A"

Update: This routine runs a bit slower than the first one. It took about half a second to compute the answer. I believe this is because of the immutable map structure used. Every time an item is added or removed a revised copy of the map is created. After 10,000 primes it's doing a lot of work just in bookkeeping. I imagine you could improve performance by using a mutable data structure like System.Collections.Generic.Dictionary. I haven't cared enough about performance to make this change.

Wednesday, October 14, 2009

The rain just would not stop yesterday. Our rainwater tanks are full. They hold about 6000 gallons. One storm equals full tanks!

Update: Apparently we got 11 inches of rain in one day.

Thursday, October 08, 2009

I've been meaning to mention that I'm using MathCast to format the equations in this blog. I render them as images which I then embed in the post.

Wednesday, October 07, 2009

Continuing on with the Project Euler solutions I'm now going to look at the second problem:

Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed four million.

Back when I studied math (B.Sc. Math, UBC 1980) the definition of the Fibonacci sequence was usually given as something like:

The sequence defined by this recurrence relation is:

1, 1, 2, 3, 5, 8, 13, 21, ...

For the purposes of this problem the series is defined a bit differently:

So that the sequence is:

1, 2, 3, 5, 8, 13, 21, 34, ...

If I used the first definition my answer would be off by one. There are a lot of ways to implement this in F#. After briefly flirting with a matrix multiplication formula (in a foolish attempt to avoid recursion) I came up with this:

   

let fib m n =

    let rec inner a b =

        seq {

            let c = a + b

            yield c

            yield! inner b c

        }

    seq {

        yield m

        yield n

        yield! inner m n

    }

For the purposes of this sequence the two parameters of the function are the first two values in the series. I coded it this way so that type inference would allow it to be used with any integer type. This is definitely not the simplest F# implementation of the Fibonacci series. A naive implementation might look like this:

let rec fib_naive n =

    match n with

    | 1|2 -> n

    | k -> fib_naive (k-1) + fib_naive (k-2)

The problem with this code is that in the computation of the n-1^th element requires the computation of the n-2^nd element. But then you turn around and recompute the n-2^nd element all over again. In other words you do the same computation twice which, for large values of n, is just wasteful. Also, to use this function you would need to know the approximate value of n for which fib_naive n is greater than 4 million.

The part I like about the first function is that at each step you already have the previous value of the series computed and you can just use it. Plus the first function is really a generator. You can get the next element from it as many times as you want (or as is allowed by the underlying representation of an integer).

So, given the Fibonacci number generator, the solution to this problem can be written as:

fib 1 2 |> 

    Seq.takeWhile (fun n -> n < 4000000) |> 

    Seq.filter (fun n -> n%2 = 0) |> 

    Seq.sum |> 

    printfn "sum = %A"

To put it in words, take the Fibonacci series and select the values from it that are less than 4 million. Then select all the even numbers in the new subset and add them together.

I find myself somewhat obsessed with Project Euler these days. It contains a series of math challenges that (mostly) require a computer program of some sort to find the right answer. If you start looking around the web there seems to be a tradition of using these problems as a way to learn a new programming language. And that's what I'm doing too. My language of choice is F#.

The first problem in the project is:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

I have come up with four solutions. Here's number one:

seq {for i in 0..999 do if i%3=0 or i%5=0 then yield i} |> Seq.sum |> printfn "total = %A"

Solution number two is to sum the sequence of integer multiples of 3 and 5 and then subtract the overlap (the sum of the multiples of 15). That would look like this:

({3..3..999} |> Seq.sum) + ({5..5..999} |> Seq.sum) - ({15..15..999} |> Seq.sum) |> printfn "total = %A"

The third solution is another variation on the theme of summing a sequence:

seq {1..999} |> Seq.filter (fun k -> k%3=0 or k%5=0) |> Seq.sum |> printfn "total = %A"

Finally, solution number four comes from noting that the sum from k to n of the multiples of k can be written like this:

With some tweaking to get the end of the series correct I translated the math into F# as follows:

let sigma k n = 

    let m = k*((n-1)/k)

    (m*(m+k))/(2*k)

In this case m is the largest integer less than n that needs to be included in the sum (this is not the same as n since the problem states that we are looking at numbers below one thousand). With this helper function the solution to the problem becomes:

(sigma 3 1000) + (sigma 5 1000) - (sigma 15 1000) |> printfn "total = %A"

To time these solutions I used the #time functionality of F# Interactive.

Solution 1 took 00:00:00.010.
Solution 2 took 00:00:00.011.
Solution 3 took 00:00:00.024.
Solution 4 took 00:00:00.003.

It's not at all clear to me why the third solution is two times slower than the first two. As expected the fourth solution is by far the fastest.

And that's way too much analysis for such a simple problem.

Wednesday, September 30, 2009

In case you were wondering, the outfit Mallory is wearing in the previous picture was not included in the package when we took possession. She came to us with a dress and pair of squeaky shoes. The squeaky shoes were cute for about a minute. Then we disabled them.

Friday, June 18, 2004

This is a picture of Tang Jiayun. She is two and a half years old and currently lives in the Guangdong province of China. She's going to be our daughter. A sister to Lyle. We are adopting her. The plan is to go over to China at the end of July and handle all the paperwork necessary for the transfer. We are both excited and overwhelmed. The commitment is staggering.

Tuesday, June 08, 2004

The previous two photos were taken by Gail one day when Paloma and Lyle stopped by the office to say hello. I really like them 'cos Lyle doesn't look like a zombie (as he does in many of his photos). The sight of a camera usually causes his eyes to glaze over and his mouth to hang slackly open. I'll post more of them in the next couple days.