Tuesday, November 10, 2009

Root 2: 500 million glyphs

Having generated a text file with the first billion digits of the square root of 2, I started thinking about how to convert it to text.

First Try

My first try was to convert groups of 2 or 3 digits into a character using the ASCII code table. This was not very straightforward, for several reasons. To start, many characters in the range 0-255 are either non-printable or produce symbols or punctuation. To get around this, I decided to use only the part of the table that starts with the numerals and runs through the end of the lowercase letters.

That left some 2-digit numbers that had to be scaled up into the useful range, and many 3-digit numbers (everything greater than 255) that had to be scaled down into it. I did get a script working that accomplished this, but the results were not satisfying.

Second Try

I decided to use a simpler solution and convert each pair of digits into a letter using the range 1-26. To do this, I took each digit pair modulo 26, added one, and used the result as an index into the alphabet. Because 26 does not divide evenly into the 100 possible pairs (00 through 99), this produces a slight bias against the last four letters of the alphabet, but I was willing to live with the result.
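To see the size of that bias, here is a small Ruby sketch that counts how many of the 100 possible pairs land on each letter. It is an illustration only, not the script I ran, and the method name `pairs_per_letter` is made up for this example.

```ruby
# Count how many of the pairs 00..99 map to each letter
# under the rule: letter position = (pair % 26) + 1.
def pairs_per_letter
  counts = Hash.new(0)
  (0..99).each do |pair|
    # pair % 26 is 0..25, so adding it to "A" gives positions 1..26
    letter = ("A".ord + pair % 26).chr
    counts[letter] += 1
  end
  counts
end

pairs_per_letter.each { |letter, n| puts "#{letter}: #{n}" }
```

A through V each receive 4 pairs, while W, X, Y and Z receive only 3.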

As an example, the first two digits are 14. 14 divided by 26 is 0 with a remainder (modulus) of 14. Adding one to 14 gives 15, which returns the letter O. The next two digits are also 14, again returning O. The next two are 21, returning the letter V.
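The same walk-through can be sketched in a few lines of Ruby. This is a minimal version written for illustration, not the original script. The name `digits_to_letters` is hypothetical, and it assumes the input is a plain string of digits with no decimal point, starting at 14.

```ruby
# Convert a string of decimal digits to letters, two digits at a time.
# A trailing odd digit, if any, is ignored.
def digits_to_letters(digits)
  letters = digits.scan(/\d\d/).map do |pair|
    position = (pair.to_i % 26) + 1   # 1..26
    ("A".ord + position - 1).chr      # 1 => "A", 26 => "Z"
  end
  letters.join
end

puts digits_to_letters("141421")  # prints OOV
```

For a billion-digit file, the same mapping would need to be applied while reading the file in chunks rather than loading it all into one string.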

At the end of another scriptaculous Ruby adventure, I had converted my billion-digit file into 500 million letters.

Signals in the noise

To make the file easy to search, I wrote the resulting text out in 72-character lines. Yes, this causes some loss of continuity at line breaks, since a word that straddles two lines will not be found, but again, I was willing to live with it.
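A rough Ruby sketch of the line wrapping, and of what gets lost at a line break, might look like this. The method name `wrap_letters` is invented for the example, and it works on an in-memory string, which would not be practical for the full 500 million letters.

```ruby
# Break a long string of letters into lines of at most `width` characters.
def wrap_letters(text, width = 72)
  text.scan(/.{1,#{width}}/).join("\n")
end

# 70 filler letters followed by a 5-letter word: the word straddles
# the break between the first and second line.
wrapped = wrap_letters("A" * 70 + "KEITH")
found = wrapped.lines.any? { |line| line.include?("KEITH") }
puts found  # prints false, because "KE" ends line 1 and "ITH" starts line 2
```

A line-by-line search therefore undercounts slightly, which is the trade-off mentioned above.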

I quickly found my name, "KEITH", 65 times. The first occurrence was 7,398,524 digits into the square root of 2. My wife's name appeared 125 times.

I found "GOD" 33,663 times with the first appearance at 3,186 digits.

As expected, the shorter the text string, the more likely it is to be found. Most words and strings up to 6 characters can be found in the first billion digits of the square root of 2. Strings 7 characters and longer, like "CALIFORNIA", are often not found.
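A back-of-the-envelope estimate shows why the cutoff falls around 6 or 7 characters. The Ruby sketch below assumes the digit pairs behave like uniformly random numbers from 00 to 99, which is an assumption and not something I have proven about the root of 2, and it ignores matches lost at line breaks. The method names `letter_probability` and `expected_hits` are made up for the example.

```ruby
# Chance that a random pair 00..99 maps to the given uppercase letter
# under the rule: letter position = (pair % 26) + 1.
def letter_probability(letter)
  index = letter.ord - "A".ord                      # 0..25
  (0..99).count { |pair| pair % 26 == index } / 100.0
end

# Expected number of times `word` appears in `total_letters` random letters.
def expected_hits(word, total_letters = 500_000_000)
  positions = total_letters - word.length + 1
  word.chars.reduce(positions.to_f) { |acc, ch| acc * letter_probability(ch) }
end

%w[GOD KEITH CALIFORNIA].each do |word|
  puts "#{word}: #{expected_hits(word)}"
end
```

Under these assumptions the estimate is about 32,000 for "GOD" and about 51 for "KEITH", the same ballpark as the counts I found, while a 10-letter word like "CALIFORNIA" comes out far below one expected hit.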

For fun, I ran a few searches for dates and numbers in the integer file, finding things like my birthday and dates of historic events. Numeric strings are much easier to find.

Next, I'll post one of my Ruby scripts to calculate the square root of 2 digit by digit. It is very slow and inefficient, but fun for tinkering. I have a few more ideas for grappling with the root of 2.