bologma

Sunday, September 11, 2011

Tests as System Specifications

Today I was geeking out, following a long list of articles and papers by computer scientists sparked by Michael Fogus on papers every programmer should read. I ended up checking out Leslie Lamport's (a distributed systems guru) website and downloading his book on Specifying Systems. In its introduction he writes,

A specification is a written description of what a system is supposed to do. Specifying a system helps us understand it. It's a good idea to understand a system before building it, so it's a good idea to write a specification of a system before implementing it.

He also includes the quote:

Writing is nature's way of letting you know how sloppy your thinking is.

This reminds me of the value I've found in writing tests for non-trivial programming modules before implementing them whenever it's not clear to me exactly how the system should work or what it is supposed to do, a topic I wrote about early in my career as a software engineer:

[tests are] a specification for behavior. In essence, you're forced to think in a very concrete way, what is this code supposed to do? What exactly are it's input, outputs, error conditions, etc? And when you have finished writing the tests, you look at your code, it is amazingly simple.

I wouldn't go so far as to say it always makes your program "amazingly simple", bit It does tend to make it simpler than it might have been were it not written only to satisfy the requirements you really need. And most importantly, to get back to what Lamport was saying, it helps you make sure you have really thought through the problem you are trying to solve before you starting writing a system to solve it.

Friday, January 08, 2010

Lyrics to the intro of "What up with that?"

because I could't find it anywhere online, I present the lyrics to "what up with that?"

i woke up this morning and i got out of bed,
had a big old cup of coffee and to clear my head,
telephone ring and you wanna chat
well sit on down and tell me what - up - with - that

oh whee! what up with that? what up with that?
oh whee! what up with that? what up with that?

he said, she said, we said, me said, what up with that?
who knew, you knew, say what? voodoo! what up with that? what up with that?

Sunday, August 16, 2009

Fun with app engine's task queue

I just finished an app engine site that plots some housing rentals in Ann Arbor on google maps (source). I originally started working on it out of frustration that housing maps doesn't support Ann Arbor. About half way through I realized mapskrieg does, but by then I had already found an excuse to play around with some of app engine's latest features so I was happy to finish it up :) Even though it is hard coded to listings in Ann Arbor, it works based off a list of craigslist search urls, so it could be targeted anywhere.

The meat of the server is pretty simple:

class RefreshCron(webapp.RequestHandler):
  "Cron job that refreshes listings for each housing URL"
  def get(self):
    import pickle
    # fire off tasks for each url
    for url in HOUSING_URLS:
      taskqueue.add(
        url="/task/craigsrss",
        payload=pickle.dumps(url),
        headers={"Content-Type": "application/octet-stream"})
    self.redirect("/")

class CraigsRssTask(webapp.RequestHandler):
  """takes in a url to a craigslist rss feed, and grabs
  each entry.  for each one, fires off a DetailsTask to
  get the detail of each listing."""
  def post(self):
    import pickle
    url = pickle.loads(self.request.body) + "&format=rss"
    listings = parseListings(fetchContent(url))
    for listing in listings:
      taskqueue.add(
        url='/task/fetchdetails', 
        payload=pickle.dumps(listing), 
        headers={"Content-Type": "application/octet-stream"})
    self.redirect("/")

class DetailsTask(webapp.RequestHandler):
  """Fetches details from one listing and stores the 
  full listing in the datastore.  The listing is passed in the request 
  body as a pickled dictionary."""
  def post(self):
    import pickle
    listing = pickle.loads(self.request.body)
    url = listing["url"]

    text = fetchContent(url)
    address = extractAddr(text)
    photos = extractPhotos(text)
    lat,longit = parseGeocode(fetchContent(geocodeUrl(address)))
    listingStored = ListingStore(
      title=listing["title"],
      url=url,
      address=address,
      photos=photos,
      lat=lat,
      longit=longit)
    listingStored.put()

A cron job posts a task for each craigslist search url using the experimental task queue feature. Each rss task parses the feed and some initial data about each listing, and then posts another task to gather the rest of the details from the listing url (including address and photo extraction, and geocoding of the address). What's really cool is this all fans out in parallel, which means a 100+ listings can be parsed and ready to go very quickly. With the addition of the task queue, I feel like app engine's request timeout has turned from a pesky limitation to a helpful reminder that if you are doing too much work in any single request handler, you are doing it wrong and should thing about how to break it down into smaller tasks that can be parallelized. Since firing off a task is almost as easy as calling a function, there's really no excuse not to.

The development server does a really nice job with tasks. Instead of firing them all off automatically, you can view each task in an admin console and fire them off one at a time to make sure everything is working properly, and also verify the tasks that you expect to show up do.

Unfortunately, you don't get the same interface once the app is running in production; it would be nice to have a way to see what tasks were pending, or at least flush them out in case a task is repeatedly barfing (as happened to me the first time I uploaded). Another feature I'd like to see was a way to have a handler notified when a queue empties out to facilitate multi-stage parallel processing. But overall I'm impressed with the task queue and look forward to it becoming a core feature (and available on the jvm).

Saturday, July 11, 2009

Understanding the Y Combinator using Java

I've been interested in clojure lately, and therefore functional programming. This lead me to reading about the Y Combinator, which is essentially the theory explaining how recursion can be implemented without naming, and therefore without being able to make the recursive call to the function name itself. It's also a fun puzzle, and I suspect that's why it is so popular among functional programming geeks.

Here is the Y Combinator in clojure:

(defn Y [r]
  ((fn [f] (f f))
   (fn [f]
     (r (fn [x] ((f f) x))))))

This article does a pretty good job explaining how to apply it, but didn't really explain how the Y Combinator works. What I wanted was the Y Combinator in a familiar form where I could step through the code and see what was going on. Since java is the environment that is easiest for me to step through code, I looked around until I came across this article that does a good job explaining the Y Combinator, and provides source code in both lisp and java.

When you perform a direct translation of functional code into java, you get a mess of anonymous inner classes that isn't particularly easy to reason about. So after pasting into into my IDE, I tried to remove all unnecessary anonymous inner classes while still preserving the essence of what the Y Combinator solves: how to implement recursion without making a direct recursive call. I ended up with a lot more code of course, but java's verbosity shines when trying to understand this dense concept.

/**
 * Demonstrates the principles of the Y-Combinator in java.
 */
public class YComboDemo {

    /**
     * Let's start with a simple function implemented 
     * recursively: factorial
     */
    static int factorial(int n) {
        return n == 0 ? 1 : n * factorial(n - 1);
    }

    /**
     * conceptually, factorial implements an interface 
     * that takes an integer and returns an integer
     */
    interface IntFunc {int apply(int n);}

    /**
     * Here it is again in terms of a class implementing 
     * this interface:
     */
    static class SimpleFactorial implements IntFunc {

        public int apply(int n) {
            return n == 0 ? 1 : n * apply(n - 1);
        }
    }

    /**
     * Suppose java didn't support recursive calls.  
     * How can we accomplish the same thing? Let's 
     * start by abstracting away the recursive call by
     * delegating to some other function.
     *
     * This won't work unless we find a way to get 
     * that function to call back into us...
     */
    static class DelegatingFactorial implements IntFunc {
        private final IntFunc f;
        public DelegatingFactorial(IntFunc f) {
            this.f = f;
        }

        public int apply(int n) {
            // we can't call apply directly, so delegate
            return n == 0 ? 1 : n * f.apply(n - 1);
        }
    }

    /**
     * Let's start by defining something that can generate
     * a function in terms of some other.
     */
    interface IntFuncToIntFunc {IntFunc apply(IntFunc f);}

    /**
     * And then implement it for factorial.  This will generate
     * a function that delegates to the passed in argument.
     *
     * But we still need to pass in the right argument to make
     * the "recursion" work...
     */
    static class FactGen implements IntFuncToIntFunc {
        public IntFunc apply(IntFunc f) {
            return new DelegatingFactorial(f);
        }
    }

    /**
     * That's where the Y-combinator comes in.  It finds the
     * fixed point of any {@link IntFuncToIntFunc}.  That is,
     * it finds the function for which calling 
     * {@link IntFuncToIntFunc} one more time will yield the
     * same function.
     *
     * This is all just a fancy way of saying that it contains 
     * the plumbing to make sure the original function that 
     * needs to get calls back into itself will get them.
     *
     * This implements recursion for any 
     * {@link IntFuncToIntFunc} that generates
     * a function that calls back to the passed in function.
     */
    static class YCombinator implements IntFunc {
        private final IntFuncToIntFunc mIntFuncToIntFunc;

        public YCombinator(IntFuncToIntFunc intFuncToIntFunc) {
            mIntFuncToIntFunc = intFuncToIntFunc;
        }

        /**
         * The fixed point works by kicking of the first call to 
         * generate the int function.  We pass in ourselves as 
         * the argument so it will call back to our
         * {@link #apply(int)} as necessary.
         */
        public IntFunc fixedPoint() {
            return mIntFuncToIntFunc.apply(this);
        }

        /**
         * If the function calls back into us, we generate 
         * another function that calls back into us again.  
         * This will keep happening until it knows the 
         * answer itself (its base case).
         */
        public int apply(int n) {
            // setup another function with ourselves as 
            // the callback again
            final IntFunc func = mIntFuncToIntFunc.apply(this);

            // then just apply it
            return func.apply(n);
        }
    }

    public static void main(String args[]) {     
        final IntFuncToIntFunc factGen = new FactGen();

        // if we apply the ycombinator to factgen, we have 
        // a working implementation of factorial
        final IntFunc factorial = new YCombinator(factGen).fixedPoint();

        System.out.println(factorial.apply(3));
  }
}

So the Y Combinator works by serving as a trampoline for a generated function to bounce off back into itself after another function has been generated.

Friday, December 07, 2007

Finally a fast browser for OS X that supports Google apps

I like Macs, but up until recently, I wasn't happy with its available browsers. My first choice was to stick with Firefox, since I am pretty happy with it on Windows and Linux. For some reason though, Firefox has an exclusive Mac OS X feature of sucking; it hangs all the time, and seems quite a bit slower than Safari. So I tried Safari (version 2 at the time), and found it to be alright, but was disappointed that the support for google web apps on it was lacking. Gmail worked at a basic level (no chat integration though), reader sort of worked but had a bunch of subtle UI bugs, and docs just flat out wasn't supported. Camino, which uses the same layout engine as Firefox but has a native OS X UI, seemed like a good choice for a while since it supported Google apps better, but never seemed to fully meet my expectations. It wasn't quite as slick as Safari, and not quite as versatile as Firefox.

I recently upgraded to Safari 3, and was delighted to find full support for gmail, reader and docs and spreadsheets. That isn't the only improvement over Safari 2 either, it has a lot of snappy UI improvements, and seems even faster. Going to Amazon or nytimes.com is like flipping to a TV channel when I'm on a high speed connection and it's the first browser I've used that allows you to pull a tab out into its own window too, something that ends up being very useful when I find myself with 10+ tabs open.

Thursday, December 06, 2007

Url Encoded Charts

Creating by encoding them in a url is just too easy.

I'm guessing we'll start to see a lot more sites like this :)

Saturday, October 27, 2007

Playing WFUV on a Squeezebox

WFUV is a fantastic station my friend recently pointed me to (I was instantly sold when I heard they had a whole day devoted to Neil Young in honor of his upcoming album). I streamed their high quality stream from work via iTunes with no problem.

Then, I was disappointed when the friggin' stream wouldn't play on my squeezebox at home that evening. I found a crappy quality stream on shoutcast that worked, but at 56 kbps, it was hardly worth listening to.

Today I figured it out with curl:

$ curl http://www.wfuv.org/audio/fuv247_128k.pls
[playlist]
NumberOfEntries=1
File1=http://wfuv.ic.llnwd.net/stream/wfuv_allmusic2
Title1=FUV 24/7 All Music Channel, 128k

It turns out the .pls link returns extra information that the squeezebox can't parse, but the direct link 'http://wfuv.ic.llnwd.net/stream/wfuv_allmusic2' works great.