Tuesday, December 25, 2007

Taxicab Numbers

Futurama is one of my favorite TV shows. It's the quintessential show about nerds, produced by nerds, for nerds, and it's chock full of inside jokes in math, science, sci-fi and pop culture.

I'm happy to note that although Futurama, the series, was cancelled after the fourth season, the show is back with a set of direct-to-DVD features. The first one, Bender's Big Score, is available now, and includes lecture by Sarah Greenwald as a special feature on the various math jokes that have appeared in the show. Some are silly, like π-in-one oil, or Klein's Beer (in a Klein bottle, of course). Others are subtle, like the repeated appearance of taxicab numbers.

Taxicab numbers are an interesting story in and of themselves. The story comes to us from G. H. Hardy, who was visiting Srinivasa Ramanujan:

I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. "No", he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two [positive] cubes in two different ways."

Many of the staff involved in creating Futurama received advanced degrees in math, computer science and physics, so they enjoy making these obscure references, because they know their fans will find them. It shouldn't be surprising that the number 1729 makes many appearances in Futurama, or that the value 87539319 appears as an actuall taxicab number (87539319 is the sum of 2 cubes in 3 different ways).

Of course, the idea of taxicab numbers is interesting, the details less so. After watching the new feature, I wondered what Hardy's taxicab number was, and what two pairs of cubes add up to that value. If I were more mathematically inclined, I'd probably sit down with pencil and paper and just do the math until I found something. If I were truly lazy, I'd just Google for the answer. Instead, I sat down, fired up vim and typed this code up, pretty much as you see it here:

cube x = x * x * x

taxicab n = [(cube a + cube b, (a, b), (c, d))
| a <- [1..n],
b <- [(a+1)..n],
c <- [(a+1)..n],
d <- [(c+1)..n],
(cube a + cube b) == (cube c + cube d)]

I want to find the smallest taxicab number, but searching an unbounded range will not lead to a result. Therefore, it's necessary to search a bounded range. If there are any taxicab numbers among the integers [1..n], then this function will find them. Moreover, if there are many solutions to this problem in that range, this function will find all of them.

I could go into details, but my explanation would not be as clear and precise as the code above.

Here are some results:

*Main> taxicab 10
*Main> taxicab 12
*Main> taxicab 20

Ordinarily, I wouldn't blog about something so trivial, but as I stared at the code and the output from ghci, it reminded me of an incident in college. The Chudnovsky brothers came up with a new algorithm to compute π that was both very precise and very quick. A professor of mine wrote a quick, throwaway implementation of this algorithm in Maple to demonstrate it. Most of my work in those days was in C. Maple wasn't a tool I'd normally use, but it was easily the best tool for the job.

The lesson my professor was trying to teach me wasn't to use computer algebra systems to indulge a fixation with transcendental numbers. Instead, he was trying to show me that a true computer scientist is presented with a problem, writes a program to solve it, and moves onto the next problem.

I could have written my taxicab number search in C or Java, but I would have lost interest somewhere between #include <stdio.h> or public static void main(String args[]) and firing up the compiler. I could have written the code in Perl, but I would have likely lost interest around the time I saw a nested for loop, and contemplated the four levels of nesting.

Perl programmers know Larry Wall's vision for Perl - to make the easy things easy and the hard things possible. As I stared at this code, I realized that Larry's dictum misses an important cornerstone: to make the trivial things trivial.

Haskell may get a bad rap as making hard things easy and easy things hard, but it does make trivial things trivial. Whether it's list comprehensions solve a problem in a single statement, or parser combinators that make parsing code read like the grammar it implements, there's a lot to be said for a language that gets out of your way and lets you say precisely what you mean.

Wednesday, December 19, 2007

More thoughts on rewriting software

I started writing about when it is acceptable to consider rewriting a software project a few months back (part one and two). I remain convinced that it is occasionally economically sound to toss a problematic codebase and start anew. There are dozens of pesky little details to consider, like:

  • Is the code horrifically overcomplicated, or just aesthetically unpleasing?

  • How soon before the new system can replace the old?

  • How soon before there's a net benefit to the rewrite?

  • Are you rewriting to address fundamental flaws, or adopt this week's fashionable language/library/architecture/buzzword?

  • Do you understand the problem the code attempts to solve? Really? Really?

  • Since the project began, is there a new piece of off-the-shelf piece of software that makes a lot of this pain simply go away?

Steve Yegge looks at the problem from a different angle in his essay Code's Worst Enemy. (See Reginald Braithwaite's synopsis if you want to read just the choice quotes.) Steve argues that code size is easily the riskiest element in a project:

My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do everything in their power to keep their code base from becoming a mountain. Tools or no tools. That's what I believe.

Steve's essay focuses on behaviors endemic among many Java programmers that lead to large codebases that only get larger. As codebases grow, they become difficult to manage, both in terms of the tools we use to build software from the code (IDEs and the like), as well as the cognitive load needed to keep track of what 20,000 pages of code do on a 1 million line project.

The issues aren't specific to Java, or even Java programmers. Rather, they stem from an idea that the size of a code base has no impact on the overall health of a project.

The solution? Be concise. Do what it takes to keep a project small and manageable.

Of course, there are many ways to achieve this goal. One way is to rewrite large, overgrown projects and rebuild them from scratch. This could be done in the same language (building upon the skills of the developers currently on the project), or porting the project to a new language. Another strategy involves splitting a large project into two or more isolated projects with clearly defined interfaces and responsibilities. Many more alternatives exist that don't involve ignoring the problem and letting a 1 million line project fester into a 5 million line project.

Interestingly, Jeff Atwood touched upon the issue as well, in his essay Nobody Cares What Your Code Looks Like. Jeff re-examines the great Bugzilla rewrite, when the team at Netscape decided in 1998 to convert the code from Tcl to Perl. The goal was ostensibly to encourage more contributions, since few people would be interested in learning Tcl to make contributions to Bugzilla. Nine years later, and the Bugzilla team considers Perl to be its biggest liability, because Perl is stagnating, and Perl culture values the ability of every Perl programmer to speak a slightly different dialect of Perl. Newer projects written in PHP, Python, Java and Ruby outcompete Bugzilla because (in a couple cases) they are comparable to Bugzilla today (without taking nine years of development to reach that plateau), and can deliver new features faster than the Bugzilla team can.

Nevertheless, Jeff stresses that although it may be ugly, harder to customize and extend, Bugzilla works. And it has worked for nearly a decade. And numerous large projects use it, and have no intentions to switch to the new bug tracker of the week anytime soon. So what if the code is ugly?

There's a lesson here, and I don't think it's Jeff's idea that nobody cares what your code looks like. Jeff is right that delivering value to customers is always most important, whether they are paying customers that keep your company in business, or users who rely on your open source project to supply a critical piece of infrastructure. He's also right that customers couldn't care less if your code uses Factories, Decorators and Iterators, or Closures, Catamorphisms and Currying. It is a disservice to your customers if you force them to wait while you rewrite your software from a static language (like Java) to a dynamic language (like Ruby), because dynamic languages are all the rage this year. Are you going to go through the same song and dance when type inferencing becomes popular? Massive concurrency?

While customers don't care about how a product is written, they do care about the effects of that choice. If your software crashes a lot, needs frequent security patches and occasionally corrupts data, well, maybe you shouldn't have written your timesheet application in C. If your rendering application is slow, uses ludicrous amounts of memory, then maybe you shouldn't have written it in Ruby. If your application provides a key piece of infrastructure, and there's literally no one who could replace you if you get hit by a bus, then maybe you shouldn't have written it in APL.

And if your application is written in a manner that leads to a huge codebase that makes it hard to find and fix bugs, costly to extend and requires a floor full of programmers to maintain, maybe you owe it to your customer to find a way to reduce the size of your codebase and deliver more value. Because sooner or later, that customer just might find a similar application, built with a smaller codebase and less internal complexity that's cheaper own and faster to customize and fix. And when it becomes too costly or too painful to maintain your application, they will switch.

Wednesday, October 3, 2007

Defanging the Multi-Core Werewolf

Many developers are have a nagging sense of fear about the upcoming “multi-core apocalypse”. Most of the software we write and use is written in imperative languages, which are fundamentally serial in nature. Parallelizing a serialized program is painful, and usually involves semaphores, locks, and comes with problems like deadlock, livelock, starvation and irreproducible bugs.

If we’re doomed to live in a future where nearly every machine uses a multi-core architecture, then either (a) we have a lot of work ahead of us to convert our serialized programs into parallelized programs, or (b) we’re going to be wasting a lot of CPU power when our programs only use a single processor element on a 4-, 8-, or even 64-core CPU.

At least that’s the received wisdom. I don’t buy it.

Yes, software that knows how to exploit parallelism will be different. I don’t know that it will be much harder to write, given decent tools. And I certainly don’t expect it to be the norm.

Here is some evidence that the multi-core monster is more of a dust bunny than a werewolf.

First, there’s an offhand remark that Rob Pike made during his Google Tech Talk on Newsqueak, a programming language he created 20 years ago at Bell Labs to explore concurrent programming. It’s based on Tony Hoare’s CSP, the same model used in Erlang. During the talk, Rob mentioned that the fundamental concurrency primitives are semaphores and locks, which are necessary when adding concurrency to an operating system, but horrible to deal with in application code. A concurrent application really needs a better set of primitives that hide these low level details. Newsqueak and Erlang improve upon these troublesome primitives by offering channels and mailboxes, which make most of the pain of concurrent programming go away.

Then, there’s Timothy Mattson of Intel, who says that there are just too many languages, libraries and environments available today for writing parallelized software. Timothy is a researcher in the field of parallel computing, and when someone with such a deep background in the field says the tools are too complicated, I’ll take his word for it. The good news is that very few programmers work on the kinds of embarrassingly parallel problems that require these tools. Working on parallel machines isn’t going to change that for us, either. In the future, shell scripts will continue to execute one statement at a time, on a single CPU, regardless of how many CPUs are available, with or without libraries like Pthreads, PVM or MPI. Parallel programmers are still in a world of hurt, but at least most of us will continue to be spared that pain.

Then there’s Kevin Farnham, who posted an idea of wrapping existing computationally intensive libraries with Intel’s Thread Building Blocks, and loading those wrapped libraries into Parrot. If all goes well and the stars are properly aligned, this would allow computationally intensive libraries to be used from languages like Perl/Python/Ruby/etc. without the need to port M libraries to N languages. (Tim O’Reilly thought it was an important enough meme that he drew attention to it on the O’Reilly Radar.)

This sounds like a hard problem, but adding Parrot to the equation feels like replacing one bad problem with five worse problems. If we’re going to live in a world where CPUs are cheap and parallelism is the norm, then we need to think in those terms. If we need Perl/Python/Ruby/etc. programs to interact with parallelized libraries written in C/C++/Fortran, where’s the harm in spawning another process? Let the two halves of the program communicate over some IPC mechanism (sockets, or perhaps HTTP + REST). That model is well known, well tested, well-understood, widely deployed and has been shipping for decades. Plus, it is at least as language-agnostic as Parrot hopes to become. (+2 points if the solution uses JSON instead of XML.)

Fourth, there’s Patrick Logan, who rightly points out the issue simply isn’t about a multi-core future, but also a multi-node future. Some applications will run in parallel on a single machine, others will run across multiple nodes on a network, and still others will be a hybrid of both approaches. Running programs across a network of nodes is done today, with tools like MapReduce, Hadoop and their kin.

If you have a grid of dual-core machines today, and need to plan out how to best use the network of 64-core machines you will have a decade from now, here’s a very simple migration plan for you: run 32x as many processes on each node!

With that said, here is my recipe for taming the multi-core dust bunny:

  • Determine what kind of parallelism makes sense for you: none, flyweight, fine-grained or coarse grained.
  • Avoid troublesome low-level concurrency primitives wherever possible.
  • Use tools like GHC’s Nested Data Parallelism for flyweight concurrency (one program, lots of data, spread out over multiple CPUs on a single machine).
  • Use tools like GHC’s Software Transactional Memory for lightweight concurrency (many interoperating processes managing shared data on a single machine).
  • Use tools like MapReduce and friends for heavyweight concurrency (work spread out across multiple cooperating processes, running on one or many machines).

As Timothy Mattson points out, parallel programming is fundamentally hard, and no one language, tool or environment is going to slay the dragon. I cite NDP here not as a perfect solution, but as a placeholder for a whole class of tools that exhibit of SIMD parallelism. Similarly, STM is a placeholder for a whole class of tools that exhibit MIMD parallelism. Sometimes you need one, sometimes you need the other, sometimes you need both, and sometimes you need neither.

And then there is the issue of virtualization. Perhaps the best use of a multi-core system isn't to use it as a single multiprocessing computer, but as a host for a series of virtual machines. Such a usage sidesteps all of the thorny issues around parallelism entirely, focusing instead on cost savings that accrue from server consolidation and simplified management. This is a very old idea that becomes more and more important as power efficiency in our data centers becomes a hot button issue.

Finally, there’s a looming question about what to do about the desktop. If your laptop has 32 cores, what do you do with them? The simple answer is nothing. As CPUs get faster, they spend more and more of their time in an idle state. The only thing that changes in a multi-core world is that more CPUs are idle. Desktop programmers can spend a lot of time evenly distributing that idleness across all CPUs, or make very few changes, and use only as many CPUs as necessary. Operating systems and basic tools (emulators, compilers, VMs, database engines, web servers, etc.) will need to be multi-core aware and distribute their work across as many CPUs as are available. Some of that work is already done -- make -j has been around for years. Processing intensive applications, like audio/video codecs, image manipulation and the like, will also need to be multi-core aware. The vast majority of the programs we write will continue to be mostly serial, and rarely care about parallelism.

After all, authenticating a user doesn’t get 32x faster on a 32-core machine.

Sunday, September 30, 2007

Rewriting Software, Part 2

When I wrote my previous post about rewriting software, I had a couple of simple goals in mind. First was answering Ovid’s question on when rewriting software is wise, and when it is foolish. Second was to re-examine Joel Spolsky’s dictum, never rewrite software from scratch, and show how software is so complex that no one answer fits all circumstances. Finally, I wanted to highlight that there are many contexts to examine, ranging from the “small here and short now” to the “big here and long now” (to use Brian Eno’s terminology).

When I wrote that post, I though there would be enough material for two or three followup posts that would meander around the theme of “yes, it’s OK to rewrite software”, and move on. The more I wrote, the more I found to write about, and the longer it took to condense that material into something worth reading.

Rather than post long rambling screeds on the benefits of rewriting software, I decided to take some time to plan out a small series of articles, each limited to a few points. Unfortunately, I got distracted and I haven’t posted any material to this blog in over a month. My apologies to you, dear reader.

Of course, there’s something poetic about writing a blog post about rewriting software, and finishing about a month late because I couldn’t stop rewriting my post. There’s a lesson to learn here, also from Joel Spolsky. His essay Fire and Motion is certainly worth reading in times like these. I try to re-read it, or at least recall his lessons, whenever I get stuck in a quagmire and can’t see my way out.

In that spirit, here’s a small nugget to ponder.

If you are a writer, or have ever taken a writing class, you’ve probably come across John R. Trimble’s assertion that “all writing is rewriting.” Isn’t it funny that software is something developers write yet fear rewriting?

There’s a deep seated prejudice in this industry against taking a working piece of software and tinkering with it, except when it involves fixing a bug or adding a feature. It doesn’t matter if we’re talking about some small-scale refactoring, rewriting an entire project from scratch, or something in between. The prejudice probably comes from engineering — there’s no good reason to take a working watch or an engine apart because it looks “ugly” and you want to make it more “elegant.”

Software sits at the intersection of writing and engineering. Unlike pure writing, there are times when rewriting software is simply a bad idea. Unlike pure engineering, there are times when it is necessary and worthwhile to rewrite working code.

As Abelson and Sussman point out, “programs must be written for people to read, and only incidentally for machines to execute.” Rewriting software is necessary to keep code concise and easy to understand. Rewriting software to follow the herd or track the latest trend is pointless and a wasted effort.

Tuesday, August 21, 2007

Rewriting Software

Ovid is wondering about rewrite projects. It’s a frequent topic in software, and there’s no one answer that fits all situations.

One of the clearest opinions is from Joel Spolsky, who says rewrites are “the single worst strategic mistake that any software company can make”. His essay is seven years old, and in it, he takes Netscape to task for open sourcing Mozilla, and immediately throwing all the code away and rewriting it from scratch. Joel was right, and for a few years Mozilla was a festering wasteland of nothingness, wrapped up in abstractions, with an unhealthy dose of gratuitous complexity sprinkled on top. But this is open source, and open source projects have a habit of over-estimating the short term and under-estimating the long term. Adam Wiggins revisited the big Mozilla rewrite issue earlier this year when he said:
[W]hat Joel called a huge mistake turned into Firefox, which is the best thing that ever happened to the web, maybe even the best thing that’s ever happened to software in general. Some “mistake.”
What’s missing from the discussion is an idea from Brian Eno about the differences between the “small here” vs. the “big here”, and the “short now” vs. the “long now”. Capsule summary: we can either live in a “small here” (a great apartment in a crappy part of town) or a “big here” (a beautiful city in a great location with perfect weather and amazing vistas), and we can live in a “short now” (my deadline is my life) or a “long now” (how does this project change the company, the industry or the planet?).

On the one hand, Joel’s logic is irrefutable. If you’re dealing with a small here and a short now, then there is no time to rewrite software. There are revenue goals to meet, and time spent redoing work is retrograde, and in nearly every case poses a risk to the bottom line because it doesn’t deliver end user value in a timely fashion.

On the other hand, Joel’s logic has got more holes in it than a fishing net. If you’re dealing with a big here and a long now, whatever work you do right now is completely inconsequential compared to where the project will be five years from today or five million users from now. Requirements change, platforms go away, and yesterday’s baggage has negative value — it leads to hard-to-diagnose bugs in obscure edge cases everyone has forgotten about. The best way to deal with this code is to rewrite, refactor or remove it.

Joel Spolsky is arguing that the Great Mozilla rewrite was a horrible decision in the short term, while Adam Wiggins is arguing that the same project was a wild success in the long term. Note that these positions do not contradict each other. Clearly, there is no one rule that fits all situations.

The key to estimating whether a rewrite project is likely to succeed is to first understand when it needs to succeed. If it will be evaluated in the short term (because the team lives in a small here and a short now), then a rewrite project is quite likely to fail horribly. On the other hand, if the rewrite will be evaluated in the long term (because the team lives in a big here and a long now), then a large rewrite project just might succeed and be a healthy move for the project.

Finally, there’s the “right here” and “right now” kind of project. Ovid talks about them briefly:
If something is a tiny project, refactoring is often trivial and if you want to do a rewrite, so be it.
In my experience, there are plenty of small projects discussed in meetings where the number of man hours discussing a change or rewrite far exceeds the amount of time to perform the work, often by a factor of ten or more. Here, the answer is clear — just do the work, keep a back up for when you screw up, and forget the dogma about rewriting code. If it was a mistake, rolling back the change will also take less time than the post mortem discussion.

Ovid raises another interesting point: large projects start out from smaller ones, so if it’s OK to rewrite small projects, and small projects slowly turn into large projects, when does it become unwise to rewrite a project?

The answer here isn’t to extrapolate based on project size, but rather on the horizon. A quick little hack that slowly morphs into something like MS Word will eventually become rewrite-averse due to short term pressures. A quick little hack that slowly morphs into something like Firefox will remain somewhat malleable, so long as it can take a long time to succeed.

Monday, August 20, 2007

Universal Floating Point Errors

Steve Holden writes about Euler’s Identity, and how Python can’t quite calculate it correctly. Specifically,
e i π + 1 = 0
However, in Python, this isn’t quite true:
>>> import math
>>> math.e**(math.pi*1j) + 1
If you note, the imaginary component is quite small: -1 x 10-16.

Python is Steve’s tool of choice, so it’s possible to misread his post and believe that python got the answer wrong. However, the error is fundamental. Witness:
$ ghci
Prelude> :m + Data.Complex
Prelude Data.Complex> let e = exp 1 :+ 0
Prelude Data.Complex> let ipi = 0 :+ pi
Prelude Data.Complex> e
2.718281828459045 :+ 0.0
Prelude Data.Complex> ipi
0.0 :+ 3.141592653589793
Prelude Data.Complex> e ** ipi + 1
0.0 :+ 1.2246063538223773e-16
As I said, it would be possible to misread Steve’s post as a complaint against Python. It is not. As he says:
I believe the results would be just as disappointing in any other language
And indeed they are, thanks to irrational numbers like π and the limitations of IEEE doubles.

Updated: corrected uses of -iπ with the proper exponent, .

Wednesday, August 15, 2007

Does Syntax Matter?

An anonymous commenter on yesterday’s post posits that Haskell won’t become mainstream because of the familiar pair of leg irons:
I think one of the biggest problems in Haskell, aside from it not being very easy (whats a monad?), is syntax.
There are many reasons why Haskell may not become mainstream, but syntax and monads aren’t two of them. I’m a populist, so I get offended when a language designer builds something that’s explicitly designed to drag masses of dumb, lumbering programmers about half way to Lisp, Smalltalk, or some other great language. I want to use a language built by great language designers that they themselves not only want to use, but want to invite others to use.

I could be wrong here. Maybe being a ‘mainstream programming language’ is means designing something down to the level of the great unwashed. I hope not. I really hope not. But it could be so. And if it is, that’s probably the one and only reason why Haskell won’t be the next big boost in programming language productivity. That would also disqualify O’Caml, Erlang and perhaps Scala as well. Time will tell.

But syntax? Sorry, not a huge issue.

Sure, C and its descendants have a stranglehold on what a programming language should look like to most programmers, but that’s the least important feature a language provides. Functional programmers, especially Lisp hackers have been saying this for decades. Decades.

A language’s syntax is a halfway point between simplifying the job of the compiler writer and simplifying the job of the programmer. No one is going back and borrowing syntax from COBOL, because it’s just too damn verbose and painful to type. C is a crisp, minimal, elegant set of constructs for ordering statements and expressions, compared to its predecessors.

Twenty years ago, the clean syntax like C provided made programming in all caps in Pascal, Fortran, Basic or COBOL seem quaint. Twenty years from now, programming with curly braces and semicolons could be just as quaint. Curly braces and semicolons aren't necessary, they're just a crutch for the compiler writer.

To prove that syntax doesn’t matter, I offer 3 similar looking languages: C, Java (or C#, if you prefer) and JavaScript. They all use a syntax derives from C, but they are completely separate languages. C is a straight forward procedural language, Java is a full blown object oriented language (with some annoying edge cases), and JavaScript is a dynamic, prototype-based object oriented language. Just because a for loop looks the same in these three languages means absolutely nothing.

Knowing C doesn’t help you navigate the public static final nonsense in Java, nor does it help you understand annotations, inner classes, interfaces, polymorphism, or design patterns. Going backward from Java to C doesn’t help you write const-correct code, or understand memory allocation patterns.

Knowing C or Java doesn’t help much when trying to use JavaScript to its full potential. Neither language has anything resembling JavaScript’s dynamic, monkeypatch everything at runtime behavior. And even if you have a deep background in class-based object oriented languages, JavaScript’s use of prototypes will strike you as something between downright lovely and outright weird.

If that doesn’t convince you, consider the fact that any programmer worthy of the title already uses multiple languages with multiple syntaxes. These typically include their language of choice, some SQL, various XML vocabularies, a few config file syntaxes, a couple of template syntaxes, some level of perl-compatible regular expressions, a shell or two, and perhaps a version or two of make or a similar utility (like Ant or Maven).

Add that up, and a programmer can easily come across two dozen different syntaxes in a single project. If they can’t count that high, it’s not because they do all their work in a single syntax[1], but because it takes too much effort to stop and count all of the inconsequential little syntaxes. (Do Apache pseudo-XML config files count as a separate syntax? Yeah, I guess they do. It took that Elbonian consultant a day to track down a misconfigured directive last year…)

So, no, Mr. Anonymous. Haskell’s syntax isn’t a stumbling block. You can learn the basics in an afternoon, get comfortable within a week, and learn the corner cases in a month or two.

Now, as for monads - the problem with monads is that they seem harder to understand than they really are. That is, it is more difficult to explain what a monad is than it is to gain a visceral understanding of what they do. (I had this same problem when I was learning C — it was hard to believe that it was really that simple.)

If you caught my introduction to Haskell on ONLamp (parts 1, 2 and 3), you may have seen this tidbit right before the end of part 3:
[M]onads enforce an order of execution on their statements. With pure functions, sub-expressions may be evaluated in any order without changing their meaning. With monadic functions, the order of execution is very important.
That is, monads allow easy function composition that also ensures linear execution, much like you would expect from writing a series of statements within a function in C, a method in Java, or a block of Javascript. There are other interesting properties of monads, but this is the most fundamental.

[1]: Lisp and Smalltalk programmers might honestly count one single syntax for all their work. :-)

More SPJ

If, like me, you couldn’t make it to OSCon this year and missed Simon Peyton Jones’ presentations, then you may want to catch up with these videos:Enjoy!

Haskell: more than just hype?

Sometimes I wonder if the Haskell echo chamber is getting louder, or if programmers really are fed up with the status quo and slowly drifting towards functional programming. My hunch is that this is more than a mere echo chamber, and interest in Haskell and functional programming is for real.

I’m hardly an objective observer here, since I’m solidly within the echo chamber (note the name of this blog). Functional programming has been on my radar for at least a decade, when I first started playing with DSSSL. Coding with closures now feels more natural than building class hierarchies and reusing design patterns, regardless of the language I am currently using.

If you still aren’t convinced that Haskell is more than a shiny bauble lovingly praised by a lunatic fringe, here are some recent data points to consider, all involving Simon Peyton Jones:
  • Bryan O’Sullivan pointed out last week that the Simon’s talks are the most popular videos from OSCon:
    “Simon’s Haskell language talks are the most popular of the OSCON videos, and have been viewed over 50% more times than the next ten most popular videos combined.”

  • In Simon’s keynote presentation at OSCon last month, he points out that threading as a concurrency model is decades old, easy enough for undergraduates to master in the small, but damn near impossible to scale up to tens, hundreds or thousands of nodes. The research on threads has stalled for decades, and it’s time to find a better concurrency model that can help us target multiprocessor and distributed systems. (Nested data parallelism and software transactional memory both help, and are both available for experimentation in Haskell today.)

  • An interview with Simon and Erik Meijer that introduces the topic of nirvana in programming.

    Start by separating programming languages along two axes: pure/impure and useful/useless. The top left has impure and useful languages like C, Java and C#. The bottom right has pure and useless languages like Haskell (at least before monadic I/O). The sweet spot is the top right, where a pure, useful language would be found, if it existed.

    However, the C# team is borrowing heavily from Haskell to produce LINQ, and the Haskell research community is folding the idea back into Haskell in the form of comprehensive comprehensions. Both languages are slowly converging on nirvana: C# is getting more pure, and Haskell is getting more useful.

The subtext in all of these tidbits is that computing is not a static discipline. We know that hardware improves at a predictable pace, in the form of faster CPUs, cheaper memory, denser storage and wider networks.

Software, or at least the way we construct software, also improves over time. One unscientific way to express this is through relative differences in productivity between programming languages. It’s reasonable to expect a project to take less time to create when using Java instead of C, and even less time when using Ruby or Python instead of Java or C#.

By extension, there should be a new language that is even more productive than Python or Ruby. I happen to think that language will be Haskell, or at least very similar to Haskell. But it might also be OCaml, Erlang or Scala. In any case, Simon Peyton Jones is right — the way forward involves functional programming, whether it means choosing a language like Haskell, or integrating ideas from Haskell into your language of choice.

Tuesday, July 24, 2007

Intro to Haskell, Parts 1, 2 and 3

It's been a while since I've posted here (too long, in fact!).

Readers of this blog would probably be interested in a set of three articles I'm writing for the O'Reilly Network. The first two are available now:My goals here are to reach the core audience for ONLamp: Perl, Python, PHP and Ruby programmers. Haskell is a very big step away from dynamic languages, and it's also a big change from statically typed languages in the ALGOL family (C, and its descendants C++, Java and C#). The first article makes the case that, although Haskell is strange and different, it's worth the effort to learn -- even if you never manage to use it on a job or a side project. The second article shows how it is possible to write programs using pure functions in the absence of all the familiar tools -- classes, objects, globals and mutable variables.

I'm working on part 3, which will be an introduction to monads. This is something of a rite of passage, because writing monad intros is something of a cottage industry in the Haskell world -- either you've written one, or you're still a dabbler, beginner, or casually interested in Haskell (and not committed to delving into it).

Articles on ONLamp should be article length, oddly enough. That means roughly 2000 words or so. Given that tutorials like All About Monads run dozens of pages, condensing a full tutorial into a single article is hopeless.

Rather than attempt to outdo what many other fine tutorials are going very well already, I'm taking a different approach. The one thing I want to focus on is the mechanics of monads. That is, how they work. Maybe that will be the missing link that the next wave of programmers need to see before they can study Haskell.

Part 3 should be published in a couple of weeks.

Wednesday, June 6, 2007

Solving Collatz Sequences

Steve, over at cod3po37ry, is learning Haskell and working through some contest problems, starting with the Collatz sequence. Steve doesn't mention it explicitly, but this is an interesting unsolved problem in mathematics. The problem starts out with a simple function over a single positive integer n:
  • if n is even, halve it
  • if n is odd, triple it and add one
A sequence can be generated by starting with some positive integer n, generating the result of this function, and continually applying this function on the series of results until it produces a value of 1:
f 2 = [2,1]
f 3 = [3,10,5,16,8,4,2,1]
f 4 = [4,2,1]
f 5 = [5,16,8,4,2,1]
f 6 = [6,3,10,5,16,8,4,2,1]
f 7 = [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
It's an open question as to whether or not this function always converges on a value of 1 for any positive integer n. Wikipedia claims that this property has been proven up through at least 10 × 258, but it's unclear if it's true for all positive integers.

Steve's solution is a little, um, cumbersome, and he makes it clear that his goal is to learn:
This post follows the Law of Usenet: if you post a question, nobody will answer it. If you post an answer, they'll all rush to correct it. I welcome corrections.
In that spirit, here is my response. :-)

Unlike most languages I have used, Haskell has a very interesting property -- if you find yourself writing a lot of code, chances are you're doing something wrong. Generally, this is because the Prelude and the standard library have a rich set of tools for building solutions from the bottom up. Writing a lot of code usually means that you're ignoring a key abstraction somewhere. Haskell programs tend to be short because they can be short; many common problems have generalized solutions in the library already. If you find a general problem that's not solved in the standard library, implementing a general solution typically involves writing one missing function.

The first step to solving this problem involves the collatz function which implements the interesting 3n + 1 property:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2
Next, there's the function to produce a "collatz sequence" starting with a number n and (hopefully) terminating with the number 1. Fortunately, this behavior is precisely what the iterate function in the Prelude provides:
iterate :: (a -> a) -> a -> [a]
That is, iterate takes a function and a seed value, and returns a list that contains the seed, and an infinite sequence of values produced by applying the function to the previous value.

Therefore, expanding this simple collatz function to a function that produces a collatz sequence is simply:
collatzSequence :: Int -> [Int]
collatzSequence = iterate collatz
Actually, that's not quite correct, since a collatz sequence terminates with a value of 1:
collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs
Sure enough, this function works as expected:
*Main> collatzSequence 7
That solves the problem of generating collatz sequences. But the contest problem was to find the longest collatz sequence within a range of numbers. Given:
1 10
100 200
201 210
900 1000
produce this output:
1 10 20
100 200 125
201 210 89
900 1000 174
Generating these results is simple enough. First, convert a collatz sequence to its length:
*Main> length $ collatzSequence 7
Next, convert a range of integers into their collatz lengths:

*Main> map (length . collatzSequence) [1..10]
Next, pick out the largest sequence in the range:
*Main> maximum $ map (length . collatzSequence) [1..10]
Next, consume a line of input, perform this calculation, and produce a line of output:
run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])
And finally, write the main function that consumes all input and produces the desired output:
main = do inp <- getContents
mapM_ run (lines inp)
Here is the completed program in its entirety:
collatz :: Int -> Int
collatz 1 = 1
collatz n = if (odd n)
then (3 * n + 1)
else n `div` 2

collatzSequence :: Int -> [Int]
collatzSequence = terminate . iterate collatz
terminate (1:_) = [1]
terminate (x:xs) = x:terminate xs

run :: String -> IO ()
run s = do let (i:j:_) = map read (words s)
m = maximum $ map (length . collatzSequence) [i..j]
putStrLn (concat [show i, " ", show j, " ", show m])

main = do inp <- getContents
mapM_ run (lines inp)
Hope this helps!

Thursday, May 24, 2007

Lessons from the White-bearded Professor

The first part of my Introduction to Haskell series came out on ONLamp.com today. As always, there was a lot of material I wanted to mention that didn't fit. This first part is an exploration of why haskell deserves more widespread attention. (The next two parts cover functions and monads.)

One topic I wanted to cover dates back to when I was an undergrad. One of my professors, Jim Maginnis, was something of a village elder in computing. He wasn't one of the pioneers in computing that won a Turing Award, or wrote prolifically about his favorite pet trends in computing, or a discoverer of anything fundamental.

This white-bearded, gentle professor was happy teaching decades' worth of students the skills they needed to go out in the world and solve real problems. He felt great joy in both the topics and the students he taught, and that feeling was heartfelt and infectious.

One of the stories Prof. Maginnis told dates back to when he consulted with big businesses as they started computerizing their operations in the 50s, 60s and 70s. He began by recommending every project start by hiring a mathematician for a week or two to study the problem. (Heck, hire two grad students -- they're cheaper!) His point was that if a mathematician could find an interesting property or algorithm, then it would be money well spent, and drastically reduce the time/money/effort needed to develop a system. If they didn't find anything, well, it was only a week or two, and mathematicians are cheap, anyway.

That always struck me as sound advice. Certainly more practical in the early days of computing when everything was new. Today, most projects feel a lot more mundane and predictable, and maybe it isn't as necessary.

But there's always room for good research and deep thought. That's the kind of thinking that gave us relational database engines, regular expressions, automatic garbage collection and compiler generators.

I keep thinking about this anecdote when I describe Haskell to someone for the first time. Instead of taking two grad students for two weeks, get a few dozen of PhDs around the world focused on a problem for a couple of decades. Haskell is one of the things you might end up with. So are Unix, Erlang and Plan9, for that matter.

I wonder what Prof. Maginnis would think of the world of computing if he were alive today. I can't say for sure, but more than a few brilliant computer scientists have been working for more than a few weeks on solving some really hard problems. And I think he would be very happy with the results.

Wednesday, May 23, 2007

Haskell: Ready for Prime Time

Bryan O’Sullivan, Don Stewart and John Goerzen announced today that they are working together on a book for O'Reilly that covers Haskell programming. Their mission is to cover the topics that are left uncovered in Haskell texts that focus on being undergraduate textbooks. The draft outline reads like a set of diffs that a Java/Perl/Python/Ruby programmer needs to understand to solve the problems they already know in a new language. And that is a very good thing indeed.

Last week, Eric also announced that he is working on a Haskell book for the Pragmatic Programmers.

It sounds like at least two publishers think there is pent-up demand for a decent, practical Haskell book that could sell more than "30 units per month". And that is certainly a good thing.

Good luck, everyone. I can't wait to read your books. :-)

Thursday, May 17, 2007

Analyzing Book Sales

O'Reilly does the tech community a great service by publishing their quarterly analyses of tech book sales. Yesterday, Mike Hendrickson posted part 4 of the Q1 07 analysis, which details programming languages.

The book sales statistics aren't meaningful in and of themselves, because they simultaneously overstate and understate what's actually happening within the tech world. The general opinion is that Java is important, Ruby is hot, and Fortran is deader than dead, and the book sales statistics seem to back this up. Other data, like counting job postings on the web, can lead to similar conclusions. However, this isn't the entire story -- Fortran is still an incredibly useful language in certain circles (physics and climate simulations, for example), even if the people who rely on it don't buy books or hire frequently.

Although book sales don't paint the whole story, they do provide show some interesting trends and point to questions that deserve further investigation.

For example, Java and C# books each sell at a combined rate of over 50,000 units per quarter, which reinforces (but does not prove) the view that most of the 'developer mass' is focused on these languages. JavaScript, Ruby, and PHP are among the languages that are now shipping over 25,000 units per language per quarter.

In Mike's 'second tier' grouping are languages like Perl and Python, which are now shipping roughly 10,000 units per quarter. That probably deserves some additional analysis; Perl is somewhat stagnant, due to Perl 5 appearing to be in maintenance mode as Perl hackers wait (and wait, and wait) for Perl 6. Also, many recent Perl titles have been hyper-focused on specific modules that aren't widely used, or are only marginally better than the online documentation. So perhaps this indicates that Perl programmers don't buy many Perl books, or perhaps it means that Perl programmers are moving to Java/C#/Ruby to pay the mortgage. Either way, this data is inconclusive.

In the penultimate tier are languages that sell less than 1,000 units per quarter, and include Tcl, Lisp, Scheme, Lua and Haskell. Interestingly, 345 Haskell books sold in Q1 07, compared to 47 in Q1 06, an over 600% increase year-on-year. However, Mike also points out: "[t]he four Haskell titles are averaging fewer than 30 copies per month".[1]

Again, this data is interesting, but inconclusive. Ruby titles experienced a meteoric rise a couple of years ago, when the Pragmatic Programmers released two bestsellers around Ruby. Before that, Ruby titles sold poorly; recently, they are selling over 25,000 units per quarter, and the trend is increasing.

Maybe the answer here is that Haskell is poised to make a similar meteoric rise, and leave the functional programming ghetto. Maybe the answer is that people interested in learning Haskell aren't buying expensive new books, but rather buying used copies, borrowing books, or learning from online references. Maybe the answer is that the four English-language titles on Haskell aren't delivering what the book-buying public needs, indicating that the time is right for a Pragmatic Haskell to repeat the success of Programming Ruby.

Somehow, I think the answer lies in a combination of these three possibilities.

I know that when I was going through Haskell: The Craft of Functional Programming and The Haskell School of Expression, I read through them to figure out what the compiler was trying to do with the code I was trying to write. Once I got comfortable, I didn't refer back to them at all. Instead, I read through numerous papers and online tutorials. The online docs and ghci were what I referred to most to explain what was going on. Today, I am quite comfortable lending out my Haskell books to a close friend. When I was programming in Perl for a living, I would have been much more likely to order a second copy of a Perl book (through work, of course) to lend to a friend than to let go of my only copy of a book.

I wonder if my experience is typical or aberrant, and what that means to the future prospects of selling Haskell books in 2008 or 2009...

[1] The same analysis points out that Practical OCaml is selling under 40 units per quarter. With so little data to analyze, I wonder if this reflects the lack of buzz around OCaml, a general lack of interest in OCaml, or a general lack of interest in this particular title...

Thursday, May 3, 2007

Parsing JSON

Here is the JSON parser I referred to in the previous post. It's not heavily documented because it's pretty close to a 1:1 translation of the specification.

There are some rough edges in this parser. The test suite includes the number 23456789012E666, which is out of range of IEEE doubles, and is read in as Infinity. While this value can be read in as something meaningful, it cannot be emitted, since there is no provision in JSON to express values like Infinity, -Infinity or NaN. The pretty-printer does not re-encode strings containing Unicode characters or escaped characters into a JSON-readable format. Finally, malformed inputs cause exceptions (error).

module JSON where

import Data.Char
import Data.Map hiding (map)
import Text.ParserCombinators.Parsec hiding (token)


data JsonValue = JsonString String
| JsonNumber Double
| JsonObject (Map String JsonValue)
| JsonArray [JsonValue]
| JsonTrue
| JsonFalse
| JsonNull
deriving (Show, Eq)

-- Convenient parse combinators

token :: Parser a -> Parser a
token p = do r <- p
return r

comma :: Parser Char
comma = token (char ',')


parseJSON :: String -> JsonValue
parseJSON str = case (parse jsonFile "" str) of
Left s -> error (show s)
Right v -> v

jsonFile :: Parser JsonValue
jsonFile = do contents <- jsonObject <|> jsonArray
return contents

-- JSON Object
jsonObject :: Parser JsonValue
jsonObject = do pairs <- between open close (sepBy jsonPair comma)
return $ JsonObject $ fromList pairs
open = token (char '{')
close = token (char '}')

jsonPair :: Parser (String, JsonValue)
jsonPair = do key <- token(jsonString)
token (char ':')
value <- token(jsonValue)
return (toString key, value)
toString (JsonString s) = s
toString _ = ""

-- JSON Array
jsonArray :: Parser JsonValue
jsonArray = do values <- between open close (sepBy (token jsonValue) comma)
return $ JsonArray values
open = token (char '[')
close = token (char ']')

-- Any JSON Value
jsonValue :: Parser JsonValue
jsonValue = do spaces
obj <- token(jsonString
<|> jsonNumber
<|> jsonObject
<|> jsonArray
<|> jsonTrue
<|> jsonFalse
<|> jsonNull
return obj

-- JSON String
jsonString :: Parser JsonValue
jsonString = do s <- between (char '"' ) (char '"' ) (many jsonChar)
return (JsonString s)

isValidJsonChar ch = (isAscii ch) && (isPrint ch) && (ch /= '\\') && (ch /= '"')

hexToInt s = foldl (\i j -> (16 * i) + j) 0 (map digitToInt s)

jsonChar = satisfy isValidJsonChar
<|> do char '\\' -- escaping backslash
char '\\' -- escaped character
<|> char '"'
<|> char '/'
<|> (char 'b' >> return '\b')
<|> (char 'f' >> return '\f')
<|> (char 'n' >> return '\n')
<|> (char 'r' >> return '\r')
<|> (char 't' >> return '\t')
<|> do char 'u'
hex <- count 4 (satisfy isHexDigit)
return $ chr (hexToInt hex)

-- JSON Number
jsonNumber :: Parser JsonValue
jsonNumber = do i <- int
frac <- option "" frac
e <- option "" expo
return $ JsonNumber (read (i ++ frac ++ e))

int :: Parser String
int = do sign <- option "" (string "-")
value <- (string "0" <|> many1 digit)
return (sign ++ value)

frac :: Parser String
frac = do char '.'
digits <- many1 digit
return ( '.':digits)

expo :: Parser String
expo = do e <- oneOf "eE"
p <- option '+' (oneOf "+-")
n <- many1 digit
return (e : p : n)

-- JSON Constants
jsonTrue = token (string "true") >> return JsonTrue
jsonFalse = token (string "false") >> return JsonFalse
jsonNull = token (string "null") >> return JsonNull

-- A JSON Pretty Printer
pprint v = toString "" v

toString indent (JsonString s) = show s
toString indent (JsonNumber d) = show d
toString indent (JsonObject o) =
if (o == empty)
then "{}"
else "{\n" ++ showObjs (indent ++ " ") (toList o) ++ "\n" ++ indent ++ "}"
toString indent (JsonArray []) = "[]"
toString indent (JsonArray a) = "[\n" ++ showArray (indent ++ " ") a ++ "\n" ++ indent ++ "]"
toString indent (JsonTrue) = "true"
toString indent (JsonFalse) = "false"
toString indent (JsonNull) = "null"

showKeyValue i k v = i ++ show k ++ ": " ++ toString i v

showObjs i [] = ""
showObjs i [(k ,v)] = showKeyValue i k v
showObjs i ((k, v):t) = showKeyValue i k v ++ ",\n" ++ showObjs i t

showArray i [] = ""
showArray i [a] = i ++ toString i a
showArray i (h:t) = i ++ toString i h ++ ",\n" ++ showArray i t


Namespaces Confusion

I haven't said much about my talk for FringeDC in March. I've been meaning to write this up for a while.

This is a story about a horrific blunder. Thankfully, no Bothans died bringing this information to you.

As I mentioned previously, I wrote a JSON parser to demonstrate how to write a real, live working Haskell program. I started by working off of the pseudo-BNF found on the JSON homepage. From the perspective of the JSON grammar, the constructs it deals with are objects (otherwise known as Maps, hashes, dicts or associative arrays), arrays, strings, numbers, and the three magic values true, false and null.

My first task was to create a data type that captures the values that can be expressed in this language:
data Value = String String
| Number Double
| Object (Map String Value)
| Array [Value]
| True
| False
| Null
deriving (Eq)
With the datatype in place, I then started writing parsing functions to build objects, arrays, and so on. Pretty soon, I had a JSON parser that passed the validation tests.

I used this piece of working Haskell code during my presentation, highlighting how all the parts worked together -- the parsers that returned specific kinds of Value types, those that returned String values, and so on.

Pretty soon I got tongue tied, talking about how Value was a type, and why String was a type in some contexts, and a data constructor for Value types in other contexts. And how Number wasn't a number, but a Value.

I'm surprised anyone managed to follow that code.

The problem, as I see it, is that I was so totally focused on the JSON domain that I didn't think about the Haskell domain. My type was called Value, because that's what it's called in the JSON grammar. It never occurred to me as I was writing the code that a type called Value is pretty silly. And, because types and functions are in separate namespaces, I never noticed that the data constructor for strings was called String.

Thankfully, the code was in my editor, so I changed things on the fly during the presentation to make these declarations more (ahem) sane:
data JsonValue = JsonString String
| JsonNumber Double
| JsonObject (Map String JsonValue)
| JsonArray [JsonValue]
| JsonTrue
| JsonFalse
| JsonNull
deriving (Show, Eq)
I think that helped to clarify that String is a pre-defined type, and JsonString is a value constructor that returns something of type JsonValue.

When I gave this presentation again a couple of weeks ago, the discussion around this JSON parser was much less confusing.

Lesson learned: let the compiler and another person read your code to check that it makes sense. ;-)

Wednesday, April 11, 2007

Fight the next battle

Tim O'Reilly posted some book sales data that indicates PHP is going mainstream:
We've noticed that one of the signs that a language is becoming mainstream (and perhaps being abandoned by the cutting edge developers) is that the For Dummies book becomes the top seller. [...]

In Q1 of 2005, the Dummies book was #7; in Q1 2006, #5; in Q1 2007 it's #1. Not only is the Dummies book now #1, four of the top five titles are now introductory.
Tim raises an important point: PHP is going mainstream, and cutting edge developers are looking elsewhere.

Many of the want ads I've seen for PHP recently read like we're looking for someone young and cheap to work like mad and crank out web pages. This is a little depressing, but not entirely surprising. The web has been the dominant development platform for over a decade; no one needs to pay top dollar to just build web apps (or worse, maintain and configure them).

Why would a professional developer want to use PHP and work on a project where someone who picked up PHP For Dummies last month is nearly as qualified? The whole point behind being a professional developer is accepting progressively more challenging tasks, not switching from one entry level skill to the next.

None of this reflects on PHP itself, just PHP's status as a mainstream tool suitable for beginners. Sure, there's interesting work for PHP experts, but the bulk of PHP work doesn't need an expert of any stripe.

Tim's point is an expression of Paul Graham's Python Paradox, but in reverse:
if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.
Because PHP skews to beginners and is becoming mainstream, professional programmers are moving away from it.

The problem with the Python Paradox is that it's both vague and fuzzy. By Paul's logic, if I'm starting a company, I should be using Factor or Io, because those languages are extremely esoteric.

Somehow, I don't think esoteric-ness is the important quality.

People who program professionally and truly care about their craft want to solve increasingly hard problems. They want to walk away from the old drugery and focus on what matters: solving the problem. Using the same tools over and over again means each project starts by fighting the last battle over again. Only then can the real work start: fighting the next battle, and solving the new problem.

The "fight the last battle" syndrome helps explain why languages like Python and Ruby are on the rise with professional programmers, while languages like Java and PHP are on the decline. Java uses a heavyweight, cumbersome, static object model, while Python and Ruby use a lightweight, dynamic object model, supplemented with powerful core types like lists and dictionaries. PHP makes it possible to build web applications, while Ruby and Python offer a variety of frameworks to that make web applications easier to build with much less effort.

Thus, Java and PHP are on the decline because they use old ideas and enforce older, more cumbersome styles of programming to solve problems than those offered by Python and Ruby.

As Ruby and Python make dynamic languages respectable, the whole "static vs. dynamic" language debate sounds like fighting last battle to me. It's tired. It's boring. It's a solved problem, they both work, and it's time to move onto something new.

What's the next battle? I'm not sure, but I've got a few ideas:
  • Functional programming over OOP and Imperative programming
  • Strong typing over weak static typing or dynamic typing
  • Mathematical foundations for concurrency over ad hoc models (i.e. threads)
  • Parsing over regex hackery
  • Implicit parallelism
I guess this is why I'm more and more comfortable spending time with Haskell these days. Haskell may not be the next thing, but whatever the next big thing is, it's probably going to have Haskelly fingerprints all over it.

Feedback on H:ACT

I haven't posted anything since I gave my talk to FringeDC last month. Here's how it went.

First, I want to thank Tom Moertel for his comment on limiting the amount of material to cover. I had outlined a lot of stuff that I wanted to cover, and it didn't really fit or flow very well. So I just started cutting stuff out. That actually worked very well.

It was an informal talk, and mostly extemporaneous. The talk took about 90 minutes, with a decent amount of time for questions and discussion throughout. The slides were about the first half, and hit the highlights:
  • program at the highest layer of abstraction possible
  • formal mathematical models for programming aren't scary
  • you understand lambda calculus, even if you didn't know you understand it
  • type inferencing is good
  • explicit typing is better, and not odious
  • monads are wonderful
The second half was a walk through of two Parsec parsers I wrote -- one to parse JSON in about a page of code (including a pretty printer), and a roman numeral parser. Both are trivial programs that do something more meaningful than the overused examples of fibonacci and factorial. There was some interesting discussion in this part of the talk.

At the end of it all, Conrad Barski presented his demo of a project he's working on to simplify tagging. It was pretty interesting demo of something that he whipped up in MzScheme.

About 20-25 people showed up for the talk, which is pretty impressive in DC. Many had some Haskell experience, dabbled in it, or were in the process of learning it. (I saw a few copies of various Haskell texts sitting around the room.)

The most embarrassing part of the meeting came during the end of my slides, when I was discussing one benefit of the type system. Programming using the standard types is useful, but doesn't let the type checker know what you're trying to do. The true benefit is defining domain specific types and letting the type checker determine whether your program is sane.

The example I used was the Mars Climate Orbiter, the mission that crashed into Mars because of a confusion between lbf and Newtons. Then I threw up some handwavy examples to show operations could be defined in terms of either Newtons or pounds of force, and let the compiler prevent this kind of mixup. The examples were handwavy because I wanted to refer people to the dimensional library, which automatically converts units of force, and prevents stupid things like adding feet to seconds and coming up with nonsense.

Why was this embarassing? Because I said that this was Bjorn Bringert's library, when actually it's Bjorn Buckwalter's library. And, unbeknownst to me, Bjorn Buckwalter was sitting right in front of me during the entire presentation. What does he use this library for? Making sure things like the Mars Climate Orbiter don't happen again on his watch. :-)

Overall, the talk went rather well. I got a lot of good feedback, and I think I covered just enough material.

Thursday, March 22, 2007

Haskell: A Cook's Tour

(I've been meaning to post this for a while now....)

I'm going to be giving a talk on Haskell for FringeDC, the Washington, DC area group for programmers who like to live on the edge. ;-)

The talk is called Haskell: A Cook's Tour. It's really hard to condense what's so cool about Haskell into an hour or two. (I did a 3 hour tutorial at OSCon last year, and boy, did that feel rushed! And that just scratched the surface!) Instead, I'm going to try my best to hit the high points: the type system, monads, parsec, and one or two more gems.
6pm, Saturday, March 24, 2007

Synergy Workspaces
1325 G St, NW
(Near Metro Center)
More details here.

Slides will be available (at some point) after the talk.

Monday, March 12, 2007

Say what you mean, mean what you say

Reg Braithwaite wrote an interesting tribute to John Hughes' paper Why Functional Programming Matters. In it, Reg points out the difficulty of using iterative methods to express underlying intent:
int numberOfOldTimers = 0;
for (Employee emp: employeeList) {
for (Department dept: departmentsInCompany) {
if (emp.getDepartmentId() == dept.getId()
&& emp.getYearsOfService() > dept.getAge()) {
This just goes to show that there is simply too much code necessary to solve this simple problem in an imperative fashion. It's not an indication that the code is poorly written, but rather, that imperative techniques are a bad fit because they obscure the overall intent.

The nested for loop isn't the goal. The goal is finding the number of employees who have been in the company longer than their department (i.e. "old timers"). As I read this code, I kept re-reading it because I thought I found a bug. I thought the code was counting all employees who have been in the company longer than any department, but that was my misreading. I kept glossing over this clause, which focuses the search to employees who have been with the company longer than their own department:
emp.getDepartmentId() == dept.getId()
Sadly, it's too easy to fall into this trap. There's a lot of stuff to read in this small block of code, and I certainly found it easy to gloss over an important detail. Undoubtedly, the opposite problem is also common: neglecting an important detail in the middle of a series of nested loops; finding those bugs also takes skill and effort to identify and fix.

Reg does a good job explaining why the goal isn't a nested for loop, and why using functional programming techniques (like generating a cartesian product, and filtering the result) bring the code closer to the goal -- counting the number of old timers.

Reg makes the point that what really matters are the functional techniques, not the language or the syntax used. Here's his revised solution, in Ruby (with an extension to support cartesian products on lazy lists):
old_timers = (employees * departments).select do |emp, dept|
emp.department_id == dept.id && emp.years_of_service > dept.age
number_of_old_timers = old_timers.size
Generating cartesian products just might be one of the fundamental ideas of computing. In an imperative language, they usually take the form of nested loops. In a language that supports list comprehensions (like Haskell or Python) it becomes a basic idea that melts away with just a little syntax:
[(x, y) | x <- [1..5], y <- [1..10]]
Not only are the products computed as a list, but in Haskell, they're computed lazily, so you only compute what you need.

Here's a revised solution to the old timers problem using list comprehensions:
length [1 | e <- emps, d <- depts, 
(empDeptId e) == (deptId d),
(yearsOfService e) > (age d)]
Note that the actual values returned in this list are inconsequential; the only thing that matters is counting the number of matches. Using length on this list comprehension highlights the fact that the goal here is to produce a count of matching elements. This intention is muddled within the nested for loops, and isn't precisely clear within the Ruby version.

Fundamental Ideas of Computing

What are the fundamental ideas in computing? The most basic ideas must be boolean algebra and binary numbers. From there, it's a short hop to ASCII and maybe Unicode, and then to arrays. I'm not sure what else falls into the category of 'fundamental ideas of computing', but sorting and searching must be in there somewhere.

I was thinking about what would make the list, and I couldn't come up with a complete list of topics, and two of those topics -- sorting and searching -- are a kind of black magic. Quicksort is a marvelously simple algorithm, but it's so easy to get wrong that the standard approach is to use a library implementation, or, in a worst case scenario, copy code out of a good textbook (ideally Knuth). Writing a quicksort function from memory is almost certainly a way to introduce bugs into a program. Searching a sorted binary tree is less error prone, but still one of those algorithms that's best to reuse or copy instead of writing from memory.

Why is this? Why are there fundamental ideas in computing that are best reused from a library or copied out of a textbook?

I'm not a chemist, nor am I a physicist, but I still remember a great deal from my high school chemistry and physics classes. Certain parts of the periodic table are deeply imprinted in my brain, as are Newton's laws of motion (and the calculus that relates position to velocity to acceleration). Yet there is something so complicated about quicksort and binary search that it's best not to write it down on a cocktail napkin.

Something is wrong here. The basic ideas in a profession should be so, um, basic that any professional should be able to recite them from memory on demand.

The problem isn't so much that these ideas are so arcane that they should come from a textbook whenever they are used. The problem is that our expression of these simple ideas is overly complex.

Here is a simple expression of C.A.R. Hoare's quicksort that's simple and easy to understand (optimized for clarity):
qsort [] = []
qsort (x:xs) = (qsort lesser) ++ [x] ++ (qsort greater)
lesser = filter (< x) xs
greater = filter (>= x) xs
Or, optimized for terseness:
qsort [] = []
qsort (x:xs) = qsort [y | y <- xs, y < x]
++ [x]
++ qsort [y | y <- xs, y >= x]
The problem with quicksort isn't that it's a fundamentally difficult algorithm to understand. As expressed here, the algorithm is quite simple (and fits on a cocktail napkin). As it is typically expressed in a language like C, it is optimized to have an additional property: it sorts the array in place. The typical Haskell version doesn't offer this property, but at least it highlights the algorithm, not the mechanics needed to support the typical optimization.

For more details, see the discussion on the Haskell wiki, which includes a canonical implementation in C for comparison.

Another classic algorithm is a search within a sorted binary tree. Binary trees are another foundation in computing that are frequently more complicated than they should be. Here is a definition that is simple, obvious and easy to remember:
data Tree a = Empty | Node (Tree a) a (Tree a)
That is, a tree is a node with a value and two sub-trees, or it is an empty node. Let's assume that the contents of a tree are always in sorted order. That is, the left subtree contains nodes that are strictly smaller than the current node, and the right subtree contains nodes that are strictly greater than (or equal to) the current node.

Performing a binary search on this structure should be simple, and it is:
bsearch _ Empty = False
bsearch p (Node l a r)
| p < a = bsearch p l
| p == a = True
| p > a = bsearch p r
A quick google turned up a gem from Ralf Hinze: Functional Pearl: A fresh look at binary search trees (JFP, November 2002). Ralf's paper also discusses the other necessary algorithms for dealing with binary trees: insertion, deletion, and balancing.

Snippets like this are one of the reasons why I like Haskell so much. As I was driving around today, I wondered how I would write a binary search, when I visualized these two snippets of Haskell pretty much as you see them here. This is certainly something I would never do if I were still thinking in C, because the pointer manipulation and memory management issues obscure the intent behind a binary search as to render it unrecognizable.

Now I'm wondering how many fundamental, yet "complex and error prone" concepts can be expressed with a screenful of Haskell.

Trends in Programming Languages

Ian Bicking has a nice synopsis of PyCon 2007. In it, he writes:
Another thing I like is that people seemed to really be using Python to do useful things. There wasn't nearly so much time spent talking about Why Python Is Good, or How To Sell Python To Your Boss. That's kind of silly anyway -- if you are at the conference, you don't need to be sold on it. But even so, the language seems to be in a much more self-confident place, and the users don't need to spend so much time justifying their choices.
Ian's comment reminds me of what the bad old days were like ten years ago. Java was the new kid on the block, had a huge marketing budget, and dominated the discussion on new development. Java was a net positive, because it displaced C, C++ (and, to some degree, VisualBasic) on many projects that simply needed a better language. At the same time, it was a huge bunch of hype, and it focused attention away from perfectly capable languages like Lisp, Perl, Python and Tcl.

As a result, there was a movement for 'language advocacy' to convince people that $YOUR_PET_LANGUAGE was demonstrably better than Java in at least some areas. A well-balanced argument might mention that Java was better in some areas, but in practice, those areas weren't particularly important. On a bad day, it was about promoting $YOUR_PET_LANGUAGE over Java and just about everything else.

I haven't seen much language advocacy recently, and I certainly haven't seen much of it in relation to Haskell (or its cousin, OCaml). I, too, think this is a good thing. I'm not sure why it's happening, but I've got some theories. Perhaps it's because I've grown as a programmer, and I avoid those arguments, so I don't notice them anymore. Perhaps it's because 'language advocacy' has gone out of vogue. I have a feeling it's the latter, but I can't prove anything.

Ian certainly brings up an important point. Being successful, building systems, and being confident your tools are capable certainly makes a powerful case. Convincing others that your tools are better, or at least worthy of consideration is a losing position. Some teams are happy and wildly productive with Java (or C, or C++), and it doesn't make sense to switch languages because they might offer something nebulous like more productivity or more joy. Some teams are very risk averse, so sticking to something mainstream like Java or C# makes sense, since there's little doubt someone will be able work on the system five to ten years from now.

Then there's the Paul Graham effect. In his essay Beating the Averages, Paul postulates that some languages are simply more powerful than others, and demonstrates that a good team can always manage to build a system, regardless of the tools they choose.

Since Paul released that essay, Java and C# have also evolved. So maybe there's an expectation of tool choice -- not only whether to use Java or C#, but also which version of the VM to use, or whether it's wiser to avoid those issues entirely and opt for a stable platform like Python or Ruby.

Whatever the reasons are, I'm happy that we're moving past 'language advocacy'.

Tuesday, March 6, 2007

Design Patterns in Haskell: push and pull

I've had a nagging feeling ever since my post on what's wrong with the for loop.

When for loops (and while loops) are the only tools at your disposal, every iteration construct is just an instance of a loop. When you have higher order functions and closures, then you start seeing different kinds of iteration constructs: maps, folds and filters for example. As an added benefit, the higher order approach also also eliminates the scaffolding that leads to bugs.

The problem is, that's not the entire story.

The difference between the for loop and the higher order functions is the same as the difference between push and pull templates in XSLT1.

Push templates are more natural, and easier to write in XSLT. They work with data, instead of against it:
<xsl:template match="para">
XSLT is really just a Scheme dialect that uses an event based virtual machine and an XML syntax. When the XSLT engine finds an element in the XML source document (an XML open tag, a chunk of text, a comment, whatever), it looks for a template rule to determine how to output that element. The template above matches <para> elements in a source document, and replaces them with <p> elements in the output document. The body of the <p> element on output is handled by the <xsl:apply-templates/> instruction, highlighted above. This instruction tells the XSLT engine to start examining the children of the current <para> element in the source document, and look for templates that describe how they are to be formatted in the output document, and place those outputs within a <p>...</p> block.

This leads me to one of my favorite XSLT techniques -- the null template:
<xsl:template match="double-secret-probation"/>
This template doesn't contain any content. Specifically, it doesn't contain an <xsl:apply-templates/> instruction. Therefore, whenever a <double-secret-probation> element is found in a source document that triggers this template, it and all of its children will be pruned from the output document.

Pull templates are the alternative to push templates. Instead of letting data flow through the stylesheet, they pick data out of the source document, and place it directly into the output document:
<xsl:template match="body">
<xsl:for-each select="para">
<xsl:value-of select="text()"/>
This template actively seeks out <para> elements that are children of a <body>, replaces them with <p> tags (as above), and then hunts down the text contents of each <para> element, shoving it directly into the <p>...</p> blocks. The two constructs above, <xsl:for-each> and <xsl:value-of>, are used to pull content out of the input document and into the output document.

If it sounds complicated, it is. XSLT makes it easy to reformat XML documents by providing an implicit processing model that visits every node of an XML document. Input can be rearranged, replaced, or removed in the output. Pull templates, on the other hand, can usually perform the same transformations, except that it generally takes more work to write (and understand) to do it.

In general, it's much better for XSLT stylesheets to use push templates whenever possible, and save pull templates for those rare situations when they are absolutely necessary.

As an exercise for the reader, write a stylesheet that emits this fragment and preserves all of the formatting, but eliminates the chunks labeled <double-secret-probation>:
The 11<sup>th</sup> Annual Homecoming Parade
will take place Thursday at <time>1100 PST</time>.

<double-secret-probation>Under no circumstances will
Delta House participate.</double-secret-probation>
(Hint: the two push templates above will be key parts of the solution. I don't even want to think about how to write a solution using a pull template....)

A few years ago, I taught on-site XSLT training courses. They went from the basic (this is a stylesheet, this is a template) all the way through advanced techniques (how to add commas between elements in a list, but not after the final element). I found that students whose exposure to XSLT started with "view source" always started with pull templates, and translated algorithms out of a Java program into XSLT syntax. It took a while to explain push templates, and appreciate how they simplified the problem domain. (Having copious examples helped, including a few that were nearly impossible with pull templates, but trivial with push templates.)

This is because loops and similar "pull techniques" are some kind of deep knowledge that many of us learn as beginning programmers. It was the only realistic approach when the only available language was some form of assembly, and it propagated through Fortran and ALGOL into every imperative language still in use today.

We can do better now. Computers are plenty fast to handle whatever meager overhead is necessary to apply functional idioms like map, fold/reduce and filter to transform and reduce aggregate values, whether we use those idioms in imperative languages like Perl, Python or Ruby, or in a purely functional language like Haskell.

The good news is that once you understand how to push data through a computation, trivial operations become, well, trivial. Here's the sum of all the odd numbers from 1 to 100:
sum (filter odd [1..100])
Perhaps this is why imperative programmers hesitate to use functional languages: pull techniques are so ingrained that it is hard to leave them behind and start using the push techniques that really make programming easier and more expressive...

1: For a more detailed discussion on push and pull templates with XSLT, see Bob DuCharme's excellent article on XML.com.

Monday, March 5, 2007

Design Patterns in Haskell: bracket

Last week, Pascal Costanza wrote up a nice explanation of Lisp (and Scheme) macros.

It starts with a block of code like this (in Java, in this example):
FileInputStream in = new FileInputStream(filename);
try {
} finally {
The two important things to note are creating the input stream, and doing something with the input stream. Everything else is required scaffolding to make error handling work properly. This example is particularly clean, hiding all of the file operations behind a doSomething method. However, the operation could be written inline, so that instead of writing 6 lines of code to get 2 lines of work done, you see 24 lines of code to get 20 lines of work done.

The problem isn't that, in this example, the scaffolding adds 200% more code (as measured in lines, using typical Java style). The problem is that every time you want to open a file (or connect to a database, or ....), you need to write at least 4 extra lines of scaffolding to handle exceptions.

A common response in the functional programming world is to factor out this common scaffolding into reusable code. Pascal describes how to create a with-open-file macro that lets you write this:
(with-input-file (in filename)
(do-something in))
...which expands into the Lisp equivalent of the Java code above:
(let ((in (open filename)))
(do-something in)
(close in)))
This should be familiar if you've seen Lisp macros before, or come across a discussion of (with-open-file ...). However, the real power is in abstracting the required exception scaffolding, not in using macros to achieve that goal.

The power of (with-open-file ...) is undeniable, and has long since leaked out of the Lisp world. It's a common idiom in Ruby, available to all IO objects:
File.open("testing", "w") {|fh|
fh.puts "Hi!"
It's interesting that Ruby offers the same idiom using blocks (er, closures) and not macros. In fact, it's the same technique used in Haskell. You can find it in Control.Exception.bracket:
bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
As I was writing this post, I went looking through the Haskell Hierarchical Libraries index for a definition of withOpenFile, but I didn't find it. Instead, I found a definition lurking in the documentation for bracket:
withOpenFile:: FileName -> FileMode -> (Handle -> IO a) -> IO a
withOpenFile name mode = bracket (openFile name mode) hClose
If you didn't have a function like bracket available, it would be trivial to write:
bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracket acquire release action =
do h <- acquire
result <- action h
release h
return result
Interestingly, bracket is a completely general function that abstracts the idea of wrapping IO operations with an acquire and release action. Due to how errors are handled within the IO monad, processing within the action function terminates at the first error, so no explicit try/finally block is necessary.

It's not absolutely necessary to require a function like withOpenFile appear within the standard library. Instead, you can define as many wrappers as you want, using the precise combination of IO actions you need. For example, you may want to define withInputFile and withOutputFile separately. You may want to use openBinaryFile, openTempFile, openFd, runInteractiveCommand or createPipe instead of openFile. It doesn't really matter; each of these functions are just one-line partial applications of bracket.

Note that bracket only works on IO operations, but it's an implementation of a much more generic idea. Looking over the library index, you'll find many functions in the standard libraries named with...: withArgs, withArray, withCString, withMVar, and withSocketsDo are some interesting examples.

Now, I don't want this to be a "Haskell is better than Lisp" (or "Lisp is better than Haskell," etc.) kind of post. Quite the contrary actually. Lisp offers a set of tools to simplify code, and Lispers have taken to use macros to solve this kind of problem. Haskell doesn't have macros, but still enables these 'with...' bracketing patterns, and makes them downright trivial to write, even if you have to write them from scratch. What matters is identifying this kind of repeated scaffolding is a design pattern, giving it a name, and making it easy to use without writing it out longhand each and every time it's needed.