3.10.2009

97 Things for Software Architects

97 Things Every Software Architect Should Know has finally been published, and it's been worth the wait. I have watched it germinate all the way from a friend's whimsical idea on a mailing list ("Hey, this would make a good book!") to its birth as hardcopy.

I contributed two of the 97 'Things', Context Is King and Heterogeneity Wins. In Context is King, I postulate that there are no global axioms in the domain of software architecture, but rather that (any) ideal does not live out of context. In Heterogeneity Wins, I encourage software architects to think outside of the stack, this because I cannot count the number of times that I have consulted on a project where the problem domain was poorly understood, but the technological solution - or, the correct BigCo Framework Extension piece - had already been chosen.

As far as I am aware, 97 Things is the world's first open-source book, in that all contributions were pre-published and peer-reviewed, and all are Creative Commons-licensed. Furthermore, all of the book's content is available online so you can try before you buy. But of course, you can't curl up in bed with the web (the Kindle notwithstanding), so you'll probably still want a dead-tree version :-).

An interesting aspect of the book is how some of the axioms contradict one another. For example, It's all about performance contradicts my own Context Is King, because I propose that it is only within scope and business objectives (i.e. context) that an architect has the necessary framework to make the right decisions, while the other promotes the idea that - requirements notwithstanding - nobody wants a system that runs like a snail. (And, while I acknowledge the basic truth of that premise, I would point out that it is context that always denotes how fast a system should run, and furthermore that sometimes snail-fast is perfectly acceptable, e.g. if some other over-arching business goal is met).

It is worth noting that 97 Things will have very good shelf-life relative to many other software engineering tomes. The advice it contains is fairly generic and technology-agnostic. I favor books that exhibit these qualities, that I know will still be relevant 10 years from now, or (more realistically) after the 6 month lapse has passed between the time I bought it and when I actually found time to study it :-)

Software architecture is a nascent, black art, and we are still finding our way; I hope within the book you find stimulus and inspiration.

3.26.2008

The *Real* Erlang "Hello, World!"

This *is not* it:


-mod(hello).
-export([start/0]).

start() ->
io:format("Hello, World!").

I propose that the purpose of a "Hello, World!" program is to communicate something essential about the programming language in a small space. The program above does not achieve this - relative to implementations in other languages - predominantly because it omits anything to do with the Actor model, which is a core part of what makes Erlang interesting.


I propose that the following should be considered The Real Erlang "Hello, World!":


-module(hello).
-export([start/0]).

start() ->
spawn(fun() -> loop() end).

loop() ->
receive
hello ->
io:format("Hello, World!~n"),
loop();

goodbye ->
ok
end.

Let's dissect this example to see why. To run this program, install Erlang, fire up the Erlang REPL erl.exe and follow along.

First we compile and load the program with the commandc(). Note that we omit the ".erl" file extension when referring to the module. Also note that I startederl.exe in the directory containing hello.erl such that I was not required to type in the full path.


1> c(hello).
{ok,hello}

Erl responds with ok and the name of the compiled module.

The start() function is the only function we can invoke in the hello module, because it's the only one that is exported, as per the module's export statement. This is how Erlang implements encapsulation, in that the exported functions form the public interface of the module. The list of exported functions are of the form name/arity, where name is the name of the function and arity is a formal way of saying "the number of arguments it takes".

Invoke the start() function within the hello module, assigning the return value to a variable called Pid:


2> Pid = hello:start().
<0.36.0>

The spawn function returns a Pid - a Process Identifier - which is a first-class Erlang data type. We assign this return value to a variable of the same name. (We could just as easily have assigned it to a variable namedFoo, but using Pid is fairly common). Note that variables in Erlang need to start with an uppercase letter.

Erl responds by pretty printing the process identifier <0.36.0>; all valid expressions in Erlang have a return value.

At this juncture, if you try to assign any other value to Pid, you will get a badmatch exception. Once a value has been bound to an identifier, it cannot change: Erlang is a single-assignment language. The benefits of this paradigm include the ability for the compiler and runtime to make fancy optimizations, and it also greatly eases debugging because variables are immutable.


The Sharp End


The spawn invocation starts an Erlang process which wraps the loop() function just below it. (Note that Erlang doesn't impose any order of definition on functions). Erlang processes are the essence of programming in Erlang, and the essential missing element in simpler "Hello, World!" examples. Processes are the Erlang implementation of the Actor model: extremely lightweight concurrency primitives that communicate purely by message-passing. They have nothing whatsoever to do with operating system processes, threads or similar, and are managed entirely by the Erlang runtime.

The process waits (semantically at the receive statement) for a message which matches one of its receive clauses.

We can send a message to the process using an exclamation mark (the message-send operator) followed by the message. We can see that the receive block has two clauses which match both hello andgoodbye.

We invoke the code within the 'hello' clause by sending the corresponding message to our cached Pid:


3> Pid ! hello.
Hello, world!
hello

As we expect, our process responds with, "Hello, World!". And as noted before, Erlang returns a value for all valid statements, this is why we see hello printed out immediately following the output of io:format.

The following line does a tail-recursive call back to loop(). In case you didn't follow the link and aren't completely familiar with tail recursion, you should know that tail-recursion is the bombay duck of computer science: there is no recursion going on, at least in the sense that anything is left on the stack. Tail recursion is a means of efficiently calling the current function, and is more akin to a goto or a jump instruction than the terminology would have you believe.

So, given the tail-recursive call back to loop(), the process is once again put back into the wait state. We could send the hello message to Pid ad nauseum and the process would simply repeat.

Now we send the goodbye message:


4> Pid ! goodbye.
goodbye

The crucial difference between this clause and the clause that matches the hello message is that this clause does not include a tail-recursive call back to loop(). As a result, the process effectively dies. We can confirm this by attempting to invoke the code in the hello clause once again:


5> Pid ! hello.
hello

And we see that no output is generated.

The last important detail that I have omitted is the type of hello and goodbye. These are erlang atoms, an extremely simple data type whose value is itself. Atoms are used heavily in message-passing (and other pattern-matching contexts) and are very easy to work with: you simply declare and go!


Re-Entry Checklist


Although the explanation has been verbose, I hope you agree that this Erlang "Hello, World!" communicates some interesting essentials of the Erlang programming language. These essentials concern in particular how Erlang implements the Actor Model, which is the kernel of its message-passing semantics and a key enabler for Erlang's capability for massively concurrent processing.



3.07.2008

Erlang List Comprehensions: By Necessity

List comprehensions are a powerful programming language construct that provide a concise means of dynamically generating lists of values. Erlang and Python, amongst other languages, support them.

There is an interesting observation to be made about Erlang list comprehensions, namely that emergent properties of Erlang make list comprehensions a practical necessity as opposed to merely being "kewl" syntax.

I had to write a simple function that takes a point tuple {Row,Column} identifying a square on a bounded plane (think checkers) and return the squares that neighbor it to the north, west, south and east (so to speak).

The bounded plane has point {1,1} in the top-left hand corner, with rows that increment downwards (as per Excel) and columns that increment to the right. So, given the point {2,2}, the function should return {1,2} (the point to the 'north'), {3,2} (south), {2,1} (west) and {2,3} (east). Also, the notion of a bounded plane implies that, given an edge point e.g. {1,2}, the function should return only the three points {1,1} (west), {1,3} (east) and {2,2} (south), as there is no point to the north: it is 'out of bounds'. Finally, given a corner point e.g. {1,1}, there are only two neighbors {1,2} (south) and {2,1} (east). Simple enough, yes?

Let's take a look at how this can be implemented in a couple of programming languages before admiring the solution that employs list comprehensions.

Here are the unit tests for what I previously described using C# 2.0 and NUnit 2.4.3.0:


[Test]
public void TestFourNeighboredPoint()
{
IList<Point>neighbors = Point.Neighbors(new Point(2,2));
Point[] expectedNeighbors = new Point[4] { new Point(1,2), new Point(3,2), new Point(2,1), new Point(2,3) };

Assert.IsTrue(neighbors.Count == 4);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}

[Test]
public void TestThreeNeighboredPoint()
{
IList<Point> neighbors = Point.Neighbors(new Point(1,2));
Point[] expectedNeighbors = new Point[3] { new Point(1,1), new Point(1,3), new Point(2,2) };

Assert.IsTrue(neighbors.Count == 3);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}

[Test]
public void TestTwoNeighboredPoint()
{
IList<Point> neighbors = Point.Neighbors(new Point(1,1));
Point[] expectedNeighbors = new Point[2] { new Point(1,2), new Point(2,1) };

Assert.IsTrue(neighbors.Count == 2);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}

(Aside: I use both a generic IList<point> and a Point[] so I can Array.TrueForAll(), which is something of a poor man's Array#map. This way we avoid having to iterate, thus keeping the code very concise, but at the expense of this anomaly).

Now, the implementation of Point.Neighbors written in a curly-brace language - i.e. without the benefit of list comprehensions - could resemble something like the following C# 2.0:

internal static IList<Point> Neighbors(Point p)
{
IList<Point> neighbors = new List<Point>();

if (p.Row - 1 > 0)
neighbors.Add(new Point(p.Row - 1, p.Column)); // north

if (p.Row + 1 <= MAP_SIZE)
neighbors.Add(new Point(p.Row + 1, p.Column)); // south

if (p.Column - 1 > 0)
neighbors.Add(new Point(p.Row, p.Column - 1)); // west

if (p.Column + 1 <= MAP_SIZE)
neighbors.Add(new Point(p.Row, p.Column + 1)); // east

return neighbors;
}

This code obviously works because the temporary variable neighbors is mutable. We mutate neighbors (i.e. change its value) depending on the bounds we check on the Row and Column of the given Point p.

Now, this code doesn't translate into Erlang very elegantly at all (to put it mildly), because Erlang doesn't support mutable variables. In Erlang, after a value has been bound to an identifier, the value cannot change. This language property is known as single-assignment and is not unique to Erlang. So, Erlang doesn't really have variables, at least if we take the term literally, in the sense that they can vary.

To illustrate what impact single assignment has on code written in the style above, here is the closest possible translation of the curly-brace code above in Erlang:

!! DO NOT DO THIS !!

if
% Normal case: not on any boundary
Row > 1, Col > 1, Row < ?MAP_SIZE, Col < ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col+1}, {Row+1, Col}, {Row, Col-1}];

% Row is 1, but Col is not on a boundary
Row == 1, Col > 1, Col < ?MAP_SIZE ->
[{Row, Col+1}, {Row+1, Col}, {Row, Col-1}];

% Col is 1, but Row not on a boundary
Col == 1, Row > 1, Row < ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col+1}, {Row+1, Col}];

% Row is boundary, Col not on a boundary
Row == ?MAP_SIZE, Col > 1, Col < ?MAP_SIZE ->
[{Row, Col-1}, {Row-1, Col}, {Row, Col+1}];

% Row not boundary, Col on boundary
Row > 1, Row < ?MAP_SIZE, Col == ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col-1}, {Row+1, Col}];

% Corner case top left
Row == 1, Col == 1 ->
[{1,2},{2,1}];

% Corner case bottom left
Row == ?MAP_SIZE, Col == 1 ->
[{?MAP_SIZE-1, Col}, {?MAP_SIZE, Col+1}];

% Corner case top right
Row == 1, Col == ?MAP_SIZE ->
[{Row, Col-1}, {Row+1, Col}];

% Corner case bottom right
Row == ?MAP_SIZE, Col == ?MAP_SIZE ->
[{?MAP_SIZE, ?MAP_SIZE-1}, {?MAP_SIZE-1, ?MAP_SIZE}];

% else
true ->
io:format("Logic error! Row: ~p Col: ~p", [Row, Col]),
error
end.

Because Erlang doesn't support mutable variables, we're forced to check every condition prior to assigning the return value. In this instance, we only need to check four basic conditions but the possible permutations of the result set leads to verbose (and unmaintainable) code. So, how can we elegantly satisfy our requirement in Erlang?

List Comprehensions to the Rescue

As you've probably guessed by now, the proper way to satisfy this requirement is to use a list comprehension:

adjacent_points({Row,Col}) ->
[{R1,C1} || {R1,C1} <- [{Row-1,Col}, {Row+1,Col}, {Row, Col-1}, {Row, Col+1}],
R1 > 0, R1 =< ?MAP_SIZE, C1 > 0, C1 =< ?MAP_SIZE].

Believe it or not, these two lines of code replace the dozen or so lines of code of C#, not to mention the naive Erlang implementation.

Anatomy of a List Comprehension

List comprehensions are powerful, because they enable creating lists from one or more source lists, and they emit values that can be selectively filtered using one or more filters.

Erlang list comprehensions are enclosed in square brackets. The initial {R1,C1} preceding the || is the constructor, which can be any expression. In this example, it will simply instantiate the point tuples that we want to generate:

[ {R1,C1} || ... ]

Immediately following the || is a generator which is composed of a pattern {R1,C1}, followed by "<-", followed by the list. In our case, the list is simply in-lined:

[ {R1,C1} || {R1,C1} <- [{Row-1,Col}, {Row+1,Col}, {Row, Col-1}, {Row, Col+1}], ... ]

Each of the generated values in a list comprehension must satisfy all of the filters that are specified in the list comprehension (if any). In our case, there are four filters, each of which equate to the conditions that we check in the curly-brace implementation, but with far less baggage:

R1 > 0, R1 =< ?MAP_SIZE, C1 > 0, C1 =< ?MAP_SIZE

Again, each of the tuples emitted by the generator will be checked against each of these filters. All of the filters must be satisfied (i.e. evaluate to true) for the constructor part to be invoked. These filters will evaluate to false for tuples that are out of bounds.

Even though our example used a single generator and multiple filters, it should be noted that list comprehensions can be any arbitrary number of generators or filters. You should also be aware of Erlang's facility to efficiently generate bit strings with Bit String Comprehensions, which work very similarly to what we have just seen.

Re-Entry Checklist

It is a curious property that the incidence of defects in software remains constant irrespective of the number of lines of code written. Thus as an industry, we should be naturally inclined to use concise languages. List comprehensions are one way that Erlang affords a succinct notation. Succinct programming languages exhibit less defects than their more verbose counterparts because there is simply less room for error. Some programmers assert that this leads to more obtuse syntax, but in consideration of the constant defect ratio, I believe that this trade off is fully justified.

Afterword

My thanks to Daryn Holmes and Darren Bishop for providing several illuminating, alternative implementations of the code above in various programming languages, and to the folks on #erlang IRC.


1.21.2008

History, Santayana, and The Next Big Thing

"Those who cannot remember the past are doomed to repeat it"

- George Santayana (1863 - 1952)

Or put another way, those who do not learn about the past are doomed to repeat it.

And so it goes in our industry. Many of the so-called latest developments are nothing more than a re-hash of ideas from yesteryear. Ask any Smalltalk curmudgeon and she'll tell you this is true.

The statement above is also what makes learning Lisp (or Smalltalk, or ...) really and truly worthwhile: it's not because you're going to get paid to do any work in those languages (and if you are, then more power to you ...and tell me where I can apply...!).

George Santayana certainly had a lot to say about software development.

The Next Big Thing

Closely related to this meme is that of where to find The Next Big Thing. Quite often the answer is sitting on the end of our collective noses. Rather than seek out TNBT in the cyber ether, simply look at your hard drive.

Witness AJAX: the enabling technologies (XMLHttpRequest, XML and Dynamic HTML) were sitting on everybody's machines several years before somebody creative put these existing pieces together in a new and interesting way.

But Anybody Could Have Done That!

Yes, but they didn't.

There is a great story - probably apocryphal - about Christopher Columbus receiving an accolade for his discovery of the Americas at a banquet. A courtier present at the time remarked, "Well, all you really did was to sail in one direction for a long time - anybody could have done that". Columbus paused for a moment and picked up an egg. "Can you make this egg stay upright?", Columbus asked the courtier. "Of course not!", the courtier replied. Columbus then sharply struck the bottom of the egg on the table, making it stand up, breaking it in the process. "Anybody could have done that!" the courtier protested. "Yes", Columbus replied, "but you didn't."

Christopher Columbus certainly had a lot to say... erm, never mind....

And with this I launch my blog, Software Development Redux, with redux taken from the Latin meaning, brought back.

Thanks for visiting.