List comprehensions are a powerful programming language construct that provide a concise means of dynamically generating lists of values. Erlang and Python, amongst other languages, support them.
There is an interesting observation to be made about Erlang list comprehensions, namely that emergent properties of Erlang make list comprehensions a practical necessity as opposed to merely being "kewl" syntax.
I had to write a simple function that takes a point tuple {Row,Column} identifying a square on a bounded plane (think checkers) and return the squares that neighbor it to the north, west, south and east (so to speak).
The bounded plane has point {1,1} in the top-left hand corner, with rows that increment downwards (as per Excel) and columns that increment to the right. So, given the point {2,2}, the function should return {1,2} (the point to the 'north'), {3,2} (south), {2,1} (west) and {2,3} (east). Also, the notion of a bounded plane implies that, given an edge point e.g. {1,2}, the function should return only the three points {1,1} (west), {1,3} (east) and {2,2} (south), as there is no point to the north: it is 'out of bounds'. Finally, given a corner point e.g. {1,1}, there are only two neighbors {1,2} (south) and {2,1} (east). Simple enough, yes?
Let's take a look at how this can be implemented in a couple of programming languages before admiring the solution that employs list comprehensions.
Here are the unit tests for what I previously described using C# 2.0 and NUnit 2.4.3.0:
[Test]
public void TestFourNeighboredPoint()
{
IList<Point>neighbors = Point.Neighbors(new Point(2,2));
Point[] expectedNeighbors = new Point[4] { new Point(1,2), new Point(3,2), new Point(2,1), new Point(2,3) };
Assert.IsTrue(neighbors.Count == 4);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}
[Test]
public void TestThreeNeighboredPoint()
{
IList<Point> neighbors = Point.Neighbors(new Point(1,2));
Point[] expectedNeighbors = new Point[3] { new Point(1,1), new Point(1,3), new Point(2,2) };
Assert.IsTrue(neighbors.Count == 3);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}
[Test]
public void TestTwoNeighboredPoint()
{
IList<Point> neighbors = Point.Neighbors(new Point(1,1));
Point[] expectedNeighbors = new Point[2] { new Point(1,2), new Point(2,1) };
Assert.IsTrue(neighbors.Count == 2);
Assert.IsTrue(Array.TrueForAll(expectedNeighbors, delegate(Point p) { return neighbors.Contains(p); }));
}
(Aside: I use both a generic IList<point> and a Point[] so I can Array.TrueForAll(), which is something of a poor man's Array#map. This way we avoid having to iterate, thus keeping the code very concise, but at the expense of this anomaly).
Now, the implementation of Point.Neighbors written in a curly-brace language - i.e. without the benefit of list comprehensions - could resemble something like the following C# 2.0:
internal static IList<Point> Neighbors(Point p)
{
IList<Point> neighbors = new List<Point>();
if (p.Row - 1 > 0)
neighbors.Add(new Point(p.Row - 1, p.Column)); // north
if (p.Row + 1 <= MAP_SIZE)
neighbors.Add(new Point(p.Row + 1, p.Column)); // south
if (p.Column - 1 > 0)
neighbors.Add(new Point(p.Row, p.Column - 1)); // west
if (p.Column + 1 <= MAP_SIZE)
neighbors.Add(new Point(p.Row, p.Column + 1)); // east
return neighbors;
}
This code obviously works because the temporary variable neighbors is mutable. We mutate neighbors (i.e. change its value) depending on the bounds we check on the Row and Column of the given Point p.
Now, this code doesn't translate into Erlang very elegantly at all (to put it mildly), because Erlang doesn't support mutable variables. In Erlang, after a value has been bound to an identifier, the value cannot change. This language property is known as single-assignment and is not unique to Erlang. So, Erlang doesn't really have variables, at least if we take the term literally, in the sense that they can vary.
To illustrate what impact single assignment has on code written in the style above, here is the closest possible translation of the curly-brace code above in Erlang:
!! DO NOT DO THIS !!
if
% Normal case: not on any boundary
Row > 1, Col > 1, Row < ?MAP_SIZE, Col < ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col+1}, {Row+1, Col}, {Row, Col-1}];
% Row is 1, but Col is not on a boundary
Row == 1, Col > 1, Col < ?MAP_SIZE ->
[{Row, Col+1}, {Row+1, Col}, {Row, Col-1}];
% Col is 1, but Row not on a boundary
Col == 1, Row > 1, Row < ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col+1}, {Row+1, Col}];
% Row is boundary, Col not on a boundary
Row == ?MAP_SIZE, Col > 1, Col < ?MAP_SIZE ->
[{Row, Col-1}, {Row-1, Col}, {Row, Col+1}];
% Row not boundary, Col on boundary
Row > 1, Row < ?MAP_SIZE, Col == ?MAP_SIZE ->
[{Row-1, Col}, {Row, Col-1}, {Row+1, Col}];
% Corner case top left
Row == 1, Col == 1 ->
[{1,2},{2,1}];
% Corner case bottom left
Row == ?MAP_SIZE, Col == 1 ->
[{?MAP_SIZE-1, Col}, {?MAP_SIZE, Col+1}];
% Corner case top right
Row == 1, Col == ?MAP_SIZE ->
[{Row, Col-1}, {Row+1, Col}];
% Corner case bottom right
Row == ?MAP_SIZE, Col == ?MAP_SIZE ->
[{?MAP_SIZE, ?MAP_SIZE-1}, {?MAP_SIZE-1, ?MAP_SIZE}];
% else
true ->
io:format("Logic error! Row: ~p Col: ~p", [Row, Col]),
error
end.
Because Erlang doesn't support mutable variables, we're forced to check every condition prior to assigning the return value. In this instance, we only need to check four basic conditions but the possible permutations of the result set leads to verbose (and unmaintainable) code. So, how can we elegantly satisfy our requirement in Erlang?
List Comprehensions to the Rescue
As you've probably guessed by now, the proper way to satisfy this requirement is to use a list comprehension:
adjacent_points({Row,Col}) ->
[{R1,C1} || {R1,C1} <- [{Row-1,Col}, {Row+1,Col}, {Row, Col-1}, {Row, Col+1}],
R1 > 0, R1 =< ?MAP_SIZE, C1 > 0, C1 =< ?MAP_SIZE].
Believe it or not, these two lines of code replace the dozen or so lines of code of C#, not to mention the naive Erlang implementation.
Anatomy of a List Comprehension
List comprehensions are powerful, because they enable creating lists from one or more source lists, and they emit values that can be selectively filtered using one or more filters.
Erlang list comprehensions are enclosed in square brackets. The initial {R1,C1} preceding the || is the constructor, which can be any expression. In this example, it will simply instantiate the point tuples that we want to generate:
[ {R1,C1} || ... ]
Immediately following the || is a generator which is composed of a pattern {R1,C1}, followed by "<-", followed by the list. In our case, the list is simply in-lined:
[ {R1,C1} || {R1,C1} <- [{Row-1,Col}, {Row+1,Col}, {Row, Col-1}, {Row, Col+1}], ... ]
Each of the generated values in a list comprehension must satisfy all of the filters that are specified in the list comprehension (if any). In our case, there are four filters, each of which equate to the conditions that we check in the curly-brace implementation, but with far less baggage:
R1 > 0, R1 =< ?MAP_SIZE, C1 > 0, C1 =< ?MAP_SIZE
Again, each of the tuples emitted by the generator will be checked against each of these filters. All of the filters must be satisfied (i.e. evaluate to true) for the constructor part to be invoked. These filters will evaluate to false for tuples that are out of bounds.
Even though our example used a single generator and multiple filters, it should be noted that list comprehensions can be any arbitrary number of generators or filters. You should also be aware of Erlang's facility to efficiently generate bit strings with Bit String Comprehensions, which work very similarly to what we have just seen.
Re-Entry Checklist
It is a curious property that the incidence of defects in software remains constant irrespective of the number of lines of code written. Thus as an industry, we should be naturally inclined to use concise languages. List comprehensions are one way that Erlang affords a succinct notation. Succinct programming languages exhibit less defects than their more verbose counterparts because there is simply less room for error. Some programmers assert that this leads to more obtuse syntax, but in consideration of the constant defect ratio, I believe that this trade off is fully justified.
Afterword
My thanks to Daryn Holmes and Darren Bishop for providing several illuminating, alternative implementations of the code above in various programming languages, and to the folks on #erlang IRC.