As bad as anything else

2010 07 22

On Erlang's Syntax

I first planned to release this text as an appendix entry for Learn You Some Erlang, but considering this feels more like editorial content and not exactly something for a reference text, I decided it would fit better as a blog post.

Many newcomers to Erlang manage to understand the syntax and program around it without ever getting used to it. I've read and heard many complaints regarding the syntax and the 'ant turd tokens' (a subjectively funny way to refer to the comma, the semi-colon and the period), how annoying it is, etc.

As mentioned at some point in the book, Erlang draws its syntax from Prolog. While this gives a reason for the current state of things, it doesn't magically make people like the syntax. I mean, I don't expect anyone to respond to this by saying "Oh, it's Prolog, I get it. Makes complete sense!" As such, I'll suggest three ways to read Erlang code to possibly make it easier to understand.

The Template

The template way is my personal favorite. To understand it, one must first get rid of the concept of lines of code and think in Expressions. An expression is any bit of Erlang code that returns something.

In the shell, the period (.) ends an expression. After writing 2 + 2, you must add a period (and then press <Enter>) for the expression to be run and return a value.

In modules, the period ends forms. Forms are module attributes and function declarations. Forms are not expressions as they don't return anything. This is why they're terminated in a different manner than everything else. Given that forms are not expressions, it could be argued that the shell's use of . to terminate expressions is the nonstandard part here. Consequently, I'd suggest not caring about the shell for this method of reading Erlang.
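To make the idea of forms concrete, here is a minimal sketch of a module. The module name forms_demo and the function double/1 are made up for illustration. Each attribute and the function declaration is one form, and each ends with a period:

```erlang
-module(forms_demo).      % module attribute: a form, so it ends with a period
-export([double/1]).      % export attribute: another form, another period

%% A function declaration is a form as well.
%% The period closes the whole declaration, not "the line".
double(X) ->
    2 * X.
```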

Alright. So the first rule is that the comma (,) separates expressions:

C = A+B, D = A+C

This is easy enough. However, it should be noted that if ... end, case ... of ... end, begin ... end, fun() -> ... end and try ... of ... catch ... end are all expressions. As an example, it is possible to do:

Var = if X > 0  -> valid;
         X =< 0 -> invalid
      end

And get a single value out of the if ... end. This explains why we will sometimes see such language constructs followed by a comma; it just means there is another expression to evaluate after it.
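As a small sketch of that situation, the hypothetical function describe/1 below (module name expr_demo is also made up) binds the value of an if ... end and then goes on to use it, so the end is followed by a comma:

```erlang
-module(expr_demo).
-export([describe/1]).

describe(X) ->
    Var = if X > 0  -> valid;
             X =< 0 -> invalid
          end,            % comma: one more expression follows the 'if'
    {X, Var}.             % period: last expression of the function
```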

The second rule is that the semi-colon (;) has two roles. The first one is separating different function clauses:

fac(0) -> 1;
fac(N) -> N * fac(N-1).

The second one is separating different branches of expressions like if ... end, case ... of ... end and others:

if X < 0  -> negative;
   X > 0  -> positive;
   X == 0 -> zero
end

It's probably the most confusing role because the last branch of the expression doesn't need to have the semi-colon following it. This is because the semi-colon separates branches rather than terminating them. Think in expressions, not lines. Some people find it easier to illustrate the role of separator by writing the above expression in the following way, which is arguably more readable:

if X < 0  -> negative
 ; X > 0  -> positive
 ; X == 0 -> zero
end

This makes the role of separator more explicit. It goes in between branches and clauses, not after them.

Now, because the semi-colon separates both expression branches and function clauses, the token that follows an expression such as a case construct depends only on where that expression sits: a , when another expression comes after it, a ; when it is the last expression of a function clause that is followed by another clause, or a . when it is the last expression of the whole function.
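Here is a sketch of those three positions, using made-up functions sign/1 and check/2 in a hypothetical module tokens_demo. The case ... end is the same kind of expression each time; only what comes after the end changes:

```erlang
-module(tokens_demo).
-export([sign/1, check/2]).

%% 'end' followed by a comma: another expression comes next.
sign(X) ->
    S = case X < 0 of
            true  -> negative;
            false -> non_negative
        end,
    {sign, S}.

%% 'end' followed by a semi-colon: last expression of a clause,
%% and another clause of the same function follows.
check(strict, X) ->
    case X > 0 of
        true  -> ok;
        false -> error
    end;
%% 'end' followed by a period: last expression of the last clause.
check(lenient, X) ->
    case X >= 0 of
        true  -> ok;
        false -> error
    end.
```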

The line-based logic for terminating lines such as in C or Java must go out the window. Instead, see your code as a generic template you fill (hence the name The Template):

head1(Args) [Guard] ->
    Expression1, Expression2, ..., ExpressionN;
head2(Args) [Guard] ->
    Expression1, Expression2, ..., ExpressionN;
headN(Args) [Guard] ->
    Expression1, Expression2, ..., ExpressionN.
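Filled in, the template could look like the sketch below. The module template_demo and the function area/1 are invented for the example; what matters is the shape: heads, optional guards, comma-separated expressions, a semi-colon between clauses and a period at the very end.

```erlang
-module(template_demo).
-export([area/1]).

%% head1(Args) [Guard] -> Expression1, Expression2;
area({square, Side}) when Side >= 0 ->
    Area = Side * Side,
    {ok, Area};
%% head2(Args) [Guard] -> Expression1, Expression2;
area({rectangle, W, H}) when W >= 0, H >= 0 ->
    Area = W * H,
    {ok, Area};
%% headN(Args) -> ExpressionN.
area(_Other) ->
    {error, unknown_shape}.
```

Swapping two clauses means moving a whole clause and keeping the ; between clauses and the . after the last one, which is exactly what the template predicts.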

The rules make sense, but you need to get into a different reading mode. That's where the heavy lifting needs to be done: moving from lines and blocks towards a pre-defined template. I mean, if you think about it, things like for (int i = 0; i < x; i++) { ... } (or even for (...);) have a weird syntax when compared to most other constructs in languages supporting them. We're just so used to seeing these constructs that we don't mind them anymore.

The English Sentence

Although this manner is not the one I like the most, I do realize different people have different ways to make sense of logical concepts and this is one manner I've heard being praised many times.

This one is about comparing Erlang code to English. Imagine you're writing a list of things. Well, no. Don't imagine it, read it.

I will need a few items on my trip:
  if it's sunny, sunscreen, water, a hat;
  if it's rainy, an umbrella, a raincoat;
  if it's windy, a kite, a shirt.

An Erlang translation can remain a bit similar:

trip_items(sunny) ->
    sunscreen, water, hat;
trip_items(rainy) ->
    umbrella, raincoat;
trip_items(windy) ->
    kite, shirt.

Here, just replace the items by expressions and you have it. Expressions such as if ... end can be seen as nested lists.

And, Or, Done.

Another variant of this one has been suggested to me on #erlang. The user simply reads , as 'and', ; as 'or' and . as being done. A function declaration can then be read as a series of nested logical statements and affirmations.
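As an illustration of that reading, here is a small sketch with the words written in the comments. The module reading_demo and the function greet/1 are hypothetical names chosen for the example:

```erlang
-module(reading_demo).
-export([greet/1]).

greet(morning) ->                   % if the argument is morning:
    Greeting = "Good morning",      %   bind Greeting, AND
    {ok, Greeting};                 %   return it; OR
greet(evening) ->                   % if the argument is evening:
    Greeting = "Good evening",      %   bind Greeting, AND
    {ok, Greeting};                 %   return it; OR
greet(_) ->                         % for anything else:
    {error, unknown_time}.          %   return an error. DONE
```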

In Conclusion...

Some people will just never like "ant turd tokens" or being unable to swap lines of code without changing the token at the end of the line. I guess there's not much to be done when it comes to style and preferences, but I still hope this text might have been useful. After all, "the syntax is only intimidating, it's far from difficult."