Wednesday 4 April 2007

The Atomiser, Part I

Erlang has atoms - one of many language features that I find hard to live without when I have to work with other systems.

But the Erlang compiler does not check that the atoms you pass to a function are ones that function actually handles.

While I love programming with Erlang, stuff like this happens to me more often than I care to admit:

1> Greeting = fun
1>     (hello) -> "Hooray! You said hello!";
1>     (_) -> "You said something else."
1>     end.
#Fun

2> Greeting(he11o).
"You said something else."


...

Huh? I said hello!

Didn't I?

The more observant of you will have immediately spotted the number '1's masquerading as 'l's up there, but believe me, depending on the obviousness of the typo and the lateness of the hour, this sort of pest can be quite painful to hunt down and eradicate.

It was especially painful when I first started learning the language. I would find some strange behaviour or receive these odd "function_clause" errors, and I had no idea what was wrong.
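
For anyone who has not met one yet, here is a minimal sketch of where a "function_clause" error comes from, and how a catch-all clause hides it. The module and function names (greeting_demo, strict, lenient) are made up for illustration and are not part of the Atomiser.

```erlang
-module(greeting_demo).
-export([strict/1, lenient/1]).

%% No catch-all clause: any argument other than the atom hello
%% fails at runtime with a function_clause error.
strict(hello) -> "Hooray! You said hello!".

%% Catch-all clause: a mistyped atom such as he11o is silently
%% accepted and falls through to the default answer.
lenient(hello) -> "Hooray! You said hello!";
lenient(_) -> "You said something else.".
```

Calling greeting_demo:strict(he11o) crashes with a function_clause error, while greeting_demo:lenient(he11o) quietly returns "You said something else.". Neither problem is reported at compile time, because he11o is a perfectly valid atom.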

Wouldn't it be nice if you could optionally give the compiler a list of valid atoms for a source file, and have the compiler warn you if you were using an atom that was not in that list?

I thought so too.

Let's fix it.

5 comments:

  1. Very cool!

    Check out my (unfortunately, not finished) type inferring parse transform, Recless: http://code.google.com/p/recless. I think you'll like it.

  2. Hi Yariv. Thank you for checking out my work.

    I have just had a quick look at Recless, and I like how you have merged gathering record information with walking the AST. I should probably do the same for the Atomiser, rather than doing two passes.

    I will be going through the details of how you parse/walk the AST with a fine-toothed comb. :-)

  3. Couldn't you just avoid using the "_" case?

  4. Hi Zimbatm.

    > Couldn't you just avoid using the "_" case?

    Good question. I had to think about it for a bit, and my short answer is "not always".

    True, if we had a known set of acceptable input atoms to a function and I used an invalid one, a runtime exception would be thrown. That is the "function_clause" error I mentioned above. I have to admit that even with this hint from the Erlang runtime, when I was learning the language I just didn't know what to look for properly (I would be looking at the function definition instead of the function call, for instance).

    I believe that the Dialyzer will catch this problem and warn you about a function call that cannot match any clause. It would be good practice to avoid catch-all match clauses whenever possible.


    A possible extension of this problem is with sending messages from one process to another. I do not know if the Dialyzer will pick up messages that will not be handled by the target process. The Atomiser will let you know if you are using an unknown atom in a message. (It will not stop you sending a known atom that will still not be handled by the receiver, though.)


    And sometimes you do actually need a 'default' case. This is where you will get the problem of code that does not crash, but also does not behave as you thought it should.

    For an example of a necessary default case in action, see my (to be posted soon) article on EMP1. :-)

  5. Great stuff! Thank you very much and keep the good stuff coming!
    I just looked briefly and I could only wish that more people produced such extensive tutorials...


Obligatory legal stuff

Unless otherwise noted, all code appearing on this blog is released into the public domain and provided "as-is", without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author(s) be liable for any claim, damages, or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.