Saturday, 7 April 2007

The Atomiser, Part IV

The story so far:

Our intrepid (also whiny, and quite lazy) developer suffers from a serious case of fat fingers and failing eyesight. Mistyped Erlang atoms have caused him countless hours of anguish and hair loss, and the compiler does not even have the decency to emit so much as a warning or apology.

In an effort to fill this massive hole in his development tools, the developer has managed to bolt together the bare bones of a parse transformation module. Soon this unnatural creation will be lumbering around the neighbourhood, causing all sorts of havoc.

The Atomiser currently has just enough intelligence to recognise valid atom lists specified in a given source file, but it is not able to actually do anything with that information. We need to walk the AST and locate the atoms used in the source code.

This walk_ast function will be the work-horse of the Atomiser module:

walk_ast([], Atoms) -> Atoms;

walk_ast([Node|ASTRest], Atoms) ->
    io:format("Unknown node:~n~p~n", [Node]),
    walk_ast(ASTRest, Atoms).

And we will modify parse_transform to call it:

parse_transform(AST, _Options) ->
    Atoms = atoms_from_ast(AST),
    _AtomsMarked = walk_ast(AST, Atoms),

With this simple walk_ast function we will get a huge output list of 'unknown' Abstract Syntax Tree nodes when we compile atomiser.erl[1]. The last unknown node will appear as:

Unknown node: {eof,41}

Obviously we will need to add more function clauses to cater for the different nodes in the Abstract Syntax Tree. Rather than starting from the top of the output list, I will work my way up from the bottom. (It saves scrolling up... see reference to 'lazy' above.)

Keeping the empty-list and the unknown-node function clauses as the first and last entries respectively, we add a clause entry for the eof node:

walk_ast([{eof,_Line}|ASTRest], Atoms) ->
    walk_ast(ASTRest, Atoms);

Compiling the atomiser.erl program again[2] shows that, indeed, the {eof,41} node is now missing from the 'unknown node' output.

Continuing on we can enter some more function clauses. Something like this:

walk_ast([], Atoms) -> Atoms;
walk_ast([{call,_Line,Fun,Args}|ASTRest], Atoms) ->
    walk_ast(ASTRest, walk_ast(Args, walk_ast([Fun], Atoms)));
walk_ast([{clause,_Line,Args,Guards,Exprs}|ASTRest], Atoms) ->
    walk_ast(ASTRest, walk_ast(Exprs, walk_ast(Guards, walk_ast(Args, Atoms))));
walk_ast([{cons,_Line,Head,Tail}|ASTRest], Atoms) ->
    walk_ast(ASTRest, walk_ast([Tail], walk_ast([Head], Atoms)));
walk_ast([{eof,_Line}|ASTRest], Atoms) ->
    walk_ast(ASTRest, Atoms);
walk_ast([{function,_Line,_Fun,_Arity,Clauses}|ASTRest], Atoms) ->
    walk_ast(ASTRest, walk_ast(Clauses, Atoms));
walk_ast([{nil,_Line}|ASTRest], Atoms) ->
    walk_ast(ASTRest, Atoms);
walk_ast([{var,_Line,_Name}|ASTRest], Atoms) ->
    walk_ast(ASTRest, Atoms);
walk_ast([Node|ASTRest], Atoms) ->
    io:format("Unknown node: ~p~n", [Node]),
    walk_ast(ASTRest, Atoms).

Hang on, I am typing the same stuff over and over again... this is way too inefficient.

It looks like there is a bit of a pattern going on here. Something along the lines of:

walk_ast([PatternTuple|ASTRest], Atoms) ->
PossiblyNestedCallsToWalkAST(SomeValueInThePattern, Atoms));

Where the possibly-nested calls to walk_ast are based on the elements of the pattern tuple that we are interested in processing recursively.

Now, Erlang does not have a built-in full macro system like Lisp does, but it does at least have basic substitution macros. I think that we can make the code above much nicer to read by using a macro like this:

-define(WALK_AST(Pattern, Expressions),
    walk_ast([Pattern|ASTRest], Atoms) ->
                fun(AST, AtomsMarked) ->
                    walk_ast(AST, AtomsMarked)

Now that hideous function clause

walk_ast([{clause,_Line,Args,Guards,Exprs}|ASTRest], Atoms) ->
    walk_ast(ASTRest, walk_ast(Exprs, walk_ast(Guards, walk_ast(Args, Atoms))));

may be written like this:

?WALK_AST({clause,_Line,Args,Guards,Exprs}, [Args, Guards, Exprs]);

Continuing to write a function clause for every unknown node in the AST[3], we get:

walk_ast([], Atoms) -> Atoms;
?WALK_AST({atom,_Line,Atom}, []); % Need to check for valid atom.
?WALK_AST({attribute,_Line,file,_File}, []);
?WALK_AST({attribute,_Line,module,_Module}, []);
?WALK_AST({attribute,_Line,export,_ExportList}, []);
?WALK_AST({attribute,_Line,compile,_CompilerDirective}, []);
?WALK_AST({attribute,_Line,atoms,_AtomList}, []);
?WALK_AST({call,_Line,Fun,Args}, [[Fun], Args]);
?WALK_AST({'case',_Line,Test,Clauses}, [[Test], Clauses]);
?WALK_AST({clause,_Line,Args,Guards,Exprs}, [Args, Guards, Exprs]);
?WALK_AST({cons,_Line,Head,Tail}, [[Head], [Tail]]);
?WALK_AST({eof,_Line}, []);
?WALK_AST({'fun',_Line,{clauses,Clauses}}, [Clauses]);
?WALK_AST({function,_Line,_Fun,_Arity,Clauses}, [Clauses]);
?WALK_AST({match,_Line,Left,Right}, [[Left], [Right]]);
?WALK_AST({nil,_Line}, []);
?WALK_AST({remote,_Line,_Module,_Function}, []);
?WALK_AST({string,_Line,_String}, []);
?WALK_AST({tuple,_Line,Elements}, [Elements]);
?WALK_AST({var,_Line,_Name}, []);
walk_ast([Node|ASTRest], Atoms) ->
    io:format("Unknown node: ~p~n", [Node]),
    walk_ast(ASTRest, Atoms).

And this is all we need to walk through the Atomiser's current code, with no unknown node messages appearing.

[1] We need to compile atomiser.erl twice to pick up the changes and see them in action. The first time the compilation is performed the new code is compiled and loaded. The second time the compilation is performed the new code is actually run against the source.

[2] Twice!

[3] Rather than consisting of a bunch of pattern matching clauses, the walk_ast function could be made "smarter" by transforming the given node tuple into a list, and applying some rules-based logic to the elements of that list (from the third element onwards). I chose to do things the function-clause way mainly because I am not completely familiar with all of the elements in an Erlang AST. Having newly-encountered nodes come up as unknown elements is a good way to be exposed to the underlying structure of the parse tree, and to ensure that we have caught everything we need to.


  1. I guess that this:

    walk_ast({[{eof,_Line}|ASTRest], Atoms}) ->
    walk_ast({ASTRest, Atoms});

    Was meant to be:

    walk_ast([{eof,_Line}|ASTRest], Atoms) ->
    walk_ast(ASTRest, Atoms);

  2. Bard: Yes indeed, it was meant to be. Thanks for bringing it to my attention.


Obligatory legal stuff

Unless otherwise noted, all code appearing on this blog is released into the public domain and provided "as-is", without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author(s) be liable for any claim, damages, or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.