Saturday 21 April 2007

Erlang Macro Processor (v2), Part II

You know we need to do it eventually, so let's get the boring "find -macro_modules attributes and store their values" bit out of the way so we can move on to some more interesting stuff. Here we go:


-module(emp2).
-export([parse_transform/2]).

parse_transform(AST, _Options) ->
Mods = lists:flatten([Mods || {attribute,_Line,macro_modules,Mods} <- AST]),
io:format("Macro Modules: ~p~n", [Mods]),
AST.

Ah, whoops. One of those list comprehension thingies seems to have slipped into the parse_transform function. To get rid of it we just have to change that line into something like this:

    Mods = lists:flatten(lists:map(
fun ({attribute,_Line,macro_modules,Mods}) -> Mods;
(_Node) -> []
end,
AST)),

Hmmm. On second thoughts, maybe we should keep the list comprehension.

I believe that list comprehensions are a relatively new feature in Erlang so you may not see too many of them in existing code, but they really are worth learning. (Erlang is in good company: Python and Haskell have list comprehensions too.)


Back from that tangent and to the program at hand, we see that the macro module names are being stored in an ordinary list. I expect that only a few macro modules (probably only one at most) will be specified in any given module, and looking for an element in a one-element list is pretty quick, so we should not be needing the indexing overhead of a dictionary. I also don't particularly mind if someone specifies a macro module more than once, or if a specified macro module is never used. (If we were really concerned about duplicate macro module names then we could use one of the list module functions to easily remove them.)

We could also roll the gathering of the macro_modules attributes up into the AST-walking code, but conceptually it is nicer to keep it up here and out of the way. Also, as this code only traverses the very top level of the AST it should be quite quick. Pattern-matching one entry per module attribute and function definition is not a computationally expensive task.

Right, the boring stuff is done; let's get into parsing that AST.


As I briefly mused at the bottom of a previous Atomiser post:

Rather than consisting of a bunch of pattern matching clauses, the walk_ast function could be made "smarter" by transforming the given node tuple into a list, and applying some rules-based logic to the elements of that list (from the third element onwards).

I reckon we could give this a go and see where we end up. (Either it will work and we have learned something new, or it won't work and we will have learned something new, so it is a win-win situation either way.)

You might recall that the Atomiser walk_ast function had a clause for each node type. This was a great way for me to implement the Atomiser because I got to see the AST nodes that made up my programs, but in the end it has turned out to be a pretty ugly function.

Here are a few lines of the walk_ast function as a quick refresher (the substitution macro actually makes the code nicer than it could be):

?WALK_AST({call,_Line,_Fun,Args}, Args); % Handles local and module calls.
?WALK_AST({'case',_Line,Test,Clauses}, [Test|Clauses]);
?WALK_AST({'catch',_Line,Expr}, Expr);
?WALK_AST({char,_Line,_Char}, []);

And those clauses go on (and on!) for about forty different node types...

I would much rather only have specific clause for handling each node that we are interested in, and use some generic code to handle the rest. But if we want to create some rules to manage these nodes generically then we had better find some patterns in all of that mess.

...

The first (and blindingly obvious) thing to notice about the nodes is that - without exception - they are all tuples. (I know, I know: I am a genius. Applause is not strictly necessary. Really. Oh, all right then, a little bit of applause is okay, if you insist.)

Two of these tuple nodes are not quite the same as the others: {error, Details} and {warning, Details}. In all of the other nodes the first element is the node type and the second element is the line number of the source file that the node appears in. After that there are a variable number of elements (possibly none) with node-specific meanings.

We are interested in catching -macro attributes (so EMP2 can do the work of EMP1) as well as remote function call nodes that are calling a macro function. Everything else is irrelevant, except that we want to recursively descend into sub-nodes to keep searching for other remote macro function calls.

If we take a closer look at the elements of nodes we will note that the element is always either a list, a tuple, or atomic (i.e.: an atom or an integer). These elements might have a special meaning to the compiler (depending on their location in the current node tuple) but to us they are just potential sub-nodes. If the node does not match an attribute or remote function call pattern then the elements have no meaning to EMP2 and we can treat them as homogenous lumps of node matter.

Of the additional elements in a node (if any), they are either
  • a list, which we can parse as its own sub-AST,
  • a tuple, which we can parse as another node, or
  • atomic (or integer), which we can pass back as-is.

I think that all of these notes are probably enough to get us started coding.

No comments:

Post a Comment

Obligatory legal stuff

Unless otherwise noted, all code appearing on this blog is released into the public domain and provided "as-is", without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author(s) be liable for any claim, damages, or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.