Arc internals: Macros

This article describes how the Arc language handles macros internally. It explains the steps of macro processing in Arc, and how macros are translated to Scheme functions. This article does not explain how to use macros; for that, see chapter 7 of On Lisp. The macros below are not intended as examples of good macro style or usefulness; instead, they illustrate points of macro processing. This article also assumes some familiarity with macros, which can be obtained from the Arc Tutorial.

To summarize briefly, a macro is like a function that generates Lisp code, which is then substituted in place of the macro call. The first phase of macro processing is macroexpansion, in which the macro generates an expression. The second phase is evaluation, in which the generated expression is evaluated.

How macros work

It is very important to understand when the macroexpansion and evaluation phases take place. An example may help clarify this:
arc> (mac foo () (prn "macro executed") '(prn "generated code executed"))
#3(tagged mac #<procedure>)
arc> (foo)
macro executed
generated code executed
"generated code executed"
The macro foo does two things: it prints "macro executed" when it runs during macroexpansion, and it returns the list (prn "generated code executed"). When that list is evaluated, it displays "generated code executed". (Since prn returns that string, the string is also displayed a second time as the result of the command.)

However, the two phases of macroexpansion and evaluation can be separated in time. If we use the macro inside a function definition, things get interesting:

arc> (def bar () (prn "bar executed") (foo))
macro executed
#<procedure: bar>
Macroexpansion takes place above, but the generated code is not evaluated; instead, it becomes part of bar. The output of the prn calls shows that only the macro itself ran. Next, the macro can be destroyed, to illustrate that it is not needed for the evaluation phase:
arc> (= foo nil)
nil
Finally, evaluation can take place. When bar is executed three times, both the body of bar and the code generated by foo during macroexpansion are executed each time:
arc> (repeat 3 (bar))
bar executed
generated code executed
bar executed
generated code executed
bar executed
generated code executed
nil
This illustrates the two phases: first, the macro runs and generates code during macroexpansion. Second, the generated code is executed, potentially much later, during evaluation. The macro itself takes no part in the second step; in the example above, it had already been destroyed. A more subtle consequence is that if a macro definition is changed, the change has no effect on code that was already expanded using the old definition. This can cause confusion when the old version of a macro still appears to be active.
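To make the timing concrete, here is a toy model in Python. It is not Arc's implementation; the names compile_expr, define_bar, and the macros dictionary are hypothetical. Macros are plain Python functions that return code as nested lists, and "compiling" a definition expands any macro call once, at the moment bar is defined. Deleting the macro afterwards, like (= foo nil), has no effect on bar.

```python
# Toy model, not Arc's implementation: compiling a definition expands
# macro calls once; only the expanded code runs on each later call.
log = []
macros = {}

def foo():
    """Macro function: runs at expansion time and returns code (a list)."""
    log.append("macro executed")
    return ["prn", "generated code executed"]

macros["foo"] = foo

def compile_expr(expr):
    """Expand a macro call if present, then return a thunk that runs the code."""
    head = expr[0]
    if head in macros:
        return compile_expr(macros[head](*expr[1:]))
    if head == "prn":
        msg = expr[1]
        def run_prn():
            log.append(msg)
            return msg
        return run_prn
    raise ValueError(f"unsupported expression: {expr!r}")

def define_bar():
    """Like (def bar () (prn "bar executed") (foo)): expansion happens here."""
    body = compile_expr(["foo"])
    def bar():
        log.append("bar executed")
        return body()
    return bar

bar = define_bar()     # log is now ["macro executed"]
del macros["foo"]      # like (= foo nil): bar is unaffected
for _ in range(3):
    bar()
```

After the loop, log holds "macro executed" once, followed by three pairs of "bar executed" and "generated code executed", matching the Arc transcript.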

Another important difference between macros and functions is that the arguments to a function are evaluated before they are passed to the function, while the arguments to a macro are passed unevaluated. This is vital for macros that execute their arguments multiple times (e.g. repeat), macros that execute their arguments conditionally (e.g. and), and macros that process their arguments (e.g. =). To see the difference, note that the function receives the result of evaluating (+ 1 1), while the macro receives the list (+ 1 1):

arc> (def f args (prn args) nil)
#<procedure: f>
arc> (mac m args (prn args) nil)
#3(tagged mac #<procedure>)
arc> (f (+ 1 1))
(2)
nil
arc> (m (+ 1 1))
((+ 1 1))
nil
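The same distinction can be sketched with a toy Python evaluator, where expressions are nested lists. The names evaluate, call_function, and call_macro are hypothetical; the only difference between the two call paths is whether the argument expressions are evaluated before the call.

```python
import operator

ENV = {"+": operator.add}

def evaluate(expr):
    """Evaluate a tiny prefix expression: an int, or a list like ["+", 1, 1]."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    return ENV[op](*[evaluate(a) for a in args])

received = {}

def f(*args):        # ordinary function: sees values
    received["f"] = list(args)

def m(*args):        # macro function: sees the raw expressions
    received["m"] = list(args)

def call_function(fn, arg_exprs):
    return fn(*[evaluate(a) for a in arg_exprs])   # arguments evaluated first

def call_macro(mac, arg_exprs):
    return mac(*arg_exprs)                           # arguments passed as-is

call_function(f, [["+", 1, 1]])
call_macro(m, [["+", 1, 1]])
print(received["f"])   # [2]
print(received["m"])   # [['+', 1, 1]]
```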

Like functions, macros support destructuring bind on the arguments. Despite its intimidating name, destructuring bind simply means that the list of parameters can be a complex nested list. The arguments passed in will be "destructured" and mapped onto the parameter list. For example:

arc> (mac db (a (b c d) e) (prs a b c d e) nil)
arc> (db 1 (2 3 4) 5)
1 2 3 4 5
Note that the list (2 3 4) has been destructured and mapped onto the individual parameters b, c, and d. The next example shows that the arguments are unevaluated, that an unevaluated expression such as (+ p q r) can itself be destructured (here onto b, c, and d), and that any extra elements (here r) are discarded:
arc> (db (+ m n o) (+ p q r) (+ s t u))
(+ m n o) + p q (+ s t u)
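The binding rule can be sketched in Python as a recursive walk over the parameter list. This is a hypothetical model (the destructure function is not part of Arc), assuming nested Python lists stand in for Arc lists; it reproduces both results above. Binding missing elements to None is an assumption of the model, loosely mirroring taking the car of nil.

```python
def destructure(params, args, bindings=None):
    """Map a nested parameter list onto a nested argument list.

    Extra arguments are ignored; in this toy model, missing ones bind to None.
    """
    if bindings is None:
        bindings = {}
    for i, param in enumerate(params):
        arg = args[i] if i < len(args) else None
        if isinstance(param, list):
            # A nested parameter list destructures the corresponding argument.
            if arg is None:
                arg = []
            if not isinstance(arg, list):
                raise TypeError(f"cannot destructure {arg!r} against {param!r}")
            destructure(param, arg, bindings)
        else:
            bindings[param] = arg
    return bindings

params = ["a", ["b", "c", "d"], "e"]
print(destructure(params, [1, [2, 3, 4], 5]))
# {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(destructure(params, [["+", "m", "n", "o"], ["+", "p", "q", "r"], ["+", "s", "t", "u"]]))
```

In the second call, b, c, and d receive +, p, and q, and the extra r is dropped, as in the Arc output.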

Now that the phases of macro processing are clear, these phases can be illustrated in Arc.

Creation of a macro

Macro creation in Arc is, somewhat surprisingly, not part of the ac.scm foundation, but is implemented in arc.arc out of annotated functions. A macro can be generated "manually", illustrating the steps. Create baz, which is like the macro foo above, except it is a function and not a macro. Note that when executed, it returns the generated code:
arc> (def baz () (prn "macro executed") '(prn "generated code executed"))
#<procedure: baz>
arc> (baz)
macro executed
(prn "generated code executed")
To turn this function into a macro, simply annotate it with type 'mac. The resulting mbaz behaves exactly like the macro foo:
arc> (= mbaz (annotate 'mac baz))
#3(tagged mac #<procedure: baz>)
arc> (mbaz)
macro executed
generated code executed
"generated code executed"
To summarize, any function that generates Arc code can be annotated as type 'mac, and the result is a macro. The annotation is the key to forming a macro; there is nothing special about using mac to create one. mac is just a convenience macro that performs this annotation. The fact that mac is itself a macro may seem circular, but since mac is defined by calling annotate directly, there is no circularity.

The annotate function provides a general typing system. It is implemented in Scheme by ar-tag, which creates a Scheme vector of length 3: 'tagged, the type, and the contents. For macros, the type is 'mac, but annotate supports arbitrary types. (For detailed background on annotate, see Some Work on Arc.)
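A Python sketch of the tagging scheme, under the assumption that a 3-element tuple can stand in for the Scheme vector built by ar-tag. The helpers type_of and rep are hypothetical stand-ins for Arc's type and rep, and mac is modeled as a decorator that does nothing but annotate, which is the point made above.

```python
def annotate(type_name, contents):
    """Model of ar-tag: a 3-slot record ('tagged', type, contents)."""
    return ("tagged", type_name, contents)

def is_tagged(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "tagged"

def type_of(x):
    """Hypothetical stand-in for Arc's type (only tagged values and functions modeled)."""
    if is_tagged(x):
        return x[1]
    if callable(x):
        return "fn"
    raise TypeError(f"type not modeled for {x!r}")

def rep(x):
    """Hypothetical stand-in for Arc's rep: the contents of a tagged value."""
    return x[2] if is_tagged(x) else x

def mac(fn):
    """Models mac as a pure convenience: it only annotates the function."""
    return annotate("mac", fn)

def baz():
    print("macro executed")
    return ["prn", "generated code executed"]

mbaz = annotate("mac", baz)   # like (= mbaz (annotate 'mac baz))

@mac
def foo():
    print("macro executed")
    return ["prn", "generated code executed"]

print(mbaz[0], type_of(mbaz), type_of(baz))   # tagged mac fn
```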

The internal representation of a macro as a vector can be seen by entering a macro name:

arc> do
#3(tagged mac #<procedure>)

Because macros are implemented by functions, the body of the macro is initially processed by ac-fn. A simple function is turned into a Scheme lambda function, while a function with complex (destructuring) arguments is processed by ac-complex-fn. The destructuring is implemented by inserting the appropriate combinations of car and cdr to pull each argument out of the list. In other words, destructuring is just syntactic sugar; the same effect can be obtained by manually using car and cdr to extract each argument from a rest argument.
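Since destructuring is just sugar over car and cdr, a short Python sketch can generate the accessor expression for each parameter. The function accessor_paths is hypothetical, and its output illustrates the idea rather than the exact code ac-complex-fn produces; the rest argument is assumed to be named args.

```python
def accessor_paths(params, source="args"):
    """Map each parameter name to a car/cdr expression (as text) that extracts it.

    `params` is a nested list of names; `source` is the expression holding the
    argument list (assumed here to be a rest parameter called args).
    """
    paths = {}
    rest = source
    for param in params:
        first = f"(car {rest})"
        if isinstance(param, list):
            # Nested parameter list: destructure the element itself.
            paths.update(accessor_paths(param, first))
        else:
            paths[param] = first
        rest = f"(cdr {rest})"
    return paths

for name, expr in accessor_paths(["a", ["b", "c", "d"], "e"]).items():
    print(name, expr)
# a (car args)
# b (car (car (cdr args)))
# c (car (cdr (car (cdr args))))
# d (car (cdr (cdr (car (cdr args)))))
# e (car (cdr (cdr args)))
```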

The net result is that an Arc macro becomes an Arc procedure tagged as 'mac. Internally, the macro is a Scheme vector whose third element is a Scheme procedure that takes the macro arguments, performs any necessary destructuring, and executes the macro code.

Macroexpansion

Macroexpansion takes place when Arc code is converted to Scheme by ac. (See Arc Internals, Part 1 for background.) Code of the form (foo ...) is passed by ac to ac-call. If the operator (foo here) is a macro (i.e. a vector tagged with 'mac), the form is handled by ac-mac-call, which applies the macro function to the arguments and then passes the macroexpanded result back to ac to be converted to Scheme. The net result is that the code generated by the macro is treated as if it were part of the original expression being processed.

Comparing ac-mac-call to ac-call shows why macros receive their arguments unevaluated. ac-mac-call applies the macro function to the arguments, while ac-call maps ac on the arguments before applying the function, causing the arguments to be evaluated.
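The two call paths can be sketched with a toy compiler in Python that turns expressions into zero-argument thunks. The names ac, ac_call, and ac_mac_call mirror the Scheme functions only loosely; the twice macro and the tuple-based tags are assumptions of the model. The macro function runs once, at compile time, no matter how often the compiled code is called.

```python
import operator

def annotate(type_name, contents):
    """Toy stand-in for ar-tag: ('tagged', type, contents)."""
    return ("tagged", type_name, contents)

def is_macro(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "tagged" and x[1] == "mac"

expansions = []   # records each time the macro function runs

def twice(x):
    """Hypothetical macro: (twice x) expands to (+ x x)."""
    expansions.append(x)
    return ["+", x, x]

ENV = {"+": operator.add, "twice": annotate("mac", twice)}

def ac(expr, env):
    """Compile an expression (int, symbol string, or list) into a thunk."""
    if isinstance(expr, int):
        return lambda: expr
    if isinstance(expr, str):
        return lambda: env[expr]
    head, *args = expr
    if isinstance(head, str) and is_macro(env.get(head)):
        return ac_mac_call(env[head], args, env)
    return ac_call(head, args, env)

def ac_mac_call(mac, args, env):
    expansion = mac[2](*args)   # macro function receives unevaluated argument forms
    return ac(expansion, env)   # expansion is compiled as if it had been written inline

def ac_call(head, args, env):
    fn = ac(head, env)
    compiled = [ac(a, env) for a in args]   # ac is mapped over the arguments
    return lambda: fn()(*[c() for c in compiled])

code = ac(["twice", ["+", 1, 2]], ENV)   # the macro runs here, at compile time
print(code(), code())                      # 6 6
print(expansions)                          # [['+', 1, 2]]
```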

Evaluation

Nothing particularly special happens for macro code during evaluation. At this point it is Scheme code, and it gets executed the same as any other code. Specifically, arc-eval calls eval on the Scheme code generated by ac. It may use ar-funcalln to execute functions, and execution will typically bottom out in foundation operations implemented in native Scheme. (See Arc Foundation Documentation for details.)

macex and macex1

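The macex and macex1 functions perform macroexpansion. If the outermost form is a macro call, macex1 expands it once, while macex expands it repeatedly until the outermost form is no longer a macro call. These functions are typically used for debugging, to see what a macro is doing. Note that neither macex nor macex1 expands inner macros: in the output below, the inner (or 2 3) is left unexpanded, and macex goes one step further than macex1 by also expanding the let that or produces.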
arc> (macex1 '(or 1 2 3))
(let gs2418 1 (if gs2418 gs2418 (or 2 3)))
arc> (macex '(or 1 2 3))
((fn (gs2419) (if gs2419 gs2419 (or 2 3))) 1)

The macex and macex1 functions are implemented by ac-macex, which expands its argument if it is a macro call. Expansion is done by applying the macro function to the arguments. For macex, ac-macex calls itself recursively on the result, stopping when the outermost form is no longer a macro call. macex1, however, passes a 'once flag to ac-macex, so only one step of expansion is performed.
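A Python sketch of this recursion, assuming toy or and let macros. The fixed name g1 stands in for the gensym (gs2418 in the transcript), and this let expands directly to a fn call, as the macex output in the transcript suggests. With these assumptions the sketch reproduces both transcript results, including the unexpanded inner (or 2 3).

```python
def annotate(type_name, contents):
    """Toy stand-in for ar-tag: ('tagged', type, contents)."""
    return ("tagged", type_name, contents)

def is_macro(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "tagged" and x[1] == "mac"

def or_mac(first, *rest):
    # Toy or: the fixed name "g1" stands in for a gensym such as gs2418.
    return ["let", "g1", first, ["if", "g1", "g1", ["or", *rest]]]

def let_mac(var, val, body):
    # Toy let: (let v val body) expands to ((fn (v) body) val).
    return [["fn", [var], body], val]

ENV = {"or": annotate("mac", or_mac), "let": annotate("mac", let_mac)}

def macex(expr, env, once=False):
    """Expand the outermost form while it is a macro call; inner forms are untouched."""
    if not (isinstance(expr, list) and expr and isinstance(expr[0], str)):
        return expr
    mac = env.get(expr[0])
    if not is_macro(mac):
        return expr
    expansion = mac[2](*expr[1:])
    return expansion if once else macex(expansion, env)

def macex1(expr, env):
    return macex(expr, env, once=True)

print(macex1(["or", 1, 2, 3], ENV))  # ['let', 'g1', 1, ['if', 'g1', 'g1', ['or', 2, 3]]]
print(macex(["or", 1, 2, 3], ENV))   # [['fn', ['g1'], ['if', 'g1', 'g1', ['or', 2, 3]]], 1]
```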

The set special form takes a symbol and a value, and sets the variable named by the symbol to the value. It is implemented by ac-set, which calls ac-macex on the first argument. This ensures that the first argument to set is a symbol, or a macro call that macroexpands to a symbol. Note that an expression (other than a macro call) that evaluates to a symbol is not permitted as the first argument to set. Thus, macro processing for set uses a path separate from the rest of Arc's macro processing.
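A minimal Python sketch of that check, assuming a toy store dictionary for variables and a hypothetical macro named the-counter that expands to the symbol counter. The target is macro-expanded first and must then be a plain symbol (a string in this model); any other form is rejected. For simplicity the value is passed in already evaluated.

```python
def annotate(type_name, contents):
    """Toy stand-in for ar-tag: ('tagged', type, contents)."""
    return ("tagged", type_name, contents)

def is_macro(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "tagged" and x[1] == "mac"

def macex(expr, env):
    """Fully expand the outermost form while it is a macro call."""
    while (isinstance(expr, list) and expr and isinstance(expr[0], str)
           and is_macro(env.get(expr[0]))):
        expr = env[expr[0]][2](*expr[1:])
    return expr

def ac_set(target, value, env, store):
    """Toy (set target value): expand the target, then require a symbol."""
    target = macex(target, env)
    if not isinstance(target, str):
        raise TypeError(f"set target must be a symbol, got {target!r}")
    store[target] = value
    return value

ENV = {"the-counter": annotate("mac", lambda: "counter")}
store = {}
ac_set(["the-counter"], 5, ENV, store)    # macro call expands to the symbol counter
ac_set("total", 10, ENV, store)           # a plain symbol is accepted
try:
    ac_set(["car", "xs"], 1, ENV, store)  # not a macro call, not a symbol: rejected
except TypeError as err:
    print(err)
```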

In conclusion, Arc's macro processing is implemented relatively straightforwardly. The biggest surprise is the representation of macros as tagged procedures, which are implemented as vectors. Macros may be somewhat nonintuitive, but understanding the internal implementation may help with writing and understanding code that makes use of macros.

Copyright 2008 Ken Shirriff