Procedural Macros in Rust

Published to joshleeb's blog

Procedural macros are a really powerful language feature in Rust and something I haven’t seen in many other languages.

There are a heap of tutorials out there for procedural macros, including in The Rust Reference, and the first edition of the Rust Book. One of the more entertaining (and useful) posts is by Zach Mitchell where you get to “learn Rust procedural macros with Nic Cage”.

I won’t go into depth about what procedural macros are and why they’re so powerful. But basically they allow you to tell the compiler to take in some code, analyse it, and generate some more code. To me, that sounds pretty powerful already.

VariantEq

I recently put up a crate called VariantEq which exposes a custom derive procedural macro, also called VariantEq.

Custom derive macros are invoked through the derive attribute, like #[derive(Debug, ...)] above your struct or enum. Their job is usually to implement a trait for you! In that example, it’s the Debug trait.

Examples are better than explanations, so this is what deriving VariantEq on your enum will allow you to do.

#[macro_use]
extern crate varianteq;

#[derive(Debug, VariantEq)]
enum E {
    A(i32),
    B(i32),
    C(u32, bool),
}

fn main() {
    assert_eq!(E::A(1), E::A(1));
    assert_eq!(E::A(1), E::A(2));

    assert_ne!(E::A(1), E::B(1));
    assert_ne!(E::A(1), E::C(1, false));
}

Pretty much, it implements the PartialEq and Eq traits so that only the variant is considered and the variant’s fields are ignored. That’s why E::A(1) and E::A(2) compare equal above.
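
For comparison, here is a minimal sketch of the same behaviour written by hand for a single enum, using std::mem::discriminant from the standard library. It is only an illustration of the semantics, not what the crate generates.

```rust
use std::mem;

#[derive(Debug)]
enum E {
    A(i32),
    B(i32),
    C(u32, bool),
}

// Hand-written stand-in for #[derive(VariantEq)]: two values are equal
// when they are the same variant, whatever their fields hold.
impl PartialEq for E {
    fn eq(&self, other: &Self) -> bool {
        mem::discriminant(self) == mem::discriminant(other)
    }
}

impl Eq for E {}
```

With this in place, the four assertions from the example above hold for E without the derive.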

With all the tutorials and examples around on the web I thought I would have no trouble implementing this. But it turns out the recommended way of implementing these macros had changed recently. The docs on the web hadn’t been fully updated (or maybe I just couldn’t find up-to-date examples), so it became a bit harder than I thought.

In any case this can be yet another example of using procedural macros in Rust.

Exposing the macro

A good place to start is by exposing the macro you are creating. This will tell the compiler what to run when you use #[derive(...)].

To find a good example of how to do this with proc_macro2, I ended up looking through the diesel_derives source, which sets out the code to do this pretty nicely.

First we define the entry point varianteq_derive. This function carries the #[proc_macro_derive(VariantEq)] attribute, which marks it as the function to be called whenever we #[derive(VariantEq)].

#[proc_macro_derive(VariantEq)]
pub fn varianteq_derive(tokens: TokenStream) -> TokenStream {
    expand_derive(tokens, varianteq::derive)
}

The expand_derive function is fairly straightforward. It takes the TokenStream from proc_macro, converts it into a proc_macro2::TokenStream, parses it, and then calls our derive function, in this case varianteq::derive.

fn expand_derive(tokens: TokenStream, derive: DeriveFn) -> TokenStream {
    let item = parse2(tokens.into()).unwrap();
    match derive(item) {
        Ok(tokens) => tokens.into(),
        Err(err) => handle_derive_err(err),
    }
}

Proc Macro 2

From alexcrichton/proc-macro2:

proc_macro2 is a small shim over the proc_macro crate in the compiler intended to multiplex the current stable interface and the upcoming richer interface.

Deriving the Macro

Within src/varianteq.rs we have the derive function that was being called earlier. In the real world this could just be a function that generates an implementation of PartialEq using the mem::discriminant function. But that wouldn’t make for a very interesting procedural macro, so instead we’ll assume that mem::discriminant doesn’t exist in the stdlib. So then, the logic can be broken up into three stages.

First, we gather information from the DeriveInput, which was parsed out of the TokenStream earlier on. For VariantEq specifically we just need the enum identifier, and the variants of the enum.

Next, we construct the list of variants. This is essentially a mapping of each enum variant into our EnumVariant type which can be used to generate tokens. More on that soon.

Finally, we generate our tokens with the quote! macro. This macro takes Rust code, and parses it into the Tokens that we need to give back to the compiler. This is to avoid manually specifying each individual token of code to generate.

This last part is fairly straightforward. But it has one line which is a bit mystifying. The #(#enum_variants => true,)* is special syntax used by the quote! macro: #enum_variants interpolates a value from the surrounding scope, and the #(...)* wrapper repeats its body once for each element of that value.

For a specific explanation of what this line, and similar syntax, does: from the docs for the quote crate on interpolation:

This iterates through the elements of any variable interpolated within the repetition and inserts a copy of the repetition body for each one.
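
The quote documentation describes this repetition as similar to the one in macro_rules!, so a std-only analogy may help. The sketch below swaps quote! for a declarative macro (the names same_variant and same are made up for illustration): its $( ... )* repetition emits one match arm per pattern passed in, much like #(#enum_variants => true,)* emits one arm per element.

```rust
enum E {
    A,
    B(i32, i32),
    C { x: i32 },
}

// For each pattern passed in, the `$( ... )*` repetition inserts one copy
// of its body, here a match arm that pairs the pattern with itself.
macro_rules! same_variant {
    ($left:expr, $right:expr, $( $variant:pat ),* $(,)?) => {
        match ($left, $right) {
            $( ($variant, $variant) => true, )*
            _ => false,
        }
    };
}

// Hypothetical helper that lists every variant of E once.
fn same(a: &E, b: &E) -> bool {
    same_variant!(a, b, E::A, E::B(..), E::C { .. })
}
```

The difference is that macro_rules! needs the patterns written out at the call site, whereas the procedural macro works them out from the enum definition.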

Our Own EnumVariant

Now back to the EnumVariant type I mentioned earlier, set out in src/token.rs. This struct is just an abstraction to make it easier to use that special interpolation syntax in the quote! macro.

The important bit is that EnumVariant implements the ToTokens trait which defines how it gets generated into tokens.

Let’s say we have this enum:

enum E {
    A,            // Unit variant.
    B(i32, i32),  // Unnamed variant with 2 fields.
    C{x: i32},    // Named variant.
}

The ToTokens implementation for EnumVariant will spit out the tokens for this Rust code, generating a different line for each variant based on the variant type:

match (self, other) {
    (E::A, E::A) => true,
    (E::B(_, _), E::B(_, _)) => true,
    (E::C{..}, E::C{..}) => true,
}
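
Put together, the expanded implementation for this enum would look something like the sketch below. The catch-all arm is an assumption on my part: it isn’t one of the per-variant lines, but the match needs it to be exhaustive and to report different variants as unequal.

```rust
#[derive(Debug)]
enum E {
    A,           // Unit variant.
    B(i32, i32), // Unnamed variant with 2 fields.
    C { x: i32 }, // Named variant.
}

impl PartialEq for E {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            // One arm per variant; fields are matched but never compared.
            (E::A, E::A) => true,
            (E::B(_, _), E::B(_, _)) => true,
            (E::C { .. }, E::C { .. }) => true,
            // Assumed fallback: any pair of different variants is unequal.
            _ => false,
        }
    }
}

impl Eq for E {}
```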

Wrapping Up

Running back up the function calls, this output from EnumVariant::to_tokens is plugged back into the quote! block defined in varianteq::derive.

Now we have PartialEq and Eq implemented for the enum that derived VariantEq. So we turn that Rust code into Tokens and send it back to the compiler.