TL; DR: add this build.rs and put include!(concat!(env!("OUT_DIR"), "/protos.rs")); somewhere in lib.rs.

Preface

Recently, I’ve been playing with the protobuf/gRPC ecosystem in Rust, building a gRPC client against a message bus service at work. Through lib.rs I found the most popular stack in Rust - tonic and prost, and overall they are extremely good foundation and very easy to use, with various examples showcasing all use cases I need.

However, one major pain point is the initial code generation and importing the generated code into a Rust library. My target service’s protobuf and gRPC definitions are stored in a central repo and spread across many many files and multiple levels of directories. tonic-build - which is powered by prost-build - is currently oriented around code generation from a defined list of source .proto files, this works fine if an application does not need to interact with too many definitions in different files, but it clearly doesn’t scale (our repo already contains several hundreds of proto files).

So, without support such as the glob pattern, I spent a few hours to figure out a solution (basically build.rs) that automatically compiles all the proto definitions, and also organize them correctly and make it easy to import. There are also several caveats that I encountered so you don’t have to discover them again.

As a disclaimer, the code in this post is only tested for my case, though I believe it should only require minimal change to adapt to many other cases. I’m also not sure if there exists built-in support or other prior work, which could be better than what I have here. Let me know!

The Basics

Let’s start with a quick look at the basic API. Following various tutorial and documentations it basically works as follows:

With some .proto files:

syntax = "proto3";

package parent.example;

message Example {}

Compile them through build.rs:

tonic_build::configure()
  .build_server(false)
  .compile(&["base/parent/example.proto"], &["base"])?;

Import the generated code in lib.rs:

pub mod parent {
  tonic::include_proto!("parent");

  pub mod example {
    tonic::include_proto!("parent.example");
  }
}

And finally the proto structs can be used:

let e = project::parent::example::Example {};

It’s clear that we not only want to automate the code generation in build.rs, but also all those include_proto! calls in lib.rs. It’s worth noting that the layers of pub mod are also crucial, otherwise inter-package references won’t work, all the more reason to automate this part.

The Generation

It’s easy to expand the basic example to compile all proto files, all we need is to recursively find and collect all the .proto files, then finally feed them to the compiler. It’s possible to roll your own iterator, but I decided to just use walkdir (add it to [build-dependencies] in Cargo.toml).

There isn’t much “magic” involved so I won’t show the code here, but there’s a caveat you may encounter - all proto files need to declare the pacakge specifier as elaborated in the README of prost. This is not a requirement for example in Elixir which I’ve used extensively. Fortunately in my case, out of the hundreds of proto files, only one of them is missing package, it’s likely a neglect in the early days and we now added linter rule to avoid this in the future.

The Include

While finding all the files to compile is rather easy, building the correct module hierarchy turns out to be quite involved. I wasted some time initially thinking that it’s based on the file path, but eventually realized it’s based on the package hierarchy. This requires reading the file content 1 and some parsing, we use formatters strictly which alleviates some issue, but it’s still tricky for example I need to purge the comments.

Multiple files can combine into a single package so a HashSet is used to avoid duplication. A simple Vec is used to collect all files.

use std::{collections::HashSet, env, fmt::Write, fs, path::Path};
use walkdir::WalkDir;

type Res = Result<(), Box<dyn std::error::Error>>;

fn main() -> Res {
  let mut protos = vec![];
  let mut pkgs = HashSet::new();

  for entry in WalkDir::new("ROOT DIR")
    .into_iter()
    .map(|e| e.unwrap())
    .filter(|e| {
      e.path()
        .extension()
        .map_or(false, |ext| ext.to_str().unwrap() == "proto")
    })
  {
    let path = entry.path();
    protos.push(path.to_owned());

    let content = fs::read_to_string(&path).unwrap();
    let pkg = content
      .lines()
      .find(|line| line.starts_with("package "))
      .unwrap()
      // remove comment
      .split("//")
      .next()
      .unwrap()
      // extract package
      .trim()
      .trim_start_matches("package ")
      .trim_end_matches(";");

    pkgs.insert(pkg.to_string());
  }

  tonic_build::configure()
    .build_server(false)
    .compile(&protos, &[Path::new("INCLUDE DIR").into()])?;

  write_protos_rs(pkgs)?;

  println!("cargo:rerun-if-changed=REPO DIR");

  Ok(())
}

With all the packages specifiers collected, next is to figure out how to derive the hierarchy from it and write the pub mod accordingly. I initially thought about using a tree but eventually landed on a solution based on a conceptual “stack”, with all packages sorted, it’s the same as a depth-first tree traversal.

The stack contains “segments” of a package path, while iterating it first pops the stack to the “common ancestor” then pushs whatever segments left into the stack. With each push we open a module and with each pop we close one. This repeats for all packages and finally pop all the segments left to properly close all modules.

For example, when generating for parent.example the stack should contain ["parent", "example"], then for the next package parent.another the stack first pop to only ["parent"], closing pub mod example {, next push "another" into the stack and open the module pub mod another {, finally fill in the proper include macro.

This is the write_protos_rs function:

fn write_protos_rs(pkgs: HashSet<String>) -> Res {
  let ref mut protos_rs = String::new();

  let mut packages: Vec<String> = pkgs.into_iter().collect();
  packages.sort();

  let mut path_stack: Vec<String> = vec![];

  for pkg in packages {
    // find common ancestor
    let pop_to = pkg
      .split(".")
      .map(|seg| map_keyword(seg))
      .enumerate()
      .position(|(idx, pkg_seg)| {
        path_stack
          .get(idx)
          .map_or(true, |stack_seg| stack_seg != &pkg_seg)
      })
      .unwrap_or(0);

    // pop stack
    while path_stack.len() > pop_to {
      path_stack.pop();
      writeln!(protos_rs, "}}")?;
    }

    // now push stack
    for seg in pkg.split(".").skip(pop_to).map(|seg| map_keyword(seg)) {
      writeln!(protos_rs, "pub mod {} {{", &seg)?;
      path_stack.push(seg);
    }

    // write include_proto! inside module
    writeln!(
      protos_rs,
      "tonic::include_proto!(\"{}\");",
      path_stack.join(".")
    )?;
  }

  // pop all stack
  while path_stack.len() > 0 {
    path_stack.pop();
    writeln!(protos_rs, "}}").unwrap();
  }

  fs::write(format!("{}/protos.rs", env::var("OUT_DIR")?), protos_rs)?;

  Ok(())
}

The function map_keyword() is necessary because prost transforms file names and path components that conflict with Rust keywords such as type, this applies to both the include path as well as the module name. I simply copied over the code from prost-build.

There is another caveat at the time of writing - oneof field can conflict with embedded message names, I opened an issue in prost and I’m working on a PR. There is only one case in the repo I’m working against, and for my purpose I simply ignored (deleted) those protos, but I hope to get a working solution merged soon.

The Conclusion

The full build.rs file can be found here and with it all the generated code and modules can be imported by a single line in lib.rs:

include!(concat!(env!("OUT_DIR"), "/protos.rs"));

The generated code can also be segregated to a child module to avoid mixing with other code:

pub mod protos {
  include!(concat!(env!("OUT_DIR"), "/protos.rs"));
}

And that’s it! Now all messages and services are automatically generated, imported and ready for use. Hope this post is useful for those with similiar use case, but let me know if there are better solutions!

  1. I do think this would be a lot cleaner if integrated inside prost-build