Got Parameters? Just Use Docopt

Written by J. David Smith
Published on 07 September 2017

It's one of those days where I am totally unmotivated to accomplish anything (despite the fact that I technically already have – the first draft of my qual survey is done!). So, here's a brief aside that's been in the back of my mind for a few months now.

It is extremely common for the simulations in my line of work Or our, hi fellow student! to have a large set of parameters. The way that this is handled varies from person to person, and at this point I feel as though I've seen everything; I've seen simple getopt usage, I've seen home-grown command-line parsers, I've seen compile-time #defines used to switch models! fml Fig 1: Me, reacting to #ifdef PARAM modelA #else modelB #endif Worse, proper documentation on what the parameters mean and what valid inputs are is as inconsistent as the implementations themselves. Enough. There is a better way.

Docopt is a library that is available in basically any language you care about This includes C, C++, Python, Rust, R, and even Shell! Language is not an excuse for skipping on this. that parses a documentation string for your command line interface and automatically builds a parser from it. Take, for example, this CLI that I used for a re-implementation of my work on Socialbots: See here for context on what the parameters (aside from ζ, which has never actually been used) mean.

Simulation for <conference>.

Usage:
    recon <graph> <inst> <k> (–etc | –hmnm | –zeta <zeta>) [options]
    recon (-h | –help)

Options:
    -h –help                   Show this screen.
    –etc                       Expected triadic closure acceptance.
    –etc-zeta <zeta>           Expected triadic closure acceptance with ζ.
    –zeta <zeta>               HM + ζ acceptance.
    –hmnm                      Non-Monotone HM acceptance.
    –degree-incentive          Enable degree incentive in acceptance function.
    –wi                        Use the WI delta function.
    –fof-scale <scale>         Set B_fof(u) = <scale> B_f(u). [default: 0.5]
    –log <log>                 Log to write output to.

This isn't a simple set of parameters, but it is far from the most complex I've worked with. Just in this example, we have positional arguments (<graph> <inst> <k>) followed by mutually-exclusive settings (–etc | –hmnm | ...) followed by optional parameters ([options]). Here is how you'd parse this with the Rust version of Docopt:

const USAGE: &str = ""; // the docstring above

#[derive(Serialize, Deserialize)]
struct Args {
    // parameter types, e.g.
    arg_graph: String,
    arg_k: usize,
    flag_wi: bool,
    // ...
}

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.deserialize())
                            .unwrap_or_else(|e| e.exit());
}

This brief incantation:

  1. Parses the documentation string, making sure it can be interpreted.
  2. Correctly handles using recon -h and recon –help to print the docstring.
  3. Automatically deserializes every given parameter.
  4. Exits with a descriptive (if sometimes esoteric, in this implementation) error message if a parameter is missing or of the wrong type.

The same thing, but in C++ is:

static const char USAGE[] = R""; // the docstring above

int main(int argv, char* argv[]) {
    std::map<std::string, docopt::value> args 
        = docopt::docopt(USAGE, 
                         {argv + 1, argv + argc},
                         true,
                         "Version 0.1");
}

Although in this version type validation must be done manually (e.g. if you expect a number but the user provides a string, you must check that the given type can be cast to a string), this is still dramatically simpler than any parsing code I've seen in the wild. Even better: your docstring is always up to date with the parameters that you actually take. Of course, certain amounts of bitrot are always possible. For example, you could add a parameter but never implement handling for it. However, you can't accidentally add or rename a flag and then never add it to the docstring, which is far more common in my experience. So – for your sanity and mine – please just use Docopt (or another CLI-parsing library) to read your parameters. These libraries are easy to statically link into your code (to avoid .dll/.so not found issues), and so your code remains easy to move from machine to machine in compiled form. Please. You won't regret it.