A History of the Gladiator LeagueBot

Written by J. David Smith
Published on 25 July 2020

A bit over one week ago—July 16th—we launched the inaugural Gladiator league using my LeagueBot. Despite some initial difficulties with JMP quirks (Arena exports multiple cards with the same set code and collector number; for example, Goblin Oriflamme and Rhox Faithmender both get exported as 1 ..... (JMP) 130), the launch was fairly smooth. In this short time we've had 251 league completions (only decks with the full 5 matches played are counted here), which has been very gratifying for me. You see: while the LeagueBot may appear to have sprung into existence fully formed, this couldn't be further from the truth.

I want to take this opportunity to walk through the history of the bot. Its history is closely tied to my own involvement in singleton formats on Arena, and so at the same time I'm going to cover some Gladiator pre-history.

In the Beginning...

The first Singleton event I can find record of on MTG Arena took place from September 27th–October 1st, 2018 (cards in format: 1,006). The event offered a play queue for 60-card no-banlist singleton games. Wizards would repeat this event later that year from December 3rd–7th, and again two months later from February 14th–18th, 2019. This is where my involvement begins.

I created a Discord server for Singleton shortly after the Valentine's Day event ended and posted about it on the MagicArena subreddit. At that point in time, I stuck with the 60-card format due to the small number of cards in the format (cards in format: 1,526). Given that Petitioners and Rats were the two most popular (and most annoying) decks in each Singleton event, I might have also banned them to try to encourage people to give the format a chance—but I honestly don't remember.

This Discord server never really took off, and while I will never know for sure why that happened, I do know one reason. This is also the point at which I built the precursor to the LeagueBot we all know and love: the Sharktocrab Bot. I'd intended to build it out into a LeagueBot inspired by Penny Dreadful, but ended up being overrun with work shortly after and never quite got around to it. At that point I had initial plans for how to record matches in a safe manner, but hadn't figured out some details like deck submission (would we use the rising star AetherHub? the old guard Goldfish?). However, I also made a fatal mistake with this bot.

Those of you who've used Discord for a while have probably seen the "react to get speaking permissions" gatekeeping bots. That is one of the few features I got implemented into Sharktocrab—and it worked! Except I would occasionally get DMs about how it didn't—DMs that I could never reproduce. Later, I realized the likely culprit was my use of Heroku's free hosting, which, though free, comes with the caveat that your app may be put to sleep if there is no activity. While I can't know for sure, I believe this is the cause of those unreproducible bug reports.

Enter: Arena Highlander

Between my own lack of time and the obvious failure to start, the server rotted away for the better part of the year. Then, on September 28th, 2019, corporat created the Historic Canadian Highlander discord and advertised it on reddit a short time later (cards in format: 2,038). Petitioners and Rats were both on the initial points list, effectively banning those decks for much the same reason that I had previously.
The first tournament ran that weekend, a round-robin with 6 participants that was won by Oogablast with a 4-1. Tournament Results:
4-1 Oogablast
3-2 Roseface
3-2 Jesse
2-3 Corporat
2-3 Bertxav
1-4 Yuin

Within two weeks, a pool of regulars was already forming. You'll likely recognize some of them, like Roseface and GoblinMatron, from the Gladiator discord. Wheeler joined fairly early on, but wasn't an active participant in the format until much later. I joined about 3 weeks into the format, and won a tournament for the first (and only) time playing Abzan Field of the Mythics. The deck wasn't actually very good, but it and other tournament results around this time highlight the problem with 100-card singleton in a small card pool: at some point you just need good stuff to put in your deck. Midrange was clearly very good, with a couple of aggro decks with good top-ends (Gruul, Boros, Mono-Black) also putting up results.

Anyway, this core of players formed the bulk of tournament participation over the next few months. Participation ebbed and flowed, but never really took off. To corporat's credit, he was actively trying to generate and maintain interest in the format with both weekly tournaments and periodic posts about said tournaments on reddit. While the weekly tournaments certainly helped maintain interest, there was a chronic lack of pick-up games during the week that, in my opinion, was both a cause of and caused by the limited number of players.

The Wheeler Effect and the Birth of the LeagueBot

All that changed on April 19th, 2020 (cards in format: 2,622), when Wheeler streamed Arena Highlander (then called Historic Highlander) for the first time. The format exploded. The discord gained hundreds of users basically overnight. At the same time, I had just graduated, COVID-19 had frustrated my job search, and I suddenly found myself with copious free time. With the sudden influx of players, I decided to finally finish what I'd started more than a year prior and build the LeagueBot.

During that year, designs for the bot had been in the back of my mind with some regularity. I had mapped out the flows for match reporting, and had ideas for the registration process. Putting those designs into practice was mostly straightforward, with Discord itself giving me the ultimate assist by making uploads so seamless that I could use them for deck registration. One week later, the week-long April 2020 Test League went live—but not before the format split.

I was not involved in many of the conversations preceding the split between Arena Highlander and Gladiator, so I'll refrain from commenting on it. I will say, though, that the timing was incredibly frustrating. I began work on the bot on April 20th, and had made substantial progress on it by the time that the split happened on the 22nd. While I'd hoped that the launch of the League on the 26th would help drive continued interest in Arena Highlander, it became apparent within about a month that the bulk of the new blood had followed Wheeler to Gladiator. To his credit: there was a clear effort to accommodate the existing Arena Highlander format. For example, Gladiator tournaments were deliberately not run at the same time as the existing weekly Arena Highlander tournament, despite Saturday afternoon being an excellent time for tournaments across time zones.

From the League FAQ (an attempt to encourage Gladiator players to participate in the Arena Highlander League): "Can you use your Gladiator Deck in the Arena Highlander League? Yes!"

Transitioning to Gladiator

Around this time, I took a break. I found work, played other games, and let my frustration with the failure of the league and with Magic in general (WAR–IKO was an awful time to be playing Magic) subside. I popped into the Gladiator discord in early June when there was discussion of starting a league, but aside from that was largely uninvolved.

Then, on July 7th I (finally) reached out to Roseface about moving the LeagueBot over to Gladiator. The only things that really needed changing were a few names that read "Arena Highlander" and the domain name it used. Two days later, I heard back: 👍. The plan was to launch on July 16th to coincide with the release of Jumpstart (cards in format: 3,109), giving just over a week to switch things over.

This was plenty of time, and I had the opportunity to add in a couple of new features like the API. Aside from the previously mentioned issues with Jumpstart card exports, launch went smoothly. Seeing the Gladiator community jump into the league and play more than I'd ever expected was…nice. I'll take it.


You might have noticed that throughout this I've tracked the number of cards in the format. In just over a year, the number of cards on Arena has doubled. With remastered Pioneer sets, it may well double again by next year.

I think this is relevant to understanding the evolution of the greater Arena-singleton format. For instance: prior to ELD, a 100-card singleton deck would include anywhere from 3–5% of all cards on Arena (60/2,038 is about 3%, 100/2,038 is about 5%; a typical Gladiator deck has around 60 non-lands, and 3c decks may legitimately play 90+ nonbasic cards), which means you would often dip into borderline (or even outright) draft chaff for playables, especially in mono-colored decks. This meant that your card quality was relatively low compared to now (when one plays 2–3% of all cards), and led to an abundance of 2-3c Goodstuff decks along with legitimate concerns that the consistency of Petitioners/Rats decks could be problematic despite the obviously low card quality. I'm glad those days are behind us, and am looking forward to seeing how the increasing card pool lets the format further diversify.

Missing the ASONAM'19 Deadline

Written by J. David Smith
Published on 07 May 2019

In late January, I started working on a new research project. Without getting into too much detail, it involves network analysis on blockchain-based systems (my opinion on blockchain tech has not noticeably improved). After some initial results, my advisor and I decided to target ASONAM 2019 for publication, which has its submission deadline in mid-to-late April. The exact, final deadline fell on the 28th this year.

I didn't make it.

I would like to briefly explore the reasons that I missed the deadline. As with prior deadlines I've missed, there is some element of research simply being unpredictable. However, there are larger errors that I made that would be valuable not to repeat.

I Was Indecisive

The first, and in my opinion largest, mistake boils down to indecision. While the project started in late January, I never committed to a single research question. Rather, I continued with various bits of exploratory analysis up until very shortly before the deadline.

While much of this exploratory analysis holds small bits of value, it doesn't come together as a cohesive whole. It isn't a paper. I knew this, experientially, from the project I spent most of last year on. In that project, I did a lot of initial analysis that never ended up in the paper despite informing my understanding of the problem space. Some of that will be going back in as I finish dealing with revisions for journal publication in the next couple of weeks. Despite knowing that much of my exploratory work would find its final resting place on the cutting room floor, I persisted with it up until about 12 days before the submission deadline.

Some of these results were quite strong (one of our key results is that prior network analysis had missed a key behavior native to Bitcoin, one-time-use change addresses, which skew the "user" network substantially unless corrected for), which gave me hope that I'd still be able to put together something compelling. Had the small extension I spent several of those 12 days on panned out, I might be writing a different post. However, it didn't. I'm writing this post because those results were not enough, and I didn't commit to this idea until it was too late to fully realize it.

I Was Passive

As I worked through my analysis and began forming the project, it became clear (to me) that the initial direction that I'd worked out with Dr. Thai wasn't going to be viable. There were several alternatives, and earlier in the semester I suggested each of them in turn to Dr. Thai in our meetings.

However, I didn't do so forcefully. I'm saying now that the direction was nonviable, but in early March I was not nearly so clear. I danced around the point, and when she pushed back against it I wilted. Had I been clear, I could've spent six to eight weeks fleshing out one of these more promising alternatives. It may still not have panned out, but it at least wouldn't have felt like as much of a waste.

I Did Not Understand the Limitations of My Tools

Over the past year, I have spent a lot of time working within the pandas ecosystem. It's been great! Pandas is a great library for many tasks, with a flexible API that has allowed me to do a lot of analysis much more quickly than I would've been able to otherwise—especially when paired with plotnine to quickly generate complex visualizations. However, pandas has a bit of a problem with large data.

Specifically: pandas' memory usage is highly variable, difficult to predict, and impossible to control. An operation may have minimal memory overhead and take less than a second to compute—but a small modification to it may instead take hours to run and result in a deep copy of some or all of the data. When the first copy of the data clocks in at 120GB, an automatic deep copy slows things to a crawl, and very quickly led to OOMing the server. The most common culprit was the .groupby(...) method, though I had issues with some chained aggregations via .apply(...) as well. Unfortunately, groupby(...) is a fundamental operation necessary for my work, so many of my attempts to finalize results in the final days before the deadline simply fell apart.

Long nights and wasted hours could've been avoided if I'd realized the cause of these memory issues sooner. While I'd encountered performance issues with pandas before, I had largely attributed them to hitting slow paths in what is ultimately a Python library. During this project, I stumbled upon this post by pandas creator Wes McKinney that hits on the reasons for many of the issues I've faced. As useful as pandas is, it became clear to me that it currently isn't a viable option for analysis of this particular dataset. Not that I'm giving up on pandas; in fact, I still use it heavily. Rather, I am now better equipped to identify which problems it is ill-suited for. This one in particular has (a) a large memory footprint, and (b) a heavy reliance on groupby(...) operations for reasons intrinsic to the data. The combination of these two means that pandas is simply not the right tool.

Ultimately, I ended up rewriting many of these bits of analysis in Rust, doing the aggregations manually, then loading and plotting the results. These simple Rust programs were not too difficult to write thanks to the hdf5 and csv libraries, and even with repeated data loading they are substantially faster than my Python/pandas code. This let me complete part of my analysis, but ultimately I lost too much time struggling with MemoryErrors to be able to complete all of it.
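The manual-aggregation approach is easy to sketch. Here's a minimal Python version of the streaming group-by pattern (the real rewrite was in Rust using the csv and hdf5 libraries; the column names here are made up for illustration):

```python
import csv
import io
from collections import defaultdict

def sum_by_key(rows, key, value):
    """Aggregate row-by-row into a dict, so memory scales with the
    number of groups rather than with the size of the dataset."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)

# Toy data standing in for the (much larger) real files.
data = io.StringIO("addr,value\na,1\nb,2\na,3\n")
print(sum_by_key(csv.DictReader(data), "addr", "value"))
# → {'a': 4.0, 'b': 2.0}
```

The trade-off is that you give up pandas' expressiveness, but the memory footprint becomes predictable: one accumulator per group, nothing more.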

Wrapping Up

Despite this failure, I'm not particularly upset. I am frustrated, but am trying to channel that frustration into productively dealing with the problems I faced. I am particularly glad that I have had an excuse to basically ignore research work for the past week, between grading exams and preparing to teach a summer class. It has given me time to reflect on the factors that led to this failure and realize that—even though it is my fault—it is something that I can learn from and improve on in subsequent papers.

Brewmaster: Legion in Review & BfA Wishlist

Written by J. David Smith
Published on 12 February 2018

I rerolled from my Hunter (MM lyfe) to a Brewmaster shortly after the Nighthold released, and holy hell, I didn't expect this ride. I was going to just play my already-110 BDK, but our other tank was a BDK, and after the incident with Cenarius (a bug/weird interaction removed Marrowrend stacks much faster than they could be applied, which made tanking Cenarius pre-fix on a BDK really awkward and more difficult than necessary) the raid team was kind of sketchy on having two of the same tank. So, I powerleveled my formerly-Windwalker Monk from 88 to 110 over a weekend and started tanking.

Since then, I've become active in the #brewlounge and #brewcraft channels in the Peak of Serenity Discord, tanked a bit of Mythic (5/9M T20; less than I'd have liked), and picked up maintenance of the WoWAnalyzer modules for Brewmaster. All in the space of less than a year. Whew.

Now BfA has been announced and changes are rolling out on the alpha. Since Brewmasters have received no super noteworthy changes yet, it's the perfect time for a wishlist! Before we get to that, however, I'd like to break down what the Brewmaster kit looks like right now so we have some context for my BfA wishlist.

What We Have

Stagger & Mastery

While meatballs (Guardian Druids) have held the title of "Dodge Tank" in the past, it now firmly belongs to Brewmasters. It is not uncommon to see people reach 70% or higher dodge rates in practice thanks to our Mastery: Elusive Brawler. EB provides a stacking bonus to our dodge chance each time we either fail to dodge a dodgeable ability or cast certain spells. The current list is Blackout Strike, Breath of Fire, and (with Blackout Combo, and rarely used) Blackout Strike + Purifying Brew. This mastery performs three functions: first, it boosts our overall dodge percentage; second, it allows us to consistently deal with dodgeable mechanics through proactive play (banking EB-generating abilities); third, it mitigates the downsides of randomness-driven mitigation.

That last point is fundamental. Historically, dodge-based mitigation has been much inferior to reduction-based mitigation due to sheer randomness: if you fail to dodge several hits in a row, you can simply die. (Those more familiar with pre-MoP tanking may be confused by this: dodge was a good stat! It is more useful to compare Dodge for Brewmasters with Defense prior to its removal: defense capping prevented crushing blows from randomly spiking you, and reaching the defense cap was considered mandatory.) Brewmasters (along with VDH) have by far the least armor of any tank (I have 2,940 armor without trinkets, while a similarly geared DK has 5k+ and a Paladin has 7k+), meaning that pre-Stagger melee hits are much larger. Mastery prevents this from happening altogether. At current gear levels, many BrMs are guaranteed to dodge at least every 3rd hit without taking any actions. With the low cooldown of Blackout Strike, this gets close to a guaranteed every-other-hit dodge rate.
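To see why the mastery caps worst-case streaks, consider a toy model of Elusive Brawler stacking (the numbers below are illustrative, not actual in-game values): each hit that lands grants a stack, each stack adds a flat chunk of dodge chance, and a dodge consumes all stacks.

```python
def worst_case_hits_between_dodges(base_dodge, dodge_per_stack):
    """Worst case: assume every non-guaranteed dodge fails.
    Each landed hit grants an Elusive Brawler stack; once total
    dodge chance reaches 100%, the next hit must be dodged."""
    stacks, hits = 0, 0
    while base_dodge + stacks * dodge_per_stack < 1.0:
        hits += 1    # this hit lands...
        stacks += 1  # ...and grants a stack
    return hits

# With (illustrative) 10% base dodge and 45% dodge per stack, at most
# two hits land before a guaranteed dodge — i.e. you are guaranteed to
# dodge at least every 3rd hit, no matter how unlucky you get.
print(worst_case_hits_between_dodges(0.10, 0.45))  # → 2
```

This is the sense in which the mastery "mitigates the downsides of randomness": the random variable is bounded, so streaks can't kill you on their own.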

While Elusive Brawler significantly smooths our damage intake from melees, without further mitigation we would still be extremely spikey. Smooth damage intake can be healed much more efficiently, lowering the strain on your healer's mana and reaction times. Stagger, our second "mitigation" method, not only addresses this but has in fact led us to be the smoothest tank in terms of damage intake, despite our reliance on the inherently random nature of dodge. Stagger causes every hit to be partially absorbed (currently 75% with Ironskin Brew up and no specific talents or gear), with the absorbed damage spread over the next ten seconds. The exact mechanics of it aren't quite straightforward; emptyrivers has an excellent overview of how it works on the Peak of Serenity. Functionally, this means that every would-be spike of damage is instead translated into a much lower damage-taken-per-second effect.
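The arithmetic of that smoothing is simple. Here is a sketch using the numbers from above (75% absorbed with Ironskin Brew up, spread over ten seconds; the real mechanics have more wrinkles, as noted):

```python
def stagger_split(hit, absorbed=0.75, window=10.0):
    """Split a single hit into its immediate portion and the
    damage-per-second of the staggered remainder."""
    taken_now = hit * (1 - absorbed)
    staggered_dtps = hit * absorbed / window
    return taken_now, staggered_dtps

# A 1,000,000 spike with Ironskin Brew up:
now, dtps = stagger_split(1_000_000)
print(now, dtps)  # → 250000.0 75000.0
```

A million-damage spike becomes a 250k hit plus a 75k/s drip: exactly the spike-to-DTPS translation described above.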

Despite not reducing total damage taken, Stagger enabled ludicrous cheesing (creative strategies that make otherwise-difficult mechanics nearly non-existent) early in the expansion. A cap on the total staggered damage (at most 10x player HP) was patched in after Tier 19 ended. Though it doesn't affect normal gameplay, it does mean that previous cases of Stagger cheesing would now be much riskier, if not impractical.

On the whole, Elusive Brawler and Stagger together give Brewmasters a unique way of smoothing damage taken – turning what would otherwise be the spikiest tank into the smoothest. In particular, Elusive Brawler gives rise to gameplay that no other tank presently has. If you haven't known the joy of tanking Tyrannical Black Rook Hold as a BrM: Smashspite's charge can be dodged. It still applies stacks, and will rapidly reach the point of one-shotting you through any and all cooldowns. This leads to an intense do-or-die minigame.

Brews & CDR

The second, active half of Brewmasters' mitigation toolkit comes in the form of Ironskin Brew & Purifying Brew. Ironskin Brew increases the amount of damage absorbed by Stagger from 40% to 75%. Note again that this does not mitigate anything. Our active mitigation is instead packed into Purifying Brew, which removes 40% (without traits or talents) of the damage that has been absorbed by Stagger but not yet dealt over time. These two abilities share 3 charges with a cooldown of 21 seconds each (reduced by haste). However, Ironskin Brew only lasts for 8 seconds, meaning that on these charges alone we are not capable of maintaining the buff – let alone Purifying.

However, these cooldowns are reduced significantly by correctly performing the conventional rotation, since key rotational abilities (Tiger Palm and Keg Smash) each shave time off the brew cooldown. This situation is further helped by the presence of the (mandatory) Black Ox Brew talent, which restores all 3 charges on use, on an effective 40-45 second cooldown (BoB benefits from the cooldown reduction of TP and KS). Combined, this gives each Brew charge an effective cooldown of around 6-7 seconds. Or, put another way: exactly enough to keep Ironskin Brew up 100% of the time and have occasional charges to spend on Purification.
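Back-of-the-envelope, the 6-7 second figure checks out. The CDR values below are rough assumptions for illustration (an ~8-second Keg Smash granting ~4s of brew CDR, ~20 Tiger Palms a minute at ~1s each, Black Ox Brew roughly every 45 seconds), not authoritative numbers:

```python
recharge = 21.0                      # base seconds per brew charge
passive = 60 / recharge              # charges per minute from recharge alone
keg_smash = (60 / 8) * 4 / recharge  # ~4s of brew CDR per ~8s Keg Smash
tiger_palm = 20 * 1 / recharge       # ~20 Tiger Palms/min, ~1s CDR each
black_ox = 3 * (60 / 45)             # 3 charges restored every ~45 seconds

charges_per_minute = passive + keg_smash + tiger_palm + black_ox
effective_cooldown = 60 / charges_per_minute
print(round(effective_cooldown, 1))  # → 6.5
```

Roughly nine charges a minute: seven and a half to keep an 8-second Ironskin Brew rolling, with the remainder left over for the occasional Purify.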

This system sounds great. It leaves us with manageable DTPS over the course of a fight, while allowing us to mitigate large spikes such as Fel Claws with Purifying Brew. Unfortunately, it suffers from a fundamental flaw: prior to Tier 20, there was no cap on the duration of the Ironskin Brew buff. This led to effective BrMs ending fights with several minutes of time remaining on the buff, while still having plentiful brews to spend on-demand for Purification. In response to this, Blizzard implemented a cap of three times the buff duration (24 seconds with traits). This turned the previously-comfortable Brew system into one with an annoying buff to maintain and another button you rarely use. And when I say rarely, I mean rarely: pre-Mythic it is possible to go entire fights without needing to Purify due to low damage intake, and on Mythic bosses Purification may only occur a couple of times a minute. This leads to an unsatisfying situation where we have to jump through a bunch of hoops to hit 100% ISB uptime – which is nearly mandatory to avoid being DK- or DH-levels of spikey without the self-sustain, cooldowns, or utility to compensate – but then have excess charges that we worked to generate but can't use effectively.

On the whole, I like the Brew/CDR system. However, its flaws are noticeable and have no easy solutions. In conjunction with this system, Stagger has been on the razor's edge between hopelessly busted and borderline useless since its value fundamentally scales with the amount of incoming damage. On high damage fights, Purifying Brew provides excellent mitigation while Ironskin Brew keeps our damage intake smooth and easily healable. Below that point, Purifying Brew becomes worthless because we cannot reach a high enough level of staggered damage for purifying to be worth more than pressing Ironskin Brew again.

The UI

I won't spend long on this, but the current UI situation for Brewmasters is preeeeettty bad. The maintenance buff of Ironskin Brew is difficult to track on the default UI – a situation made worse by the fact that dropping it can lead to a quick, untimely demise. Worse: the Stagger bar caps out at 100% of your HP, which is quite frankly nothing. It is common for experienced brewmasters on #brew_questions to recommend not purifying at less than 100% of your HP staggered.

In Summary...

...Brewmasters are really good. They're a strong tank class for challenging content, fun to play, and have a distinct playstyle not shared with other classes. While flaws exist, the spec works despite them. There is still room for improvement, however, so now we get on to the part you're really reading this for: what I want in Battle for Azeroth.

What I Want

1. Death to Ironskin

The biggest complaint I have with the playstyle of BrM is the annoyance that is ISB. It is only interactive insofar as it requires not pressing Purify too much and executing your rotation well. Pressing the ISB button is merely an annoyance that is necessary to confirm you've done the other two things.

As we saw in Nighthold, we clearly can't leave it uncapped either. So, what do we do? An oft-suggested option is to make ISB baseline and reduce the CDR available for Purifying Brew. However, this seems unlikely to be effective either given how infrequently we have to Purify. It is clear from the Legion Beta posts that they intended for Ironskin Brew to be used as an active mitigation ability like Shield Block, but doing so left Brewmasters uncomfortably spikey in the interim.

My main issue with ISB as it stands is the needlessness of pressing that extra button during an already high-APM rotation. I have logs with negative overall downtime since brews are off the GCD. What if Ironskin Brew were just removed, an equivalent and appropriately-themed buff given to Tiger Palm / Keg Smash, and Purifying Brew's cooldown reduced to compensate? While clearly not a perfect solution (it doesn't solve the fundamental balance issues with Stagger, for example), it does eliminate my major annoyance with current Brewmastery.

2. Better Utility

You might notice that in the What We Have segment I never once mentioned utility. While we have some unique utility currently, it mostly boils down to (a) cheesing fights with Dodge & Stagger, and (b) Ring of Peace – both of which are highly situational. This is in stark contrast to Meatball's Savage Roar, Protadin's Healing & Bubbles, and Blood DKs. My wish for BfA here is just to have a bit of utility that sets us apart from other tanks.

On the BfA Alpha, we're getting a bit of "extra" utility in the form of Leg Sweep being made baseline. This is supported by AoE stuns being stripped from other classes, which means that the presence of Leg Sweep in our kit becomes more noticeable – especially if we are allowed to go into BfA with both Ring of Peace and Leg Sweep available together.

3. Revisiting Dead Talents

On the whole, Brewmasters don't have many talents that are just totally completely unusable garbage. Even Chi Wave, which is used in neither dungeons nor raid content, can be a decent talent out in the world. I've used every talent we have at some point for a reason beyond just trying it out – every talent, that is, except for the Level 45 Row.

The Level 45 row has exactly 1 talent in it: Black Ox Brew. Don't @ me. I will fight you. BoB is better, both empirically and in feelycraft, than either of the alternatives in every feasible situation. Going into BfA, I'd like to see that change. That may mean buffing the alternatives (I'd love to have access to proper self-sustain through a buffed GotM), but more realistically it means that BoB just needs to get hit. The cooldown reduction it provides is simply too high – it is both immensely powerful at smoothing an entire fight due to its low cooldown, and excellent for dealing with specific mechanics like KJ's Fel Claws.

4. (Just for Me) Better Transparency

As I mentioned in the intro, I maintain the WoWAnalyzer modules for Brewmaster. This has been a lot of fun: the dev team is fantastic, #brewcraft has been super helpful in filling in the gaps in my knowledge, and the codebase is generally nice to work with. That said, it has not been without frustrations. Stagger in particular has been an immense nuisance. I rewrote what the previous maintainer (WOPR) had done so that I could add some extra features that were infeasible before, but that implementation sometimes breaks down for no apparent reason.

In BfA, I'd like to see the amount of staggered damage reported in the logs. There is already a classResources field that records things like energy, runes, mana, etc. While there may be technical limitations to adding Stagger to that list, I'd very much like to see it happen. It'd make my job significantly easier.

Wrapping It Up

Playing Brewmaster during Legion has been one of the highlights of my WoW career, and I'm excited to see where we end up in BfA. Although I'm obviously hoping that not much changes since BrM is currently a lot of fun, I am hopeful that we can have some wrinkles ironed out in the transition to the next expansion.

Got Parameters? Just Use Docopt

Written by J. David Smith
Published on 07 September 2017

It's one of those days where I am totally unmotivated to accomplish anything (despite the fact that I technically already have – the first draft of my qual survey is done!). So, here's a brief aside that's been in the back of my mind for a few months now.

It is extremely common for the simulations in my line of work (or ours; hi, fellow student!) to have a large set of parameters. The way that this is handled varies from person to person, and at this point I feel as though I've seen everything: I've seen simple getopt usage, I've seen home-grown command-line parsers, I've seen compile-time #defines used to switch models! fml.

Fig 1: Me, reacting to #ifdef PARAM modelA #else modelB #endif

Worse, proper documentation on what the parameters mean and what valid inputs are is as inconsistent as the implementations themselves. Enough. There is a better way.

Docopt is a library, available in basically any language you care about (this includes C, C++, Python, Rust, R, and even Shell: language is not an excuse for skipping out on this), that parses a documentation string for your command line interface and automatically builds a parser from it. Take, for example, this CLI that I used for a re-implementation of my work on Socialbots (see here for context on what the parameters mean, aside from ζ, which has never actually been used):

Simulation for <conference>.

Usage:
    recon <graph> <inst> <k> (--etc | --hmnm | --zeta <zeta>) [options]
    recon (-h | --help)

Options:
    -h --help                  Show this screen.
    --etc                      Expected triadic closure acceptance.
    --etc-zeta <zeta>          Expected triadic closure acceptance with ζ.
    --zeta <zeta>              HM + ζ acceptance.
    --hmnm                     Non-Monotone HM acceptance.
    --degree-incentive         Enable degree incentive in acceptance function.
    --wi                       Use the WI delta function.
    --fof-scale <scale>        Set B_fof(u) = <scale> B_f(u). [default: 0.5]
    --log <log>                Log to write output to.

This isn't a simple set of parameters, but it is far from the most complex I've worked with. Just in this example, we have positional arguments (<graph> <inst> <k>) followed by mutually-exclusive settings (--etc | --hmnm | ...) followed by optional parameters ([options]). Here is how you'd parse this with the Rust version of Docopt:

use docopt::Docopt;
use serde::Deserialize;

const USAGE: &str = ""; // the docstring above

#[derive(Deserialize)]
struct Args {
    // parameter types, e.g.
    arg_graph: String,
    arg_k: usize,
    flag_wi: bool,
    // ...
}

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.deserialize())
                            .unwrap_or_else(|e| e.exit());
    // ...
}

This brief incantation:

  1. Parses the documentation string, making sure it can be interpreted.
  2. Correctly handles using recon -h and recon --help to print the docstring.
  3. Automatically deserializes every given parameter.
  4. Exits with a descriptive (if sometimes esoteric, in this implementation) error message if a parameter is missing or of the wrong type.

The same thing in C++:

#include <map>
#include <string>
#include "docopt.h"

static const char USAGE[] = R"()"; // the docstring above

int main(int argc, char* argv[]) {
    std::map<std::string, docopt::value> args
        = docopt::docopt(USAGE,
                         {argv + 1, argv + argc},
                         true,           // show help on -h/--help
                         "Version 0.1");
}

Although in this version type validation must be done manually (e.g. if you expect a number but the user provides a string, you must check that the given value can actually be interpreted as a number), this is still dramatically simpler than any parsing code I've seen in the wild. Even better: your docstring is always up to date with the parameters you actually take. Of course, a certain amount of bitrot is always possible. For example, you could add a parameter but never implement handling for it. However, you can't add or rename a flag and then forget to document it, which is far more common in my experience. So – for your sanity and mine – please just use Docopt (or another CLI-parsing library) to read your parameters. These libraries are easy to statically link into your code (avoiding .dll/.so-not-found issues), so your code remains easy to move from machine to machine in compiled form. Please. You won't regret it.
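To make "manual validation" concrete, here's the sort of check the C++ side requires, sketched with a plain std::map standing in for docopt's result map (an assumption for illustration; docopt::value has its own accessors, but the failure modes are the same):

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Validate that an argument that should hold a number actually does,
// reporting failure via optional instead of crashing later.
std::optional<long> as_long(const std::map<std::string, std::string>& args,
                            const std::string& key) {
    auto it = args.find(key);
    if (it == args.end()) return std::nullopt; // argument not given
    try {
        size_t consumed = 0;
        long value = std::stol(it->second, &consumed);
        if (consumed != it->second.size()) return std::nullopt; // trailing junk
        return value;
    } catch (const std::exception&) {
        return std::nullopt; // not a number at all
    }
}
```

The point is less the helper itself than that this is the entirety of the boilerplate: the parsing, flag matching, and help text all still come from the docstring.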

Rusting My (Academic) Code

Written by J. David Smith
Published on 23 February 2017

A couple of weeks ago I mentioned several methods that I'd been (trying) to use to help improve the verifiability of my research. This is the first in an (eventual) series on the subject. Today's topic? Rust.

One of the reasons it took me so long to write this post is the difficulty inherent in describing why I chose Rust over another, more traditional language like Java or C/C++. Each previous time, I motivated the selection by first covering the problems of those languages. I have since come to realize that that is not a productive approach – each iteration invariably devolved into opining about the pros and cons of various trade-offs made in language design. In this post, I am instead going to describe the factors that led me to choose Rust, and the practical gains it has given me over the past two projects I've worked on.

In the Beginning...

First, some background on who I am, what I do, and what constraints I have on my work. I am a graduate student at the University of Florida studying Optimization/Security on Online Social Networks under Dr. My T. Thai. Most problems I work on are framed as (stochastic) graph optimization problems, which are almost universally NP-hard. We typically employ approximation algorithms to address this, but even then the resulting algorithm can still be rather slow to experiment with due to a combination of difficult problems, large datasets and the number of repetitions needed to establish actual performance.

This leads to two often-conflicting constraints: the implementations must be performant, to allow us to both meet publication deadlines and compete with previous implementations, and the implementations must also be something we can validate as correct. (Well, ideally. Often, validation is only done by the authors, and code is not released.) Most code in my field is in either C or C++, with the occasional outlier in Java when performance is less of a concern, in order to satisfy that first constraint. However, after repeatedly having to work in others' rushed C/C++ codebases, I got fed up. Enough of this! I thought, There must be something better!

I set about re-implementing the state-of-the-art approximation algorithm (SSA) for the Influence Maximization problem. (Hung Nguyen, Thang Dinh, My Thai. “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-Scale Networks.” In the Proceedings of SIGMOD 2016.) This method has an absurdly-optimized C++ implementation, using all sorts of tricks to eke out every possible bit of performance. Solving this problem – even approximately – on a network with 40+ million nodes is no small feat, and the implementation shows the effort they put into getting there. This implementation formed my baseline both for performance and correctness: every language I rewrote it in should produce (roughly) identical solutions (the algorithm uses non-deterministic sampling, so there is room for some error after all the big contributors have been found), and every feasible language would have to get in the same ballpark in terms of memory usage and performance.

In my spare time over the space of a couple of months, I implemented SSA in Chez Scheme, Clojure, Haskell, OCaml and, eventually, Rust. My bent towards functional programming shows clearly in this choice of languages. I'm a lisp weenie at heart, but unfortunately no Scheme implementation I tried nor Clojure could remotely compete with the C++ implementation. Haskell and OCaml shared the problem that graph processing in purely-functional languages is just an enormous pain in the ass, in addition to not hitting my performance goals – though they got much closer. (Some of this is due to my unfamiliarity with optimizing in these languages. However, I didn't want to get into that particular black magic just to use a language that was already painful for use with graphs.) Then I tried Rust. Holy. Shit.

The difference was immediately apparent. Right off the bat, I got performance in the ballpark of the baseline (once I remembered to compile in release mode): perhaps 50% slower than the C++ implementation. After a few rounds of optimization, I got that down to about 20–25%. (Unfortunately, I no longer have the timing info. As that isn't the focus of this post, I haven't re-run these experiments.) Had I been willing to break more safety guarantees, I could have applied several of the optimizations from the baseline to make further improvement. Some of this performance is due to the ability to selectively use mutability to speed up critical loops, and some is due to the fact that certain graph operations are simply easier to express in an imperative fashion, leading me to write less wasteful code. What's more: it has a type system – with strong guarantees – similar to that of the ML-family languages I adore, combined with a modern build/packaging tool and a thriving ecosystem. It seemed that I'd found a winner. Now, for the real test.

The First Paper: Batched Stochastic Optimization

I was responsible for writing the implementation for the next paper I worked on. (The code – and paper – aren't out yet. I'll hear back in early March as to whether it was accepted.) We were extending a previous work (Xiang Li, J David Smith, Thang Dinh, My T. Thai. “Privacy Issues in Light of Reconnaissance Attacks with Incomplete Information.” In the Proceedings of WI 2016.) to support batched updates. As the problem was again NP-hard with dependencies on the ordering of choices made, this was no small task. Our batching method ended up being exponential in complexity, but with a strict upper bound on the size of the search space so that it remained feasible. This was my first testbed for Rust as a language for my work.

Immediately, it paid dividends. I was able to take advantage of a number of wonderful libraries that allowed me to jump straight from starting work to implementing our method. Performance was excellent, parallelism was easy, and I was able to easily log in a goddamn parseable format. It was wonderful. Even the time I spent fighting the borrow checker (a phase which, I will note, was very temporary; I rarely run into it anymore) failed to outweigh the benefits gained by being able to build on others' work. I had a real testing framework, which caught numerous bugs in my batching code. Even parallelism, which generally isn't much of a problem for such small codebases, was no harder than applying OpenMP. Indeed, since I needed read-write locks it was in fact easier, as OpenMP lacks that feature (you have to use pthreads instead). It was glorious, and I loved it.

Still do, to be honest.

The Second Work: Generic Analysis

The next paper I worked on was theory-driven. I showed a new method to estimate a lower bound on approximation quality for a previously unstudied class of problems. (Not exactly new, but general bounds for this class of problem had not really been considered. As a result, the greedy approximation algorithm hadn't seen much use on it.) The general idea is that we're maximizing an objective function f, subject to some constraints, where we know that all maximal solutions have the same size. I needed to be able to transparently plug in different values of f, operating on different kinds of data, with different constraints. Further, for efficiency's sake I also needed a way to represent the set of elements that would need updating after each step.

Rust gave me the tools to do this. I defined a trait Objective to abstract over the different possible values of f. Each implementor could handle building their own internal representation, and with associated types I could easily allow each kind to operate on their own kind of data.

It worked. Really well, actually.

I wrote a completely generic greedy algorithm in terms of this trait, along with some pretty fancy analysis. Everything just...worked, and with static dispatch I paid very little at runtime for this level of indirection. At least, as soon as it compiled.
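The paper's code isn't public, so the following is only a sketch of the shape of the idea – every name beyond Objective itself is illustrative, and the trait is boiled down to a marginal-gain interface:

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Illustrative sketch of an Objective trait with an associated Element
// type; the real trait (and its methods) may differ.
trait Objective {
    /// The kind of element being selected (e.g. a node index).
    type Element: Copy + Eq + Hash;

    /// Marginal gain of adding `el` to the current solution.
    fn delta(&self, el: Self::Element) -> f64;

    /// Commit `el`, returning elements whose cached gains need updating.
    fn insert(&mut self, el: Self::Element) -> HashSet<Self::Element>;

    /// Candidate elements still eligible for selection.
    fn candidates(&self) -> Vec<Self::Element>;

    /// All maximal solutions have the same size; the greedy loop stops here.
    fn max_size(&self) -> usize;
}

// A generic greedy loop written purely in terms of the trait. With static
// dispatch, each concrete Objective gets its own monomorphized copy.
fn greedy<O: Objective>(obj: &mut O) -> Vec<O::Element> {
    let mut solution = Vec::new();
    while solution.len() < obj.max_size() {
        let best = obj.candidates().into_iter().max_by(|a, b| {
            obj.delta(*a)
                .partial_cmp(&obj.delta(*b))
                .expect("marginal gains should never be NaN")
        });
        match best {
            Some(el) => {
                obj.insert(el);
                solution.push(el);
            }
            None => break,
        }
    }
    solution
}
```

Each implementor builds its own internal representation, and the associated Element type lets each operate on its own kind of data without any dynamic dispatch.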

Not All Fun & Games

As great as my time with Rust has been, there are still a few flaws I feel compelled to point out.

Borrow Checking is Sometimes Extremely Painful

While in most cases the borrow checker became something I dealt with subconsciously, in a few it was still an extraordinary pain in the ass. The first, and what seems to be the most common, is the case of having callbacks on structs. Some of this comes down to confusion over the correct syntax (e.g. Fn(...) -> X or fn(...) -> X), but I mentally recoil at the thought of trying to make a struct with callbacks on it. This was not a pleasant experience.

The second arose in writing the constructor for an InfMax objective using the SSA sampling method. There are multiple different diffusion models for this problem, each represented as a struct implementing Iterator. I wanted my InfMax struct to own the Graph object it operated on, and pass a reference of this to the sampling iterator. The borrow checker refused.

To this day, I still don't know how to get that to work. I ultimately caved and had InfMax store a reference with lifetime 'a, giving the sampler a reference with a lifetime 'b such that 'a: 'b (that is, 'a is at least as long as 'b). While this workaround took only a moderate amount of time to find, I still lost quite a bit trying to get it to work as I wanted. Giving InfMax a reference never caused any issues, but I still find it annoying.
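For illustration, a minimal, hypothetical version of that workaround – InfMax storing a reference with lifetime 'a and handing the sampler a shorter-lived 'b (the actual structs carry far more state, and the real sampler implements Iterator):

```rust
struct Graph {
    nodes: usize,
}

// The sampler borrows the graph only for its own lifetime 'b.
struct Sampler<'b> {
    graph: &'b Graph,
}

// InfMax holds a reference instead of owning the graph outright.
struct InfMax<'a> {
    graph: &'a Graph,
}

impl<'a> InfMax<'a> {
    // 'a: 'b is already implied by &'b self; it is written out here to
    // show the relationship: the InfMax borrow must outlive the sampler's.
    fn sampler<'b>(&'b self) -> Sampler<'b>
    where
        'a: 'b,
    {
        Sampler { graph: self.graph }
    }
}
```

With ownership instead of a reference on InfMax, handing the sampler a borrow of a field while the struct itself is borrowed is where the checker balks.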

Trait Coherence

In order to use the bit-set library, I wanted to force the Element associated type of each Objective to satisfy Into<usize>. The NodeIndex type from petgraph does not, but has a method fn index(self) -> usize. Due to the rules about when you can implement traits (namely, that you can't implement traits when both the trait and the type are non-local), this was impossible without forking or getting a PR into the petgraph repository.

Forking was my ultimate solution, and hopefully a temporary one. There isn't a clear way to work around this (implementing a helper trait was frustrating because you can't implement both From<T: Into<usize>> for Helper and From<NodeIndex> for Helper, presumably because at a future point NodeIndex could implement From/Into<usize>), and yet there also isn't a clear way to make this work without allowing crates to break each other.
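The shape of the problem, with a stand-in type playing the role of petgraph's NodeIndex (the real one lives in another crate, which is exactly what triggers the coherence rules). The local-newtype pattern shown is a standard workaround rather than what I did at the time, and it adds noise at every use site:

```rust
mod external {
    // Stand-in for petgraph's NodeIndex. In reality this is defined in a
    // foreign crate, so `impl From<NodeIndex> for usize` is forbidden
    // locally: neither the trait nor either type would be ours.
    #[derive(Copy, Clone)]
    pub struct NodeIndex(pub usize);

    impl NodeIndex {
        pub fn index(self) -> usize {
            self.0
        }
    }
}

// A local newtype sidesteps the orphan rule: the impl now mentions a
// local type, so coherence is satisfied.
struct Idx(external::NodeIndex);

impl From<Idx> for usize {
    fn from(i: Idx) -> usize {
        i.0.index()
    }
}
```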

Fortran FFI & Absent Documentation

I almost feel bad complaining about this, because it is so, so niche. But I'm going to. I had to interface with a bit (read: nearly 4000 lines) of Fortran code. In theory, since the Fortran ABI is sufficiently similar to C's, I ought to be able to just apply the same techniques as I use for C FFI. As it happens, this is mostly correct. Mostly.

Fortran FFI is poorly documented for C, so as one might imagine instructions for interfacing from Rust were nearly absent. It turns out that, in Fortran, everything is a pointer. Even elementary types like int. Of course, I also had the added complexity that matrices are stored in column-major order (as opposed to the row-major order used everywhere else). This led to a lengthy period of confusion as to whether I was simply encoding my matrices wrong, wasn't passing the pointers correctly, or was failing in some other way (like passing f32s instead of f64s).

It turns out that you simply need to flatten all the matrices, pass everything by pointer, and ensure that no call is made to functions with unspecified-dimensional matrices. Calling functions with matrices of the form REAL(N, M) works provided N and M are either variables in scope or parameters of the function.
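In Rust terms, the recipe looks something like the following. The routine and its signature are hypothetical (the trailing underscore follows the usual gfortran name-mangling convention); the column-major flattening is the part that's easy to get wrong:

```rust
// Hypothetical binding to a Fortran subroutine declared roughly as
//   SUBROUTINE SOLVE(A, N, M) with REAL*8 A(N, M)
// Note that every argument – even the scalar dimensions – is a pointer.
extern "C" {
    fn solve_(a: *mut f64, n: *const i32, m: *const i32);
}

/// Flatten a row-major matrix into the column-major layout Fortran
/// expects: walk columns in the outer loop, rows in the inner one.
fn to_column_major(rows: &[Vec<f64>]) -> Vec<f64> {
    let n = rows.len();
    let m = rows.first().map_or(0, |r| r.len());
    let mut flat = Vec::with_capacity(n * m);
    for j in 0..m {
        for i in 0..n {
            flat.push(rows[i][j]);
        }
    }
    flat
}
```

The call site then passes `flat.as_mut_ptr()` along with `&n` and `&m` (as i32) inside an unsafe block; getting any one of the three conventions wrong (pointers, column-major order, f64 vs f32) produces silently wrong answers rather than errors.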


Numeric Casts

The inability to multiply numbers of different types together without as is immensely frustrating. I know why this is – I've been bitten by a / b doing integer division before – but formulae littered with x as f64 / y as f64 are simply difficult to read. Once you've figured out the types everything needs to be, this can be largely remedied by ensuring everything coming into the function has the correct type and pre-casting everything else. This helps dramatically, though in the end I often found myself just making everything f32 to save myself the trouble (as that was what most values already were). The inverted notation for things like pow and log (written x.pow(y) and x.log(y) in Rust) further hinders readability.
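A toy example of the friction (the function names are made up): the same formula with casts inline versus pre-cast at the boundary.

```rust
// Casts litter the formula when the inputs arrive as integers.
fn density_noisy(edges: usize, nodes: usize) -> f64 {
    edges as f64 / (nodes as f64 * nodes as f64)
}

// Pre-casting at the function boundary keeps the formula itself readable.
fn density(edges: f64, nodes: f64) -> f64 {
    edges / (nodes * nodes)
}
```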

Verifiability & The Big Wins for Academic Use

I began this post by referencing the verifiability of simulations (and other code). Through all this, I have focused more on the usability wins than the verifiability. Why is that? Fundamentally, it is because code is only as verifiable as it is written to be. The choice of language only impacts this in how well it constrains authors to “clearly correct” territory. In comparison to C/C++, Rust has clear advantages. There is no risk of accidental integer division, very low risk of parallelism bugs due to the excellent standard primitives, and nearly-free documentation (some commenting required). Accidentally indexing outside of bounds (a problem I've run into before) is immediately detected, and error handling is rather sane.

Further, the structure of Rust encourages modularization and allows formerly-internal modules to be pulled out into their own crates, which can be independently verified for correctness and re-used with ease – even without uploading them to crates.io. A concrete example of this is my Reverse Influence Sampling Iterators, which were originally (and still are, unfortunately) a local module from my most recent paper and have subsequently found use in the (re-)implementation of several recent influence maximization algorithms.

This modularity is, I think, the biggest win. While reviewing and verifying a several-thousand-line codebase associated with a paper is unlikely to ever be practical, a research community building a set of common libraries could not only reduce the need to continually re-invent the wheel, but also limit the scope of any individual implementation to only the novel portion. This may reduce codebase sizes to the point that review could become practical.

Case in point: the original implementation of the TipTop algorithm was nearly 1500 lines of C++. My more recent Rust implementation is 253. (Admittedly, this is without the weighted sampling allowed in the C++ implementation; however, that isn't hard to implement. Please forgive my lack of comments – that was a tight paper deadline.) This gain is due to the fact that I didn't have to include a graph representation or parsing, sampling, logging (the logs are machine-readable!), or command-line parsing.

In Conclusion...

For a while, I was worried that my exploration of alternate languages would be fruitless, that I'd wasted my time and would be stuck using C++ for the remainder of grad school. Words cannot describe how happy I am that this is not the case. Despite its flaws, using Rust has been an immeasurable improvement over C++ both in terms of productivity and sanity. Further, while verifiability is very much secondary in my choice of language, I believe the tools and safety that Rust provides are clearly advantageous in this area.