Rusting My (Academic) Code

Written by J. David Smith
Published on 23 February 2017

A couple of weeks ago I mentioned several methods that I'd been (trying) to use to help improve the verifiability of my research. This is the first in an (eventual) series on the subject. Today's topic? Rust.

One of the reasons it took me so long to write this post is the difficulty inherent in describing why I chose Rust over another, more traditional language like Java or C/C++. Each previous time, I motivated the selection by first covering the problems of those languages. I have since come to realize that this is not a productive approach – each iteration invariably devolved into opining about the pros and cons of various trade-offs made in language design. In this post, I am instead going to describe the factors that led me to choose Rust, and what practical gains it has given me over the past two projects I've worked on.

In the Beginning...

First, some background on who I am, what I do, and what constraints I have on my work. I am a graduate student at the University of Florida studying Optimization/Security on Online Social Networks under Dr. My T. Thai. Most problems I work on are framed as (stochastic) graph optimization problems, which are almost universally NP-hard. We typically employ approximation algorithms to address this, but even then the resulting algorithm can still be rather slow to experiment with due to a combination of difficult problems, large datasets and the number of repetitions needed to establish actual performance.

This leads to two often-conflicting constraints: the implementations must be performant to allow us to both meet publication deadlines and compete with previous implementations, and the implementations must also be something we can validate as correct. Well, ideally. Often, validating code is only done by the authors and code is not released. To satisfy that first constraint, most code in my field is in either C or C++, with the occasional outlier in Java when performance is less of a concern. However, after repeatedly having to work in others' rushed C/C++ codebases, I got fed up. "Enough of this!" I thought. "There must be something better!"

I set about re-implementing the state-of-the-art approximation algorithm (SSA) for the Influence Maximization problem (Hung Nguyen, Thang Dinh, My Thai. “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-Scale Networks.” In the Proceedings of SIGMOD 2016). This method has an absurdly-optimized C++ implementation, using all sorts of tricks to eke out every possible bit of performance. Solving this problem – even approximately – on a network with 40+ million nodes is no small feat, and the implementation shows the effort they put into getting there. This implementation formed my baseline both for performance and correctness: every language I rewrote it in should produce (roughly) identical solutions, and every feasible language would have to get in the same ballpark in terms of memory usage and performance. (The algorithm uses non-deterministic sampling, so there is room for some error after all the big contributors have been found.)

In my spare time over the space of a couple of months, I implemented SSA in Chez Scheme, Clojure, Haskell, OCaml and, eventually, Rust. My bent towards functional programming shows clearly in this choice of languages. I'm a Lisp weenie at heart, but unfortunately no Scheme implementation I tried nor Clojure could remotely compete with the C++ implementation. Haskell and OCaml shared the problem that graph processing in purely-functional languages is just an enormous pain in the ass, in addition to not hitting my performance goals – though they got much closer. (Some of this is due to my unfamiliarity with optimizing in these languages. However, I didn't want to get into that particular black magic just to use a language that was already painful for use with graphs.) Then I tried Rust. Holy. Shit.

The difference was immediately apparent. Right off the bat, I got performance in the ballpark of the baseline (once I remembered to compile in release mode). It was perhaps 50% slower than the C++ implementation. After a few rounds of optimization, I got that down to about 20-25% slower. (Unfortunately, I no longer have the timing info. As that isn't the focus of this post, I also haven't re-run these experiments.) Had I been willing to break more safety guarantees, I could have applied several of the optimizations from the baseline to make further improvement. Some of this performance is due to the ability to selectively use mutability to speed up critical loops, and some is due to the fact that certain graph operations are simply easier to express in an imperative fashion, leading me to write less wasteful code. What's more, Rust has a type system similar to that of the ML-family languages that I adore, with the same strong guarantees, combined with a modern build/packaging tool and a thriving ecosystem. It seemed that I'd found a winner. Now, for the real test.

The First Paper: Batched Stochastic Optimization

I was responsible for writing the implementation for the next paper I worked on. (The code – and paper – aren't out yet. I'll hear back in early March as to whether it was accepted.) We were extending a previous work (Xiang Li, J David Smith, Thang Dinh, My T. Thai. “Privacy Issues in Light of Reconnaissance Attacks with Incomplete Information.” In the Proceedings of WI 2016) to support batched updates. As the problem was again NP-hard with dependencies on the ordering of choices made, this was no small task. Our batching method ended up being exponential in complexity, but with a strict upper bound on the size of the search space so that it remained feasible. This was my first testbed for Rust as a language for my work.

Immediately, it paid dividends. I was able to take advantage of a number of wonderful libraries that allowed me to jump straight from starting work to implementing our method. Performance was excellent, parallelism was easy, and I was able to easily log in a goddamn parseable format. It was wonderful. Even the time I spent fighting the borrow checker (a phase I will note was very temporary; I rarely run into it anymore) failed to outrun the benefits gained by being able to build on others' work. I had a real testing framework, which caught numerous bugs in my batching code. Even parallelism, which generally isn't much of a problem for such small codebases, was no harder than applying OpenMP. Indeed, since I needed read-write locks it was in fact easier, as OpenMP lacks that feature (you have to use pthreads instead). It was glorious, and I loved it.
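To give a flavor of the read-write locks mentioned above, here is a minimal sketch using only the standard library. It is not the code from the paper: the function name parallel_counts and the counting task are made up for illustration, and a real workload would want to hold the write lock far less often.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical example: count how often each value occurs, splitting the
/// input across up to `workers` threads that share one map behind an RwLock.
fn parallel_counts(values: Vec<u32>, workers: usize) -> HashMap<u32, usize> {
    let shared: Arc<RwLock<HashMap<u32, usize>>> = Arc::new(RwLock::new(HashMap::new()));
    let workers = workers.max(1);
    // Ceiling division, and never a zero chunk length (chunks(0) would panic).
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);

    let handles: Vec<thread::JoinHandle<()>> = values
        .chunks(chunk_len)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for v in chunk {
                    // Writers get exclusive access for the duration of the guard.
                    let mut map = shared.write().unwrap();
                    *map.entry(v).or_insert(0) += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Any number of readers may hold the read lock at the same time.
    let result = shared.read().unwrap().clone();
    result
}
```

The compiler refuses to let a thread touch the map without going through the lock, which is most of what "parallelism was easy" means in practice.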

Still do, to be honest.

The Second Work: Generic Analysis

The next paper I worked on was theory driven. I showed a new method to estimate a lower bound on approximation quality for a previously unstudied class of problems. (Not exactly new, but general bounds for this class of problem had not really been considered. As a result, the greedy approximation algorithm hadn't seen much use on it.) The general idea is that we're maximizing an objective function f, subject to some constraints where we know that all maximal solutions have the same size. I needed to be able to transparently plug in different values of f, operating on different kinds of data, with different constraints. Further, for efficiency's sake I also needed a way to represent the set of elements that would need updating after each step.

Rust gave me the tools to do this. I defined a trait Objective to abstract over the different possible values of f. Each implementor could handle building their own internal representation, and with associated types I could easily allow each kind to operate on their own kind of data.

It worked. Really well, actually.

I wrote a completely generic greedy algorithm in terms of this trait, along with some pretty fancy analysis. Everything just...worked, and with static dispatch I paid very little at runtime for this level of indirection. At least, as soon as it compiled.
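A rough sketch of the shape this takes, under loud assumptions: the real Objective trait is not shown in this post, so the methods below (elements, value), the greedy function, and the toy Coverage objective are all hypothetical stand-ins, and the constraint handling and analysis are left out entirely.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for the trait described above.
trait Objective {
    /// Each implementor picks the kind of data it operates on.
    type Element: Copy + Eq + Hash;

    /// All candidate elements.
    fn elements(&self) -> Vec<Self::Element>;

    /// The value of f on a given solution.
    fn value(&self, solution: &[Self::Element]) -> f64;
}

/// Plain greedy: repeatedly add the element with the largest marginal gain.
/// Being generic over `O`, this is monomorphized (static dispatch).
fn greedy<O: Objective>(obj: &O, k: usize) -> Vec<O::Element> {
    let mut solution: Vec<O::Element> = Vec::new();
    let mut chosen: HashSet<O::Element> = HashSet::new();
    for _ in 0..k {
        let base = obj.value(&solution);
        let mut best: Option<(O::Element, f64)> = None;
        for e in obj.elements() {
            if chosen.contains(&e) {
                continue;
            }
            // Temporarily add `e` to measure its marginal gain.
            solution.push(e);
            let gain = obj.value(&solution) - base;
            solution.pop();
            // Strict comparison: ties keep the earlier element.
            if best.map_or(true, |(_, g)| gain > g) {
                best = Some((e, gain));
            }
        }
        match best {
            Some((e, _)) => {
                chosen.insert(e);
                solution.push(e);
            }
            None => break, // nothing left to add
        }
    }
    solution
}

/// Toy objective: element i covers the items listed in sets[i].
struct Coverage {
    sets: Vec<Vec<u32>>,
}

impl Objective for Coverage {
    type Element = usize;

    fn elements(&self) -> Vec<usize> {
        (0..self.sets.len()).collect()
    }

    fn value(&self, solution: &[usize]) -> f64 {
        let covered: HashSet<u32> = solution
            .iter()
            .flat_map(|&i| self.sets[i].iter().copied())
            .collect();
        covered.len() as f64
    }
}
```

Swapping in a different f means writing another impl block; greedy itself never changes.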

Not All Fun & Games

As great as my time with Rust has been, there are still a few flaws I feel compelled to point out.

Borrow Checking is Sometimes Extremely Painful

While in most cases the borrow checker became something I dealt with subconsciously, in a few it was still an extraordinary pain in the ass. The first, and what seems to be the most common, is the case of having callbacks on structs. Some of this comes down to confusion over the correct syntax (e.g. Fn(...) -> X or fn(...) -> X), but I mentally recoil at the thought of trying to make a struct with callbacks on it. This was not a pleasant experience.
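For reference, the syntax confusion boils down to this: Fn(...) -> X is a trait that closures implement (and they may capture state), while fn(...) -> X is a plain function-pointer type. The sketch below is a made-up example (Scorer, double and demo are hypothetical names) showing both stored on one struct.

```rust
/// Hypothetical struct holding both kinds of callback.
struct Scorer {
    /// A boxed closure: may capture variables from its environment.
    callback: Box<dyn Fn(u32) -> u32>,
    /// A function pointer: no captured state allowed.
    plain: fn(u32) -> u32,
}

fn double(x: u32) -> u32 {
    x * 2
}

impl Scorer {
    fn new(callback: impl Fn(u32) -> u32 + 'static) -> Self {
        Scorer {
            callback: Box::new(callback),
            plain: double,
        }
    }

    fn run(&self, x: u32) -> (u32, u32) {
        // The extra parentheses tell Rust to call the field, not a method.
        ((self.callback)(x), (self.plain)(x))
    }
}

fn demo() -> (u32, u32) {
    let offset = 10;
    // `move` copies `offset` into the closure so it can outlive this scope.
    let scorer = Scorer::new(move |x| x + offset);
    scorer.run(5)
}
```

The 'static bound is the easy way out; callbacks that borrow from their surroundings need a lifetime parameter on the struct, which is where most of the pain starts.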

The second arose in writing the constructor for an InfMax objective using the SSA sampling method. There are multiple different diffusion models for this problem, each represented as a struct implementing Iterator. I wanted my InfMax struct to own the Graph object it operated on, and pass a reference to it to the sampling iterator. The borrow checker refused.

To this day, I still don't know how to get that to work. I ultimately caved and had InfMax store a reference with lifetime 'a, giving the sampler a reference with a lifetime 'b such that 'a: 'b (that is, 'a is at least as long as 'b). While this workaround took only a moderate amount of time to find, I still lost quite a bit trying to get it to work as I wanted. Giving InfMax a reference never caused any issues, but I still find it annoying.
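The workaround looks roughly like the sketch below. The Graph, Sampler and InfMax definitions here are simplified, hypothetical stand-ins rather than the real types. My best guess at the underlying problem is that a struct owning the graph while also holding something that borrows it would be self-referential, which safe Rust does not express directly.

```rust
/// Simplified stand-in for the real graph type.
struct Graph {
    edges: Vec<(usize, usize)>,
}

/// A sampler that borrows the graph for 'b and yields edge sources.
struct Sampler<'b> {
    graph: &'b Graph,
    pos: usize,
}

impl<'b> Iterator for Sampler<'b> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let item = self.graph.edges.get(self.pos).map(|&(src, _)| src);
        self.pos += 1;
        item
    }
}

/// InfMax stores a reference with lifetime 'a instead of owning the graph.
struct InfMax<'a> {
    graph: &'a Graph,
}

impl<'a> InfMax<'a> {
    /// `'a: 'b` reads "'a outlives 'b": the sampler's borrow may be
    /// shorter than the one InfMax holds, but never longer.
    fn sampler<'b>(&'b self) -> Sampler<'b>
    where
        'a: 'b,
    {
        Sampler { graph: self.graph, pos: 0 }
    }
}

fn demo() -> Vec<usize> {
    // The graph is owned by the caller and merely lent to InfMax.
    let graph = Graph { edges: vec![(0, 1), (2, 3)] };
    let inf_max = InfMax { graph: &graph };
    let samples: Vec<usize> = inf_max.sampler().collect();
    samples
}
```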

Trait Coherence

In order to use the bit-set library, I wanted to force the Element associated type of each Objective to satisfy Into<usize>. The NodeIndex type from petgraph does not, but has a method fn index(self) -> usize. Due to the rules about when you can implement traits (namely, that you can't implement traits when both the trait and the type are non-local), this was impossible without forking or getting a PR into the petgraph repository.

Forking was my ultimate solution, and hopefully a temporary one. There isn't a clear way to work around this, and yet there also isn't a clear way to make this work without allowing crates to break each other. (Implementing a helper trait was frustrating because you can't implement both From<T: Into<usize>> for Helper and From<NodeIndex> for Helper, presumably because at a future point NodeIndex could implement From/Into<usize>.)
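For completeness, one partial workaround (not the route I took) is a local wrapper type. The sketch below fakes the situation inside a single file: the foreign module stands in for petgraph, and Idx and to_slot are hypothetical names. The cost is wrapping and unwrapping at every boundary, which is why it never felt like a clear solution.

```rust
/// Stand-in for a foreign crate. In the real situation NodeIndex lives in
/// petgraph, so `impl From<NodeIndex> for usize` is rejected in our crate.
mod foreign {
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct NodeIndex(pub u32);

    impl NodeIndex {
        pub fn index(self) -> usize {
            self.0 as usize
        }
    }
}

use foreign::NodeIndex;

/// Local wrapper: because Idx is defined here, implementing a foreign
/// trait that mentions it is allowed.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Idx(NodeIndex);

impl From<Idx> for usize {
    fn from(idx: Idx) -> usize {
        (idx.0).index()
    }
}

/// Generic code can now demand Into<usize>, as the bit-set use case needed.
fn to_slot<E: Into<usize>>(element: E) -> usize {
    element.into()
}
```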

Fortran FFI & Absent Documentation

I almost feel bad complaining about this, because it is so, so niche. But I'm going to. I had to interface with a bit (read: nearly 4000 lines) of Fortran code. In theory, since the Fortran ABI is sufficiently similar to C's, I ought to be able to just apply the same techniques as I use for C FFI. As it happens, this is mostly correct. Mostly.

Fortran FFI is poorly documented for C, so as one might imagine instructions for interfacing from Rust were nearly absent. It turns out that, in Fortran, everything is a pointer. Even elementary types like int. Of course, I also had the added complexity that matrices are stored in column-major order (as opposed to the row-major order used everywhere else). This led to a lengthy period of confusion as to whether I was simply encoding my matrices wrong, wasn't passing the pointers correctly, or was failing in some other way (like passing f32s instead of f64s).

It turns out that you simply need to flatten all the matrices, pass everything by pointer, and ensure that no call is made to functions with unspecified-dimensional matrices. Calling functions with matrices of the form REAL(N, M) works provided N and M are either variables in scope or parameters of the function.
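The flattening step can be sketched as follows. The helper names are hypothetical, and the extern declaration in the trailing comment is an assumption about how a compiler such as gfortran would expose a routine, not something taken from the code I actually interfaced with.

```rust
/// Flattens a row-major matrix (a slice of equal-length rows) into a
/// column-major buffer, the layout a Fortran N-by-M array uses.
fn to_column_major(rows: &[Vec<f64>]) -> Vec<f64> {
    let n_rows = rows.len();
    let n_cols = rows.first().map_or(0, |r| r.len());
    let mut flat = Vec::with_capacity(n_rows * n_cols);
    for col in 0..n_cols {
        for row in rows {
            // Walk down each column before moving to the next one.
            flat.push(row[col]);
        }
    }
    flat
}

/// Position of element (row, col) inside the column-major buffer.
fn column_major_index(row: usize, col: usize, n_rows: usize) -> usize {
    col * n_rows + row
}

// A hypothetical Fortran routine
//     SUBROUTINE SCALE(A, N, M)
//     DOUBLE PRECISION A(N, M)
// would then be declared along the lines of
//     extern "C" { fn scale_(a: *mut f64, n: *const i32, m: *const i32); }
// with every argument passed by pointer. The lowercase name with a trailing
// underscore is gfortran's default convention; other compilers may differ.
```

Getting the element width right matters just as much as the layout: a default Fortran REAL is single precision, so it pairs with f32 rather than f64.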

Numeric-Cast-Hell

The inability to multiply numbers of different types together without an explicit as cast is immensely frustrating. I know why this is – I've been bitten by a / b doing integer division before – but formulae littered with x as f64 / y as f64 are simply difficult to read. Once you've figured out the types everything needs to be, this can be largely remedied by ensuring everything coming into the function has the correct type and pre-casting everything else. This helps dramatically, though in the end I often found myself just making everything f32 to save myself the trouble (as that was what most values already were). The inverted notation for things like pow and log (written x.powf(y) and x.log(y) for floats in Rust) further hinders readability.
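As a small illustration of the pre-casting approach, here is a sketch with a made-up formula (scaled_average_degree is a hypothetical name, not something from my papers). All conversions happen once at the top, so the arithmetic itself stays readable.

```rust
/// Hypothetical formula: average degree scaled by a power of the node count.
fn scaled_average_degree(edge_count: u64, node_count: u32, exponent: f64) -> f64 {
    // Cast once, up front.
    let edges = edge_count as f64;
    // Lossless conversions can use From instead of `as`.
    let nodes = f64::from(node_count);

    // From here on everything is f64 and no casts are needed.
    (edges / nodes) * nodes.powf(exponent)
}
```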

Verifiability & The Big Wins for Academic Use

I began this post by referencing the verifiability of simulations (and other code). Through all this, I have focused more on the usability wins rather than the verifiability. Why is that? Fundamentally, it is because code is only as verifiable as it is written to be. The choice of language only impacts this in how well it constrains authors to “clearly correct” territory. In comparison to C/C++, Rust has clear advantages. There is no risk of accidental integer division, very low risk of parallelism bugs due to the excellent standard primitives, and nearly-free documentation (some commenting required). Accidentally indexing outside of bounds (a problem I've run into before) is immediately detected, and error handling is rather sane.
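A tiny sketch of what the bounds and error-handling points look like in practice. The functions degree_of and average_degree are hypothetical; the standard-library behavior they rely on (slice get returning an Option, plain indexing panicking instead of reading stray memory) is not.

```rust
/// Looks up a node's degree, returning None instead of reading out of bounds.
/// Writing `degrees[node]` instead would panic on a bad index.
fn degree_of(degrees: &[u32], node: usize) -> Option<u32> {
    degrees.get(node).copied()
}

/// Average degree, with the empty case made explicit in the return type.
fn average_degree(degrees: &[u32]) -> Option<f64> {
    if degrees.is_empty() {
        return None;
    }
    let total: u64 = degrees.iter().map(|&d| u64::from(d)).sum();
    // Both operands are cast explicitly, so this is visibly float division.
    Some(total as f64 / degrees.len() as f64)
}
```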

Further, the structure of Rust encourages modularization and allows formerly-internal modules to be pulled out into their own crates, which can be independently verified for correctness and re-used with ease – even without uploading them to crates.io. A concrete example of this is my Reverse Influence Sampling Iterators, which were originally (and still are, unfortunately) a local module from my most recent paper and have subsequently found use in the (re-)implementation of several recent influence maximization algorithms.

This modularity is, I think, the biggest win. While reviewing and verifying a several-thousand-line codebase associated with a paper is unlikely to ever be practical, a research community building a set of common libraries could not only reduce the need to continually re-invent the wheel, but also limit the scope of any individual implementation to only the novel portion. This may reduce codebase sizes to the point that review could become practical.

Case in Point: the original implementation of the TipTop algorithm was nearly 1500 lines of C++. My more recent Rust implementation is 253. (Admittedly, this is without the weighted sampling allowed in the C++ implementation. However, that isn't hard to implement. Please forgive my lack of comments; that was a tight paper deadline.) This gain is due to the fact that I didn't have to include a graph representation or parsing, sampling, logging (the logs are machine-readable!), or command-line parsing.

In Conclusion...

For a while, I was worried that my exploration of alternate languages would be fruitless, that I'd wasted my time and would be stuck using C++ for the remainder of grad school. Words cannot describe how happy I am that this is not the case. Despite its flaws, using Rust has been an immeasurable improvement over C++ both in terms of productivity and sanity. Further, while verifiability is very much secondary in my choice of language, I believe the tools and safety that Rust provides are clearly advantageous in this area.

Adding New Capabilities to Kiibohd

Written by J. David Smith
Published on 13 January 2016

One of the reasons I bought my ErgoDox was because I'd be able to hack on it. Initially, I stuck to changing the layout to Colemak and adding bindings for media keys. Doing this with the Kiibohd firmware is reasonably straightforward: clone the repository, change or add some .kll files, and then recompile and reflash using the provided shell scripts. (KLL itself is pretty straightforward, although the remapping rules at times are cumbersome. They follow the semantics of vim remap rather than the more-sane-for-mapping-the-entire-keyboard noremap.)

This week, I decided to finally add a new capability to my board: LCD status control. One thing that has irked me about the Infinity ErgoDox is that the LCD backlights remain on even when the computer is off. As my computer is in my bedroom, this means that I have two bright nightlights unless I unplug the keyboard before going to bed.

Fortunately, the Kiibohd firmware and KLL language support adding capabilities, which are C functions conforming to a couple of simple rules, and exposing those capabilities for keybinding. This is how the stock Infinity ErgoDox LCD and LED control is implemented, and how I planned to implement my extension. However, the process is rather poorly documented and presented some unexpected hurdles. Ultimately, I got it working and wanted to document the process for posterity here. Once I get a better understanding of the process, I will contribute this information back to the GitHub wiki. The rest of this post will cover in detail how to add a new capability LCDStatus(status) that controls the LCD status. LCDStatus(0/1/2) will turn off/turn on/toggle the LCD.

Background & Setting Up

Before attempting to add a capability, make sure you can compile the stock firmware and flash it to a keyboard successfully. The instructions on the Kiibohd repository are solid, so I won't reproduce them here.

An important note before beginning is that it is possible to connect to the keyboard via a serial port. On Linux, this is typically /dev/ttyACM0. The command screen /dev/ttyACM0 as root will allow one to connect, issue commands, and view debug messages during development.

This post is specifically concerned with implementing an LCDStatus capability. If you don't have an LCD to control the status of, then this obviously will be nonsense. However, much of the material (e.g. on states and state types) may still be of use.

The Skeleton of a Capability

Capabilities in Kiibohd are simply C functions that conform to an API: they are void functions with three parameters (state, stateType, and args). At the absolute minimum, a capability will look like this:

void my_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
}

The combinations of state and stateType describe the keyboard state (this information is buried in a comment in Macro/PartialMap/kll.h):

stateType        state          meaning
0x00 (Normal)    0x00           key depressed
0x00 (Normal)    0x01           key pressed
0x00 (Normal)    0x02           key held
0x00 (Normal)    0x03           key released
0x01 (LED)       0x00           off
0x01 (LED)       0x01           on
0x02 (Analog)    0x00           key depressed
0x02 (Analog)    0x01           key released
0x02 (Analog)    0x10 - 0xFF    Light Press - Max Press
0x03-0xFE                       Reserved
0xFF (Debug)     0xFF           Print capability signature

Every capability should implement support for the debug state. Without this, the capability will not show up in the capList debug command.

void my_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("my_capability(arg1, arg2)");
        return;
    }
}

Within this skeleton, you can do whatever you want! The full power of C is at your disposal, commander.

Turning Out the Lights

The LCD backlights have three channels corresponding to the usual red, green, and blue. These are unintuitively named FTM0_C0V, FTM0_C1V, and FTM0_C2V. These names refer to the documentation for the LCD itself, so in that context they make sense. To turn them off, we simply zero them out:

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability()");
        return;
    }
    FTM0_C0V = 0;
    FTM0_C1V = 0;
    FTM0_C2V = 0;
}

With this addition, we have a capability that adds new functionality! I began by adding this function to Scan/STLcd/lcd_scan.c, because I didn't and still don't want to mess with adding new sources to CMake. Now we can expose this simple capability in KLL:

LCDStatus => LCD_status_capability();

It can be bound to a key just like the built-in capabilities:

U"Delete": LCDStatus();

If you were to compile and flash this firmware, then pressing Delete would now turn off the LCD instead of deleting. (On the master half only. We will get to communication later.)

Adding Some Arguments

The next step in our quest is to add the status argument to the capability. This is pretty straightforward. First, we will update the KLL to reflect the argument we want:

LCDStatus => LCD_status_capability( status : 1 );

The status : 1 in the signature defines the name and size of the argument in bytes. The name isn't used for anything, but should be named something reasonable for all the usual reasons.

Then, our binding becomes:

U"Delete": LCDStatus( 0 );

Processing the arguments in C is, unfortunately, a bit annoying. The third parameter to our function (*args) is an array of uint8_t. Since we only have one argument, we can just dereference it to get the value. However, there are examples of more complicated arguments in lcd_scan.c illustrating how not-nice it can be.

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not fall through and read args
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
    }
}

Figuring out how to restore the LCD to a reasonable state is less straightforward. What I chose to do for my implementation was to grab the last state stored by LCD_layerStackExact_capability and use that capability to restore it. In practice, it doesn't matter if you even can restore it: any key that changes the color of the backlight also changes its magnitude. The default ErgoDox setup has colors for each partial map, and I'd imagine most people would put a function like this off of the main typing map because of its infrequent utility. As a result, the mere act of pressing the modifier to activate this capability will turn the backlight back on. However, I implemented it anyway just in case. layerStackExact uses two variables to track its state:

uint16_t LCD_layerStackExact[4];
uint8_t LCD_layerStackExact_size = 0;

It also defines a struct which it uses to typecast the *args parameter.

typedef struct LCD_layerStackExact_args {
	uint8_t numArgs;
	uint16_t layers[4];
} LCD_layerStackExact_args;

We can turn the LCD back on by calling the capability with the stored state. Note that I copied the array, just to be safe. I'm not sure if it is necessary but I didn't want to have to try to debug corrupted memory.

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not fall through and read args
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
    }
}

Now binding a key to LCDStatus(1) would turn on the LCDs.

Creating Some State

Like most mostly-functional programmers, I abhor state. Don't like it. Don't want it. Don't want to deal with it. However, if we want to implement a toggle that's exactly what we'll need. We simply create a global variable (ewww, I know! But we can deal) LCD_status and set it to the appropriate values. Then toggling is as simple as making a recursive call with !LCD_status.

uint8_t LCD_status = 1; // default on
void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not fall through and read args
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
        LCD_status = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
        LCD_status = 1;
    } else if ( status == 2 ) {
        status = !LCD_status;
        LCD_status_capability( state, stateType, &status );
    }
}

Binding a key to LCDStatus(2) will now...do nothing (probably). Why? The problem is that the capability will continuously fire while the key is held down, and the microcontroller is plenty fast enough to fire arbitrarily many times during even a quick tap. So, we will guard the toggle with an additional condition (the other two options move the keyboard to a fixed state and thus don't need to be protected):

else if ( status == 2 && stateType == 0 && state == 0x03 ) {
    // ...
}

Release (0x03) is the only state that fires only once, so we check for that. Alas, even after fixing this, we still only have one LCD bent to our will! What about the other?

Inter-Keyboard Communication

The two halves of an Infinity ErgoDox are actually completely independent and may be used independently of one another. However, if the two halves are connected then they can communicate by sending messages back and forth. If both halves are separately plugged into the computer, then they can't communicate. I haven't delved into the network code, but I assume it is probably serial like the debug communication.

Very Important Note: You must flash both halves of the keyboard to have matching implementations of the capability when using communication.

The code to communicate changes relatively little from case to case but is rather long to reconstruct by hand. Therefore, I basically just copied it from LCD_layerStackExact_capability, changed the function it referred to, and called it a day. Wonderfully, that worked! Well, sort of. It turns out that not guarding against recursion caused weird issues where it would work with the right-hand being master, but not the left. It took a long time to debug because the error was unrelated to the fix (guarding against the recursive case).

#if defined(ConnectEnabled_define)
  // Only deal with the interconnect if it has been compiled in
  if ( status == 0 || status == 1 ) {
     // skip in the recursive case

    if ( Connect_master )
      {
        // generatedKeymap.h
        extern const Capability CapabilitiesList[];

        // Broadcast LCD_status remote capability (0xFF is the broadcast id)
        Connect_send_RemoteCapability(
            0xFF,
            LCD_status_capability_index,
            state,
            stateType,
            CapabilitiesList[ LCD_status_capability_index ].argCount,
            &status);
      }
  }
#endif

The magic constant LCD_status_capability_index ends up available through some build magic that I haven't delved into yet.

The Final Result

Putting all of that code together, we get:

uint8_t LCD_status = 1; // default on
void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not fall through and read args
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
        LCD_status = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
        LCD_status = 1;
    } else if ( status == 2 && stateType == 0 && state == 0x03 ) {
        status = !LCD_status;
        LCD_status_capability( state, stateType, &status );
    }

#if defined(ConnectEnabled_define)
    // Only deal with the interconnect if it has been compiled in
    if ( status == 0 || status == 1 ) {
       // skip in the recursive case

      if ( Connect_master )
        {
          // generatedKeymap.h
          extern const Capability CapabilitiesList[];

          // Broadcast LCD_status remote capability (0xFF is the broadcast id)
          Connect_send_RemoteCapability(
              0xFF,
              LCD_status_capability_index,
              state,
              stateType,
              CapabilitiesList[ LCD_status_capability_index ].argCount,
              &status);
        }
    }
#endif
}

I have a working implementation of this on my fork of kiibohd. I'm looking forward to adding more capabilities to my keyboard now that I've gotten over the initial learning curve. I've already got a couple in mind. Generic Mod-key lock, anyone?

Thoughts on XCOM: Enemy Within and XCOM: Long War

Written by J. David Smith
Published on 05 January 2016

At this point, I consider XCOM: Enemy Unknown/Within to be one of my favorite games of the past few years, if not an all-time favorite. I'm not going to talk much about why XCOM is a good game; that has been covered more than adequately. I am rather disappointed that it took until 2015 for me to discover it, but I am immensely glad that I did. I have over 100 hours on Steam at this point, far surpassing any other recent single-player game. Prior to XCOM the most recent game I'd put this much time into was Dragon Age: Origins. I beat the game on Classic Ironman recently, and after a few... several... many attempts at Impossible Ironman, I was in the mood for something new. Still XCOM, mind you, but new. After looking a bit at the Second Wave options, I decided to take a look at that mod I kept hearing about: Long War. (XCOM has a variety of options that tweak the gameplay, such as reducing the accuracy of injured soldiers ("Red Fog") or gradually increasing accuracy as you approach a complete flanking ("Aiming Angles"). I am looking forward to unlocking item-loss on death ("Total Loss"; an omission that surprised me until I learned about the Second Wave options) when I get around to beating Impossible.)

Ho. Lee. Shit.

More Like Long Changelog

To say that the number of changes made by Long War is many does a great disservice to the word. The changes in Long War are legion. They are multitudes. This isn't merely a set of tweaks and additions. Rather, it is very nearly a total conversion.

Once I recovered from my shock at the number of changes, my first fleeting thought was one of concern. Was this just a kitchen sink mod? A realization of some long-time fan's laundry-list of changes to make XCOM more like its predecessor? An in-my-opinion misguided attempt to make a game about saving Earth from space aliens more realistic? Further inspection only increased these concerns. Why have Assault Rifles and Battle Rifles and Carbines? What did adding 4 more classes bring to the table? Why require 10 corpses for an autopsy instead of 1? (Not that it matters; I have so fucking many corpses and wrecks that they could ask for 50 and I could still do most of the autopsies.) These thoughts made me hold off on diving into it. I started (and failed) another Impossible Ironman campaign first, then I downloaded and installed Long War.

They strongly recommend that you start back on Normal, because as I mentioned above: the changes are significant. So I did. I learned about the new systems, and steamrolled mission after mission with the new classes. (That isn't to say they're imbalanced, just that if you play enough Impossible then Normal becomes pretty straightforward, even if it is a bit harder than Enemy Within Normal.) One thing that struck me very early on is the quality of several of the UI/UX changes made by the Long War team. Scanning for contacts now stops right before a mission expires, giving you the chance to wait for a new tech to finish or a soldier to get their ass out of bed. When a unit (alien or human) enters Overwatch, it is now shown next to their healthbar, which means that the player is no longer out of luck if they happen to look away during the alien turn. They also added a "Bronzeman" mode that strikes a good medium between savescum and Ironman modes. It behaves very much like the default in Fire Emblem: you are able to restart the mission, but not re-load in-mission saves. I would love to have these changes by themselves in the base game, and I do believe that some (like the Overwatch change) made it into XCOM 2. The changes made to the actual gameplay don't fundamentally alter the way it plays at a tactical level, although they do change the squad compositions that are effective. However, there are long-term ramifications to some of the changes that notably impacted my enjoyment of the game.

To Know Your Face

Far and away the most damaging change to the game is Fatigue. (Not the only bad change, though. I could probably write an entire post on why Steady Weapon is awful design.) In XCOM, when a soldier is injured in a mission they must take some time off after to heal up. As a result, it is prudent to have at least a B-list of soldiers that you can sub in for that Major Sniper that you barely saved from bleeding out. In Long War, when a soldier is not injured they still must take 3-5 days off (more for psionic soldiers actually wanting to use their psychic powers). This means that not only do you need a well-prepared B-list, but also a C-list. On the surface, this seems kind of cool. (I would phrase that more subtly, but I already gave my thesis away.) It means that you have to try more strategies, with more combinations of units as they rotate through various states of wounded, gravely wounded, and fatigued. However, it also means that you need a lot more soldiers. I never had more than 20 at a time in Enemy Unknown or Enemy Within. In Long War, you start with something like 40.

This unintentionally changes something that I very much liked about XCOM: the impractical, unsustainable, and outright damaging attachment I had to my soldiers. (Other changes, like increased squad size and muddy class identities, also play into this. However, fatigue is far and away the biggest cause, so I'm focusing on that.) I don't always know their names (I'm really very bad with names), but I know their faces. I remember The Volunteer from my first successful (Ironman) campaign. I remember the struggles, the near misses. She was the sole survivor of the tutorial mission, which I'd forgotten to disable, and one of the only women I recruited in the entire campaign. Yet she turned out to be psychic, and despite numerous barely-stopped bleed-out timers she managed to survive the entire campaign. (When an XCOM soldier is reduced to 0 HP, they have a chance to instead bleed out. They become effectively dead as far as your tactics are concerned, but unless you get to them with a Medkit and stabilize them within 3 turns, they become actually dead.)

After ten hours with Long War, I didn't know any of my soldiers. I lost a Corporal (rank 3 in LW) and two Lance Corporals (rank 2) in a single mission (two in a single turn) due to lucky shots from Thin Men, and didn't feel the urge to restart it. It wasn't even that I had replacements for them all, as I didn't have any medics at all once my Medic Corporal died. I was utterly detached from them. I didn't know any of them, not like before. This ultimately seemed to neuter the tension of each and every mission, turning a game whose bog-standard abduction missions I could play for hours into one where ten hours over two sessions felt like a slog. The moment-to-moment tension of one missed shot dooming a soldier is lost when you no longer care particularly about any of your soldiers. I uninstalled the mod after the second session – and then immediately spent several hours longer than I'd intended playing a new Classic Second Wave Ironman campaign.

This, of course, does not make Long War bad. As I mentioned above, it nears the level of being a total conversion. In fact, I think it is most aptly called exactly that. The Polygon quote on the project page is quite telling:

"Turns XCOM: Enemy Within into nothing short of a serviceable turn-based military alien invasion strategy wargaming simulator." - Polygon

I did not enjoy Long War because I was looking for more of what I liked about XCOM: Enemy Within. I wanted more of the XCOM that was almost Fire Emblem with guns and aliens, not a "military alien invasion strategy wargaming simulator". All told, I think Long War is one of the best mods I've ever seen. However, this is not the mod I was looking for.

The Actual Point

But that's not what I wanted to write about. All of this is just the backdrop. You see, XCOM 2 is coming out soon, and internet comment sections – being the cesspits that they are – are full of a specific breed of comment that gets under my skin. Not the comments recommending that fans try Long War. No, those are fine, at least in principle if not in practice. Good, even, as the mod they are pushing is in fact rather good.

My problem is the legion of comments that follow one of a few varieties. (Paraphrased, because this scrublord didn't bother screenshotting or bookmarking the comments when they were first seen, and digging through internet comments for another couple of hours doesn't seem particularly appetizing. Nobody needs screenshots of commenters being shitlords anyway.)

All have the same underlying assumption: that people have the same tastes as the commenter. This self-projection is unfortunately endemic online, especially within the gaming community (where I've seen more "you're wrong because you like a thing that I don't" fights than anywhere else by a large margin).

Empathize, for a moment, with a mythical person that is considering picking up XCOM. Saving the Earth from aliens sounds cool, and they like strategy and tactical games, so it seems like a natural fit. Maybe this is a person that would agree with these comments; they would find Long War to be generally superior to the base game and might skip the sequel in favor of the Long War team's game. (Although from the sounds of it, the new kid on the block is just getting started. I would honestly be surprised if most Long War fans didn't pick up both XCOM 2 and the Long War team's product.) But maybe – maybe – they wouldn't. This isn't merely hypothetical: if I had installed Long War immediately, I never would have gotten the hundred-plus hours of gameplay out of XCOM that I did. I barely lasted ten hours in Long War. The moment I knew it was over for me is when I began spending more time wondering when LW was going to get interesting than thinking about optimal strategies for filling aliens with holes. Again, I'm not saying that Long War is bad, merely that it isn't what I'm looking for.

The moral here is that internet commenters need to stop being shitlords and consider the fact that not all players – even only considering those that like a particular game – like games for the same reasons. So don't say "Just install Long War. You'll thank me later." Instead, consider "If you like XCOM, try the Long War mod. It's bloody fantastic." Don't imply that the intersection of people who like XCOM and people who like Long War is total. It isn't – I am proof of that – and it could drive people away from experiencing a pretty fantastic game.

Looking Back on 2015

Written by J. David Smith
Published on 01 January 2016

It seems that every year of my life is more eventful than the last. 2015 was a big year in a lot of ways. I graduated from the University of Kentucky, moved away from home for good, and started grad school at the University of Florida. I worked 3 different jobs in 3 different states. I finally got my driver's license, and over the course of the year bought 2 cars and wrecked 1.

My Last Semester in Kentucky

I could've had an easy last semester at UK, but of course I opted not to. I continued learning German (something that I have let lapse, unfortunately), kept working in Dr. Jacobs' lab, and did multiple projects for classes. The project of note was learn2play, which was an attempt at learning to play Hearthstone by watching the screen. Despite all of the changes that have happened to that game this year, my project actually should still work. This pleases me. The skills I gained in this semester have been invaluable. I learned how to use scikit-learn, which has paid more dividends than any other library I've ever used save numpy.

I also made a point of overcoming my aversion to lists during this semester. I was given a notebook for my birthday, and began using it for, well, everything. In particular, whenever I had a set of tasks to do, I'd write down the list, with blank checkboxes next to each item. This immediately helped me stay on top of the many non-school tasks I had, and is probably the reason that I managed to make it to Boston without forgetting anything.

I've Never Been to Boston in the Fall

After the semester ended, I started a second but very different internship at IBM. Instead of being in the ExtremeBlue program, I was working with the AppScan Source team in Littleton, MA near Boston. AppScan Source is a security-oriented static analysis tool. Going in, I had some idea of what I'd be doing (machine learning to reduce the false positive alerts given to users by the tool), which turned out to be entirely wrong. I actually ended up working on a somewhat blue-skies project: the form of the resulting visualization was unknown, as was the set of inputs to build it. We knew that it would be a visualization, and hoped that it'd be helpful for understanding the product's reports.

This project is probably my all-time favorite. Although I had no idea going in, it turns out that I really like working on data visualization. I got a couple of Edward Tufte's data visualization books for Christmas and have already gotten most of the way through one of them (The Visual Display of Quantitative Information). I got to do a ton of experimentation on not only different ways of viewing the data, but also different ways of constructing it. I spent 4 days one week writing finite-domain Prolog code. It was glorious, although the result was impractical. The final version of the visualization ended up being beautiful, and I wish I could put an image here. Once it gets deployed in the product or I get some other indication that I'm legally allowed to, I'm going to see about printing a poster-sized visualization of something.

This internship was great not only because of the project, but also because of the team. Kris Duer was my mentor for the project, and was great to work with. I wasn't his only intern over the summer, and people constantly joked about him building an army. The team as a whole was great to work with, and when I pass through the Boston area again, I'm definitely going to try and see them.

The summer wasn't all work, though. I got to hang out with my good friend John Bellessa, who was one of my teammates from the previous summer. Doing the Freedom Trail with him and his fiancée Lorraine was one of the highlights of my summer.

During this time I also began learning Brazilian Jiu-Jitsu at Fenix in Lowell. I had no idea what I was getting into, but I'm glad that I did. BJJ is much different than the martial arts I'd done in the past because it focuses almost exclusively on ground work. When rolling (BJJ-ese for sparring), we'd start on the ground and stay there 99% of the time. The instructor at Fenix is great, and if you're in the area I'd highly recommend checking out his gym. I've continued practicing BJJ now that I'm in Gainesville, and plan to do so in the future.

Of course, I have to mention that I totaled my first car while I was in the Boston area. By rear-ending someone at a red light. Oops. I really liked that car (a 2003 Honda Civic Hybrid), and am really disappointed that I only got to use it for about two months. (I drove from Boston to just south of DC, about 500mi, without stopping for gas. On the trip back I took the scenic route. I got about 10mpg better on the way down.) Having to buy another in short order was a stressful experience that I hope to never have to repeat. I have had a lot more trouble with the car I bought to replace it, which overheated multiple times on the trip from Boston to Florida. Being stuck on the side of the road many hours' drive from anyone you know is by far my least favorite part of the year.

Florida

Once I actually got to Florida, things looked up. I didn't have any more car trouble. I ended up renting a room in a house from a fellow graduate student, Elaine. She's an older student who is finishing up her PhD in Journalism/Anthropology this semester. Having someone to talk to that was familiar with the area was invaluable. It was also nice to get to talk to someone whose research area is so far removed from my own.

Classes at UF haven't been particularly remarkable. I took grad-level classes at UK, so I wasn't surprised at all by the level of difficulty. I did have to do two projects this semester, both of which turned out reasonably well. For one, I used Markov Chains to show that the performance of attacks on Mix networks changes when multiple adversaries act independently. For the other, I showed that authorship attribution on Twitter can be defeated by replacing words with synonyms. In the latter project, I constructed a visualization that I quite liked.

I found an advisor, Dr. My Thai, relatively quickly. She works on social network-related projects, which is what I'm really interested in. I started working in her lab in late October. I'm pretty happy with my decision thus far, but it is very early in my career. I like my lab-mates and Dr. Thai seems to be understanding and in general nice to work under.

Applying for the NSF Fellowship

One thing that UF has that UK didn't is a course dedicated to helping students write the NSF Fellowship applications. Run by Dr. Mazyck (who definitely has a strong personality), the course focused on helping us avoid common and not-so-common mistakes in the application process by starting early and constantly revising our essays. I spent so. much. time. on those essays over the first ¾ semester, it isn't even funny. I'm hopeful that it will pay off, but I won't find out for another few months. I based my application on detecting throwaway harassment accounts without compromising user privacy, which made it easy to cover broader impacts but more difficult to detail the intellectual merits of my proposed work. The biggest thing I got from this was from the personal essay. In writing it, I discovered that much of the work that I've done is actually relatively well tied together – and that it rather clearly points in the direction I'm heading now.

Progress on Goals for 2015

The goals I set for 2015 were simple: get over my list-phobia and become more consistent. I can't say that I really succeeded at the second, but I definitely succeeded at the first. I pretty routinely write out lists in my notebook to organize my thoughts. As for consistency and self-discipline? I still don't have a decent daily routine, so that one goes down as a partial failure. However, I do at least have a pretty reliable weekly routine, which will be getting upset next week by my class schedule change.

My overarching 'Otherness' goal is still just that: a goal. Thinking back on the year, I feel that there has been very little interpersonal conflict. The majority of difficulties I faced this year were from events such as wrecking my car, not people. However, I do still catch myself snapping at people. It is rare, but it happens. (Mostly to my baby brother when he gets really talkative in the middle of me trying not to die to Zed mid-lane. Not an excuse, but context is important.) I am making an effort to better hold my tongue.

Looking Forward to 2016

Somehow, I doubt this year will be crazier than last year. Only time will tell, but I wouldn't really mind either way. I do have a few goals for 2016.

My 'career' goals are to publish at least one paper first author, and to finish my quals (which are now an extensive lit review for your proposal at UF). Pretty self-explanatory. I simply want to make progress on my PhD. I would also like to look into teaching a class myself. I TA'd this past semester, which was an experience I enjoyed, and will be TAing this coming semester. I'd like to take the logical next step and teach a class myself. It is unlikely that I will get to do so this year, but I'd like to make progress on getting to do so.

My only other goal this year will seem a bit odd to people that know me, as I'm really a rather indoors-y person: I'd like to go backpacking. Not like backpacking on a mountainside for a day, but more like backpacking from one state to another. I've not done much looking into this yet, but I'm hoping to take a couple of weeks this summer and go somewhere (the Pacific Northwest? Europe?) to do this.

The past year has left me very hopeful for the future. I managed to survive the general insanity of moving cross-country twice in a year, and have acclimated reasonably well to first-year-grad-student life. The future is bright, and full of potential. Here's hoping that I won't play too much XCOM to take advantage of that.

Representative Computer Science Courses

Written by J. David Smith
Published on 12 November 2015

I spent about 6 hours TAing today. It's kind of exhausting, but I enjoy it. I believe that over the course of this semester, I've developed a decent rapport with my students, because I frequently get asked for advice on things that aren't strictly class-related. Today, one student mentioned to me that they were switching majors away from computer science. Of course I asked why, and they explained it to me. I was slightly surprised, but understood completely. Their reason was that they didn't enjoy the CS coursework, and didn't want their career to be in something they disliked.

One problem: the coursework is not at all representative of what one actually encounters in a job in the technology industry.

Representative?

The course I am TAing is the second programming course at UF. It's a weird class, and probably best described as a C++ course with a very wide variety of projects. For reference: the first course is in Java. Students are expected to know programming fundamentals a priori, and expect to learn how to program in C++ and some more advanced programming techniques. The projects they've done in this course so far have been computing magic squares, mimicking memory allocation using custom-built linked lists, and a lexer. Students are also working on a group project concurrently with the third (and upcoming fourth) project.

My first thought on seeing the project list was of the vi learning curve. The first and second projects are appropriately difficult, but the lexer is on another level entirely. They "just" have to implement one without any formal language definition or any knowledge of the theory behind how one would typically construct a lexer. (My knowledge on this is sketchy at best, given that I haven't actually taken a compilers course. I believe the typical tool is a finite automaton, which is generally generated automatically from a set of regular expressions.) I would consider this project if not difficult then overly time-consuming for me, which indicates that it is almost certainly not appropriately scaled for the second programming course.

That isn't what I came here to talk about, though. My problem with the course setup is that students are in their second year and have absolutely no idea how to do anything but read from and write to a terminal. And to top that off: they don't know anything but Java and C++, which are two of the worst languages in existence. (Excuse my hyperbole, but I would seriously turn down any job offer if they said I'd be stuck writing Java. C++ would be negotiable if I got to do cool HPC stuff with it, but I have a great dislike for the language.) The experiences they have in these classes are not only unrepresentative of work in industry, but are so far removed as to be mis-representing it!

Nobody, outside of very specialized areas, implements their own data structures. These are important, but we have a data structures class. There is no need for entry-level students to rebuild the wheel from messy blueprints when they could be doing something interesting. Very few people write command line programs for a living. They are common in OSS because libraries make them simple to write. But students don't get to use libraries outside of <iostream>, so they don't see the benefits. Not only that, but working in the terminal is easily the least intuitive part of our field. It is for power users, but command-line programs all but necessitate that style of work. You could use an IDE, but consider the bugs these can introduce. Literally today, one student's program was hanging. We tried it from cmd.exe and on the lab machines, and it worked perfectly! The IDE simply didn't handle the program's exit properly.

I had an epiphany today when the student said that. I have wanted to understand why students leave computer science, and here I found a reason I hadn't considered. It wasn't difficulty (the student is doing well in the course and is switching to a major that will probably be harder), it was dislike. They didn't like what they were shown! And I am left to wonder: would they like what programming is actually like?

We Can Do Better

This changes my perspective. There are a lot of topics that are front-loaded but don't need to be. There is something to be said for having weed-out classes, but oughtn't those be based on difficulty, not on whether students like something? The command line, Java, C++, esoteric projects, library restrictions: all of these get in the way. What do we gain from them? The first three give marketable skills, which are important, and may warrant some early coverage for those students wishing to have an internship between their sophomore and junior years. However, are these the best skills for the jobs students want? And the latter two: are they even contributing?

My alma mater's sequence was 1 semester of Python, followed by 2 of C++. At least that covers a not-shit language, and if memory serves the projects were actually interesting. There was one lecturer in particular that stands out, although I did not get to take their course. Dave Brown taught the second C++ course, and the students worked individually on one project over the course of the semester. They implemented a roguelike game in C++, complete with a test suite. There were milestones for each stage of the project, and every student I spoke with actually enjoyed it. (I was a tutor at UK, and so got to talk to a lot of fellow students about what classes they liked and didn't like.)