Aaron’s blog

Make: It’s not that bad

Make is an ancient command-line utility often used to automate compiling software. Essentially you specify the commands to run that get the job done, but it’s a bit more fancy than that because you actually design a dependency graph of files that need to be created (more on this later).
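To give a rough idea of what that looks like, here is a minimal sketch of a makefile with a two-step chain. The file names (input.txt, sorted.txt, report.txt) are made up for illustration, and each command line must start with a tab character.

```makefile
# Hypothetical file names, purely for illustration.
# Each rule reads: target: prerequisites, then tab-indented commands.

# report.txt is built from sorted.txt
report.txt: sorted.txt
	wc -l sorted.txt > report.txt

# sorted.txt is built from input.txt
sorted.txt: input.txt
	sort input.txt > sorted.txt
```

Running make here asks for report.txt, which needs sorted.txt, which in turn needs input.txt, so the sort step runs before the count step even though it is written second.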

It’s also a useful tool for data analysis because it allows you to document the steps used, all the way from downloading the data files right through to analysis and producing output. This makes it easier to repeat the process later with updated data, or simply to remember what you’ve done.

Yesterday I needed to transform a number of shapefiles from the LINZ Data Service to use in a map I was making in Mapbox. The transformations were pretty straightforward — just changing the coordinate reference system to WGS84 because I can’t seem to get Mapbox to work with the New Zealand Transverse Mercator 2000 projection, dropping some unneeded fields to make the files smaller, and adding a calculated field of geographic area for some features.
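As a sketch of how this kind of job can be expressed in a makefile, the example below assumes GDAL’s ogr2ogr is doing the conversion, that the source shapefiles sit in a raw/ directory, and that the fields worth keeping are called id and name. All of those are assumptions for illustration and not necessarily what my own makefile does. The calculated area field is left out to keep it short.

```makefile
# Assumed layout: NZTM2000 shapefiles in raw/, WGS84 output in wgs84/.
SOURCES := $(wildcard raw/*.shp)
TARGETS := $(patsubst raw/%.shp,wgs84/%.shp,$(SOURCES))

# Default target: build every output file.
.PHONY: all
all: $(TARGETS)

# Pattern rule: one output shapefile per source shapefile.
# $@ is the target, $< is the first prerequisite.
# -t_srs EPSG:4326 reprojects to WGS84.
# -select keeps only the listed fields (id and name are hypothetical).
# -overwrite lets the rule rerun when the output already exists.
wgs84/%.shp: raw/%.shp
	mkdir -p wgs84
	ogr2ogr -f "ESRI Shapefile" -overwrite -t_srs EPSG:4326 -select id,name $@ $<
```

With something like this in place, dropping a new shapefile into raw/ and running make converts just that file, because the existing outputs are already newer than their sources.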

I could have done all this by hand in a GUI app like QGIS, but I had quite a few files to process so it would’ve been tedious, and then there would be no record of exactly what I had done. So, it was time to learn make.

Mike Bostock has a really good introduction to make and I also found the Minimal Make tutorial by Karl Broman to be very good.

In the end it was pretty straightforward, but I got tripped up by a couple of things:

  • When you run make it compares the timestamps on files to decide whether a target needs rebuilding: a target is rebuilt only if it is missing or older than one of the files it depends on. You can use the touch command to update a file’s timestamp without modifying its contents. Mike Bostock explains this in a bit more detail in his tutorial.
  • Although it looks a bit like one, the makefile that you create is not a script that runs from top to bottom. As I mentioned before, you are actually specifying a dependency graph of files, with a root (or roots) and branches. If your makefile has more than one possible root and you don’t tell make which target(s) to build, it starts from the first target in the makefile and only works through that tree of files. Every other rule is ignored, which stumped me for about an hour when I couldn’t figure out why make wasn’t processing all the commands in my makefile. This question on Stack Overflow finally got me out of that quagmire.
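
Both points can be seen in a small sketch. The file names (a.in, a.out, b.in, b.out) are hypothetical, and the all target is a common convention for gathering every root under one name, not something make requires.

```makefile
# Without this first rule, a plain `make` would build only a.out,
# because a.out would then be the first target in the file.
.PHONY: all
all: a.out b.out

# Two independent outputs: neither depends on the other.
a.out: a.in
	tr a-z A-Z < a.in > a.out

b.out: b.in
	tr a-z A-Z < b.in > b.out
```

Running make with no arguments now builds both files, while make b.out builds just the one. After that, touch a.in followed by make reruns only the a.out rule, since b.out is still newer than b.in.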

So, make looks a bit daunting at first encounter, but give it a go, it’s not that bad!

Update: Here’s my makefile in case it’s of any use (it was quite specific to what I was doing though).