Aaron Schiff’s blog

Some New Zealand migration data

International migration to and from New Zealand is a popular topic right now. Statistics New Zealand publishes loads of data on migration in its Infoshare system but the larger data tables are not particularly easy to use out of the box.

I’ve downloaded all the data in the “Permanent & long-term migration by EVERY country of residence and citizenship (Monthly)” table and processed it in R into a clean and tidy CSV file. The code, source data, and cleaned data are on GitHub.

This data breaks down monthly permanent and long-term arrivals and departures by citizenship and country of residence. Citizenship distinguishes New Zealand and Australian citizens and has a total for all citizenships, from which you can calculate figures for citizens of countries other than New Zealand and/or Australia. Country of residence has 253 different categories, including some regional totals, so be careful if you are summing across countries. The data runs from April 1978 to June 2016.
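As a rough illustration of the subtraction described above, here is a minimal Python sketch (not the R code from the GitHub repository). The column names, the sample rows, and the set of regional-total labels are all assumptions made up for this example, so check them against the actual CSV before using anything like this.

```python
import csv
import io

# Hypothetical sample in a tidy layout. The column names and numbers are
# made up for illustration and may not match the published CSV.
SAMPLE = """month,direction,citizenship,country,count
2016-06,Arrivals,New Zealand,Australia,100
2016-06,Arrivals,Australia,Australia,40
2016-06,Arrivals,Total,Australia,200
2016-06,Arrivals,New Zealand,Oceania,120
2016-06,Arrivals,Australia,Oceania,45
2016-06,Arrivals,Total,Oceania,260
"""

# Assumed labels for regional totals, skipped to avoid double counting.
REGIONAL_TOTALS = {"Oceania"}


def other_citizens(rows):
    """Return {(month, direction, country): total - NZ - Australian}."""
    counts = {}
    for row in rows:
        if row["country"] in REGIONAL_TOTALS:
            continue
        key = (row["month"], row["direction"], row["country"])
        counts.setdefault(key, {})[row["citizenship"]] = int(row["count"])
    return {
        key: c["Total"] - c.get("New Zealand", 0) - c.get("Australia", 0)
        for key, c in counts.items()
        if "Total" in c
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = other_citizens(rows)
print(result)
```

Dropping the regional rows before any aggregation is the simplest way to avoid counting the same migrants twice.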

Previously, Harkanwal Singh made some nice visualisations of this and other NZ migration data showing the trends for some types of migrants.

A similar annual migration data file is available from Figure.NZ, covering the period from 1979 to 2015, if you just want the annual totals.

Open data leads to useful stuff

Last Friday, Auckland Council published the recommended Unitary Plan geospatial data files.

Before this data came out, the Council provided an interactive map tool to view the data, which looked like this:


This is obviously comprehensive but I certainly found it quite difficult to make sense of. There are so many colours, many of them are hard to distinguish, and there’s no information about what is allowed in each zone (though you can look that up in separate documents).

After the geographic data was published, there was a veritable explosion of interesting maps and analysis. For The Spinoff, Chris McDowall made some beautiful maps:


For many purposes, Chris’s maps are much more useful and usable than the original Council map viewer. There are fewer map colours and they are legible. There’s an index of suburbs, allowing you to find a place easily even if you don’t know exactly where it is. Each suburb has a percentage breakdown of the main zone types, and there are brief descriptions of the zones. And you can find suburbs that are “similar” to each other, which is a feature that I love.

Harkanwal Singh of the NZ Herald also made some simplified interactive maps that focus on residential and town centre zoning. What I really like about these maps is the ability to select one zone type at a time and see the map change. For example, here’s a map of the “terraced housing and apartments” zone that allows low-rise apartment buildings of between five and seven storeys:


Harkanwal’s map also has a helpful one-line description of what each zone allows in terms of residential buildings. This enables quick and easy understanding of the residential zoning in the recommended plan.

Will Taylor made several interactive maps, including one showing all residential zones, one showing where the residential height limit is two storeys, and one showing where it will be permitted to build higher than three storeys. That last map shows that, in spite of the hype, the recommended UP is quite limited in the areas where buildings above three storeys will be allowed:


Will also made an animated GIF of how the “single house” zone changed between the Council’s proposed plan and the ultimate recommended plan:

Will has summarised his analysis in a blog post on TVHE.

Finally, I made a few quick maps of the “overlays” in the recommended plan. Rather than defining zoning, these overlays impose additional restrictions on activities, to protect things like special character and heritage. Here’s a map showing most of the overlays combined:

And just for fun, I made some simple line maps showing all of the recommended Unitary Plan geospatial data at once, to illustrate how complex it is.

The above are just the things that I saw and there might be others that I missed. So, in just a week since Auckland Council published the data, people have created a bunch of useful, interesting maps and analysis, at no cost to the Council. This clearly demonstrates the value of open data and I’d encourage the Council to publish more of it (e.g. rating valuations … nudge nudge, wink wink).

Unitary spiderweb

There’s a fun article in The Spinoff today by Niko Elsen that explains some of the important bits of Auckland’s Unitary Plan with animated GIFs. Niko mentioned a tweet of a map I made of all the Unitary Plan geodata. In case anyone wants to print a tea towel or whatever, here’s a high resolution version of the whole region (click for bigger):


And a close-up of the central area:


These maps show all the shapes, points, and lines associated with the zoning boundaries and overlays in the Unitary Plan recommended by the Independent Hearings Panel. The source data is available here and is provided by Auckland Council under a Creative Commons Attribution 3.0 NZ licence.

If you have the data files, you can use this R code to create the maps yourself.

Unitary Plan overlays

In the Herald on Friday, Don Stock wrote about the Auckland Unitary Plan as recommended by the Independent Hearings Panel:

It has ignored the principles of democracy and natural justice by denying communities any say in the radical upzoning introduced outside the public consultation process. It has removed any requirement for good design in developments. It has removed any protection for old buildings. It has removed minimum sizes for apartments. It has removed requirements for off-street parking.

Shamubeel Eaqub has already called bullshit on the “undemocratic” claim. The other claims are mostly not true as well.

Auckland Council has published the geospatial data for the version of the Unitary Plan recommended by the Independent Hearings Panel. I thought I’d use this to take a quick look at the recommended plan’s provisions for protection of things like character and heritage.

As well as defining zones where various types of residential and commercial development are allowed, the recommended plan defines a number of “overlays” that restrict building and other activities in certain areas. Each of these overlays comes with its own set of restrictions. In many areas there are multiple overlapping overlays, creating multiple restrictions.

The red areas in the following map show many of the overlays defined in the recommended UP for the whole Auckland region:


Here’s a close-up of the central area:


For example, the UP defines “viewshafts” protecting views of volcanic cones and the Auckland Museum. Within these (mostly central) areas, there are height restrictions on buildings:


Another set of overlays protects “special character” and “historic heritage” in certain (again, mostly central) areas by restricting demolition and development:


There are also a significant number of overlays protecting the natural environment, including “significant ecological areas”, “outstanding natural character”, “outstanding natural features”, “outstanding natural landscapes”, and the like, throughout the region:


And there are overlays covering airport approaches, national grid corridors, buffers around quarries, and places of significance to Mana Whenua:


There are some other overlays but I think the above are the most important. While the protections defined by each overlay are different and some are more restrictive than others, it’s simply not true that the recommended UP allows development to occur unchecked. There are restrictions that will affect both intensification and expansion of the city in many areas.

If you’re interested, the R code I used to make these maps is here.

Exciting new stuff on Figure.NZ

Over the past few months I’ve had a lot of fun working with Figure.NZ on some really exciting projects.

There’s a brilliant new feature called Business Figures. By answering a couple of questions, you can instantly see relevant data for all kinds of businesses in New Zealand. The team put a lot of hard work into this and I think it’s amazing.

Until now, business data was not very easy to get — it’s spread across a bunch of sources, and a lot of it is wrapped up in the somewhat cryptic ANZSIC system for classifying businesses. One of the really clever things that Business Figures does is make it easy to find data in ANZSIC categories, even if you have no idea what ANZSIC means.

I helped to make some interactive data maps that are now available on Figure.NZ and in Business Figures. So far we have data from the Census and business demographics datasets. For example, here is business growth in Auckland:

AKL business growth

(click the link above to explore this as an interactive map on the Figure.NZ site)

We worked hard on the design of the maps to present the data together with local context like roads and rivers, but not so much context that the data gets overwhelmed. I’m proud of how the maps turned out, thanks especially to the deft touches of Nat Dudley and Chris McDowall. Nat spent a lot of time choosing colours for the maps that are legible by anyone with any of eight different types of colour-blindness.

Another new feature is that you can now easily save a collection of your favourite data for your reference or to share with others. For example: my favourites.

All these things are in preview mode now until they are officially launched, so go check them out, and please give feedback so we can make them better.

ASB and Statistics New Zealand supported the development of these features — thank you both so much! You’ve made great things happen!

Make: It’s not that bad

Make is an ancient command-line utility often used to automate compiling software. Essentially you specify the commands to run that get the job done, but it’s a bit more fancy than that because you actually design a dependency graph of files that need to be created (more on this later).

It’s also a useful tool for data analysis because it allows you to document the steps used, all the way from downloading the data files right through to analysis and producing output. This makes it easier if you want to repeat the process later with updated data, and just simply to remember what you’ve done.
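To make the dependency idea a little more concrete, here is a small Python sketch of the basic rule make applies to each target: rebuild it if it is missing or older than any file it depends on. This is only an illustration of the concept, not how make is implemented, and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Mimic make's basic rule: rebuild if the target is missing or is
    older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


# Hypothetical file names, created in a temporary directory for the demo.
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.csv")
clean = os.path.join(workdir, "clean.csv")

with open(raw, "w") as f:
    f.write("a,b\n1,2\n")
os.utime(raw, (1000, 1000))  # pin the modification time for the demo

missing = needs_rebuild(clean, [raw])  # clean.csv does not exist yet

with open(clean, "w") as f:
    f.write("a,b\n1,2\n")
os.utime(clean, (2000, 2000))
fresh = needs_rebuild(clean, [raw])  # clean.csv is newer than raw.csv

os.utime(raw, (3000, 3000))
stale = needs_rebuild(clean, [raw])  # raw.csv changed after clean.csv

print(missing, fresh, stale)
```

In a data workflow, this rule means that re-running make after updating a source file redoes only the steps that depend on it.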

Yesterday I needed to transform a number of shapefiles from the LINZ Data Service to use in a map I was making in Mapbox. The transformations were pretty straightforward — just changing the coordinate reference system to WGS84 because I can’t seem to get Mapbox to work with the New Zealand Transverse Mercator 2000 projection, dropping some unneeded fields to make the files smaller, and adding a calculated field of geographic area for some features.

I could have done all this by hand in a GUI app like QGIS, but I had quite a few files to process so it would’ve been tedious, and then there would be no record of exactly what I had done. So, it was time to learn make.

Mike Bostock has a really good introduction to make and I also found the Minimal Make tutorial by Karl Broman to be very good.

In the end it was pretty straightforward, but I got tripped up by a couple of things:

So, make looks a bit daunting at first encounter, but give it a go, it’s not that bad!

Update: Here’s my makefile in case it’s of any use (it was quite specific to what I was doing though).

Untangling CSV files

Statistics New Zealand has started publishing some raw CSV data files. Hooray! However, these files are quite large, and I’ve found it challenging to make sense of them. For example, the GDP file includes basically every GDP data series that Stats NZ publishes, and weighs in at over 68,000 rows.

The different data series are distinguished by categorical (non-numeric) fields in the file. To make sense of these it’s helpful to understand what the categorical variables are and what values these variables take. For example in the GDP file there are categorical variables called “Series_reference”, “STATUS”, “UNITS”, “Subject”, “Group”, and the more cryptic “Series_title_1”, “Series_title_2”, and “Series_title_3”. If you know the values that these variables can take, you can get a sense of the coverage of data in the file.

With this in mind, I wrote a bit of R code to parse a CSV file, identify the categorical variables, and report the unique values of each variable, as well as the number of times that each value appears in the file. It will also optionally plot bar charts of the most common values for each categorical variable.
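The general approach can be sketched in a few lines of Python (this is not the R code itself, just an illustration of the idea). It assumes that a column counts as categorical if any of its non-empty values fails to parse as a number. The sample rows are made up, and the “Period” and “Data_value” column names are assumptions.

```python
import csv
import io
from collections import Counter

# Made-up sample. "Period" and "Data_value" are assumed column names.
SAMPLE = """Series_reference,Period,Data_value,UNITS,Group
S1,2015.03,100.5,Dollars,Group A
S1,2015.06,101.0,Dollars,Group A
S2,2015.03,7,Percent,Group B
"""


def is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarise_categoricals(rows):
    """Return {column: Counter of values} for every column that has at
    least one non-empty, non-numeric value."""
    counters = {}
    for row in rows:
        for column, value in row.items():
            counters.setdefault(column, Counter())[value] += 1
    return {
        column: counts
        for column, counts in counters.items()
        if any(value and not is_number(value) for value in counts)
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = summarise_categoricals(rows)
for column, counts in summary.items():
    print(column, counts.most_common(3))
```

A column of numeric codes would be treated as numeric under this rule, so a real tool would probably need a way to override the guess.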

So I think this is useful for getting a quick overview of the types of data that may be in a CSV file. Something I hope to add in future is an overview of the hierarchy of values of the categorical variables, if they are part of a nested classification like ANZSIC. The end result could be something like this.

Talking about data

Herald Insights today has a nice interactive story about the “pay gap” in New Zealand, i.e. differences in rates of pay for men and women.

The pay gap is a bit complicated — there are lots of possible reasons why people in the same occupation category might get paid differently, regardless of their gender. The Insights piece doesn’t attempt to do a very detailed analysis of the causes of pay differences in occupations or trends in these differences over time.

So it’s not a perfect analysis but arguably that’s asking too much — good data journalism like this encourages us to talk about an issue and explore further. It gives us more information than anecdotes and gut feelings but it is just a starting point. It doesn’t have to give us the complete answer; we’ll need to look in academic journals to find that.

So in that sense, the pay gap piece seems to have been quite successful at stimulating discussion. Here are just a few tweets that I personally saw, and I’m sure there were lots of others:

Also, it would be great if the headline writers didn’t feel they have to hook people into interactives like this by offering a concrete answer. The headline on the main Herald site was “Do you get paid fairly? This will tell you”. The interactive sort of answers that question (based on your occupation and gender only), but the headline puts people in the wrong frame of mind for thinking more deeply about the issue.

Visualising metadata


Data visualisation typically focuses on the data itself, but in some cases the metadata is interesting in its own right. Lately I’ve been thinking a bit about how to visualise the structure of metadata. Ultimately this could be combined with a data visualisation to make something that coherently shows both the data and its structure.

A couple of interesting examples are the harmonised trade system, and ANZSIC. Both of these are hierarchical classification systems, for international trade and industries respectively.

Harmonised trade is particularly fascinating. It classifies products into a bewildering array of categories. I can’t imagine the effort required to create and maintain this system. As an experiment I made a visualisation of the New Zealand flavour of harmonised trade. Above is a static picture — each circle represents a category and shows its sub-categories. An interactive version is here, where you can click a category and see what it is.

The code is here. I also experimented with an HCL colour space interpolation in D3, for the outer circles.

Loading NZ harmonised trade data

I’ve been working with “harmonised trade” exports & imports data for a project recently, making use of the huge collection of CSV files that Statistics New Zealand has published. All up there’s about 2GB of data in those files.

I’ve written some code to load selected fields from this data into R for analysis. I’ve only loaded up dates, commodities, countries, and dollar values, but the code can be easily modified to load other fields (eg volumes) if needed.
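For anyone not using R, the same idea of reading many CSV files and keeping only a few fields might look something like this Python sketch. The field names and the two tiny sample files are made up for illustration, so the real headers will need to be substituted.

```python
import csv
import glob
import os
import tempfile

# Hypothetical field names. The real files may use different headers.
FIELDS = ["month", "commodity", "country", "value"]


def load_selected(pattern, fields):
    """Read every CSV matching the pattern, keeping only the named fields."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                records.append({name: row[name] for name in fields})
    return records


# Demo with two tiny made-up files in a temporary directory.
workdir = tempfile.mkdtemp()
samples = {
    "exports_2014.csv": "month,commodity,country,value,volume\n201401,0101,AU,1000,5\n",
    "imports_2014.csv": "month,commodity,country,value,volume\n201401,0202,CN,2500,9\n",
}
for name, text in samples.items():
    with open(os.path.join(workdir, name), "w", newline="") as f:
        f.write(text)

records = load_selected(os.path.join(workdir, "*.csv"), FIELDS)
total_value = sum(int(r["value"]) for r in records)
print(len(records), total_value)
```

Adding another field, such as volumes, is then just a matter of adding its name to the list.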

Also included is a quick demonstration of using the dataset, by creating these simple small-multiples charts of import and export values by country over time, for the top 20 countries based on 2014 trade values (click for bigger):