Data Explorers newsletter no. 10

Kia ora,

I’m Aaron Schiff and you’ve received this email because you subscribed to the Data Explorers Club newsletter on my website. If you’re no longer interested, there’s a one-click unsubscribe link at the bottom of this email.

The New Zealand Census 2018

You’ve probably seen the news. The independent review of the 2018 Census found critical problems with Stats NZ’s management of last year’s Census, and these problems significantly reduced the response rate. Of the estimated 4,760,000 people who should have been counted on Census Night, 3,972,000 actually sent in a response (online or on paper), for an overall response rate of 83.3%. That left around 788,000 people who weren’t directly counted. In comparison, the overall response rate in 2013 was 92.2%. There is always some non-response, but it was a lot worse in 2018 than in 2013.
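For anyone who wants to check the arithmetic, here’s a minimal Python sketch using the rounded figures above (Python is just a choice for illustration). Dividing the rounded counts gives about 83.4% rather than the published 83.3%. Presumably the published rate was calculated from unrounded counts, so treat the recomputed figure as approximate.

```python
# Rounded 2018 and 2013 Census figures quoted in the paragraph above.
estimated_population_2018 = 4_760_000  # people who should have been counted
responded_2018 = 3_972_000             # sent in an online or paper response
published_rate_2018 = 83.3             # % overall response rate, as published
published_rate_2013 = 92.2             # % overall response rate, as published

# People not directly counted = estimated population minus respondents.
not_directly_counted = estimated_population_2018 - responded_2018

# Response rate recomputed from the rounded counts (approximate, see above).
response_rate_2018 = responded_2018 / estimated_population_2018

# Fall in the published response rate between 2013 and 2018, in percentage points.
drop_in_points = published_rate_2013 - published_rate_2018

print(f"Not directly counted: {not_directly_counted:,}")
print(f"Response rate from rounded counts: {response_rate_2018:.1%}")
print(f"Drop since 2013: {drop_in_points:.1f} percentage points")
```

The drop between the two censuses works out to almost 9 percentage points.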

The non-respondents to the 2018 Census are not a random sample of the population: they are more likely to be from minority groups and more likely to be younger. A key table from the review report, reproduced below with my highlights, shows the response rates for various population groups and for small geographic areas (the higher response rates in brackets include people who provided partial responses).

For the missing 788,000 people, Stats NZ is using other methods to try to fill the data gaps. Around 203,000 people didn’t return an individual census form, but some information about them is available from household forms. However, the household form records only limited information about each person, so more detailed information about these 203,000 people still has to be found elsewhere. A further 526,000 people have been counted entirely from other sources of government administrative data that Stats NZ has access to, such as birth records and migration data. An estimated 58,000 people won’t be counted at all.
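The three groups in the paragraph above can be tallied with a short Python sketch (Python is only for illustration, and the figures are the rounded ones from the Stats NZ workshop). They add up to 787,000 rather than 788,000; a gap of that size is what you’d expect when each component is rounded to the nearest thousand, although the unrounded figures aren’t given here.

```python
# Rounded figures for the ~788,000 people who weren't directly counted
# in the 2018 Census, as quoted in the paragraph above.
breakdown = {
    "household form only": 203_000,
    "administrative data only": 526_000,
    "not counted at all": 58_000,
}

total = sum(breakdown.values())

# Print each group's count and its share of the tallied total.
for group, count in breakdown.items():
    print(f"{group:<26}{count:>9,}  ({count / total:.0%})")
print(f"{'total':<26}{total:>9,}")
```

Around two-thirds of the people who weren’t directly counted were filled in entirely from administrative data, which is why the quality of those sources matters so much.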

(The figures in the previous three paragraphs are from this workshop given by Stats NZ in May 2019. The 2018 Census numbers may have changed a little since then.)

This is not all bad news. In 2013, most of the people who didn’t respond to the census were counted via “imputation”, i.e. by using statistical methods to fill the gaps. The 2018 Census will use other reliable sources of actual data to fill some of the gaps, which should be more accurate than statistical imputation. The current estimated coverage rate of the 2018 Census (i.e. the proportion of people who should have been counted who were actually counted) is 98.6%, versus 97.6% for 2013.

The main problem for those of us who work with census data is that the administrative data used to fill the gaps varies in quality, and some gaps can’t be filled at all. In the workshop I linked above, Stats NZ staff say that the geographic accuracy of some administrative data sources is limited, which may make it difficult to produce accurate statistics for areas smaller than Territorial Authorities. In addition, around 357,000 people who were added to the 2018 Census data from administrative data could not be linked to a household or a family. Together, these two issues suggest there may be serious problems with household-level analysis in small geographic areas, which matters for many purposes, such as urban planning and choosing locations for new businesses.

Another issue is that the available administrative data simply doesn’t contain information about some topics that the 2018 Census aimed to shed light on. These include questions about people living with disabilities, housing quality, people’s usual place of residence in the past (important for studying migration within New Zealand), and iwi affiliation. Stats NZ has already decided that the iwi affiliation data is so poor that it will not be released as official statistics. For other variables, we’ll have to wait until the 2018 Census data quality review panel, which is taking a more technical look at the data, reports back in September.

In summary, 2018 Census data is going to be a complicated beast to work with. Below is a slide I grabbed from the Stats NZ workshop video linked above that summarises where the 2018 Census dataset for people has come from. Various characteristics of people are represented horizontally. The green areas show where information came from census forms. The blue bits show where administrative data was used to fill gaps. The white bits are missing data that couldn’t be filled. The true picture is more complicated, because the bits that came from administrative data come from various sources, were filled using various methods, and vary in quality. Stats NZ publish data quality metrics for census data, and anyone working with 2018 Census data will need to pay careful attention to these.

We Are Here: An atlas of Aotearoa

ICYMI, data sorcerer Chris McDowall and designer extraordinaire Tim Denee have produced a truly wonderful book of maps, data graphics, and essays about New Zealand. Chris was kind enough to give me a preview of the book recently and it is exceptional. It is so much more than a book of beautiful and engaging data visualisations. It tells a fascinating and coherent story, and it has a point (more than one point, in fact). I’m almost as impressed by what was intentionally left out of the book as what was included. You should definitely buy it when it goes on sale soon.

Explore competition in New Zealand industries

Recently I had the pleasure of working with Harkanwal Singh on an interactive tool for exploring measures of the intensity of competition in New Zealand industries. The tool shows various views of a new dataset of competition measures that Richard Fabling and Dave Maré of Motu produced from firm-level data in the Longitudinal Business Database. Harkanwal did most of the work on the visualisation tool, and I used his work to write a summary report.

Two things to read

While you’re waiting for Chris and Tim’s book to come out, here are a couple of essays I liked recently.

Parametric Press is a new online magazine that is heavy on data visualisations. I liked “The Myth of the Impartial Machine” by Alice Feng and Shuyan Wu.

“I’m a data scientist who is sceptical about data” is an essay by NYU professor Andrea Jones-Rooy that made me nod my head a lot.

Who am I?

I’m an independent consulting data scientist and economist. I use data for things like forecasting, evaluating government policies and investments, and helping businesses make better decisions. I use R. I work with all kinds of data: big data, small data, geographic data, economic data, and data that has nothing to do with economics. I’m good at turning analysis into words and graphs that people can understand. Drop me a line if you need help with any of these things.

Ngā mihi,
Aaron
