Machine readable data needs to be human usable

This is perfectly machine readable data:

Month,Year,Region,Value
1,2016,Auckland,1000
1,2016,Wellington,2000
3,2017,Auckland,1500
4,2017,Wellington,
5,2017,Auckland,1.2

R, Python, Excel, STATA, etc, will all have no trouble reading such a CSV data file.
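
To make that concrete, here is a minimal sketch in Python using only the standard library. The data is inlined as a string so the example stands alone; the names SAMPLE_CSV and read_rows are made up for illustration.

```python
import csv
import io

# The sample data from above, inlined so the sketch is self-contained.
SAMPLE_CSV = """Month,Year,Region,Value
1,2016,Auckland,1000
1,2016,Wellington,2000
3,2017,Auckland,1500
4,2017,Wellington,
5,2017,Auckland,1.2
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
print(len(rows))               # number of data rows parsed
print(repr(rows[3]["Value"]))  # the blank Value comes back as an empty string
```

The file parses without a single complaint, including the row with the blank Value. Nothing in the output says what that blank means.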

Any human trying to use this data will have many questions, such as whether the years are calendar or financial years.

So this data is perfectly readable by a machine but it is not usable by a human. Since almost everything that machines do is ultimately decided by humans, data needs to be both machine readable and human usable to be useful. This applies to open data and to data used within organisations.

Human usability requires careful documentation of the data’s characteristics and quirks. This needs to be recorded separately from the data (ie as metadata and/or documentation) but be easily findable and accessible by all users of the data. It should be in a single place or file, not scattered about, and definitely not only stored in people’s heads. People should be able to find documentation and metadata in the same place that they find data itself. They shouldn’t have to go hunting for it elsewhere on a website or server.

Some metadata can also be machine readable. For example, whether the years are calendar or financial years could be recorded in a standard format that the machine could “understand” when reading the data. However, in almost all cases what the machine does with the data still has to be specified by a human, so ultimately a human needs to understand the metadata too. And some features of a dataset, such as the process by which it was collected, are best recorded as free text that will be difficult for a machine to “understand” anyway. In other words, you can’t write the human out of the equation (not yet …).
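
As a rough sketch of what machine readable metadata could look like, the Python below loads a small JSON description of the sample file. The layout and every key name (columns, year_type, blank_means, collection_notes) are assumptions invented for this example rather than any particular standard, and the values are placeholders, since the data itself does not tell us the real answers.

```python
import json

# Hypothetical metadata for the sample file. Every value is a placeholder
# invented for illustration; the data does not tell us the real answers.
METADATA_JSON = """
{
  "columns": {
    "Month":  {"type": "integer", "description": "Month number, 1-12"},
    "Year":   {"type": "integer", "year_type": "calendar"},
    "Region": {"type": "string", "description": "Region name"},
    "Value":  {"type": "number", "unit": "UNKNOWN", "blank_means": "UNKNOWN"}
  },
  "collection_notes": "Free text describing how the data was collected."
}
"""

metadata = json.loads(METADATA_JSON)


def describe(column):
    """Return a one-line, human-readable summary of a column's metadata."""
    fields = metadata["columns"][column]
    details = ", ".join(f"{key}={value}" for key, value in fields.items())
    return f"{column}: {details}"


def year_type():
    """Return the recorded year convention, which a program could branch on."""
    return metadata["columns"]["Year"]["year_type"]


print(describe("Year"))
print(year_type())
```

A program can branch on year_type, but the collection notes remain free text that only a person can make sense of.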

Making data machine readable is largely a mechanical process of ensuring it conforms to appropriate standards. Making that data also human usable is more difficult. It requires thinking about and answering the kinds of questions that a human user will ask. If you are already quite familiar with a dataset, it may be hard to know which features are not obvious to a newcomer.

Making data human usable can be a tedious and boring process, but without this work, data is not valuable. It often seems that data providers devote too many resources to technical solutions for making their data machine readable, like complicated APIs, while devoting too few resources to metadata and documentation. In many cases, data would be more valuable and would get more use if the technology for sharing it was simpler but the documentation was better.
