22 July 2017

The slow road to getting open data from the Government's Clean Water 2017 water quality monitoring sites

Who remembers the National Government's consultation over its proposed Clean Water package 2017?

Who remembers the headline announcement of the proposal: a 'target' that 90% of rivers and lakes would be swimmable by 2040?

The environmental NGOs were very critical of the target (and the proposal as a whole).

The Green Party said the new swimmable standard was just shifting the goalposts.

Forest & Bird's Kevin Hague described the proposal as a reduced swimmability standard.

Marnie Prickett of the Choose Clean Water group described the proposal as "fraud" as it intended to change the definition of swimmable to meet a lower standard.

The environmental NGOs' argument was that the proposed new 'risk' standard for swimming (expressed in E. coli counts as an indicator of faecal matter and pathogens) allowed a one in twenty probability of getting sick, whereas the old standard was a much more precautionary one in a hundred.

Dr Siouxsie Wiles and Dr Jonathan Marshall explained that the change in risk wasn't quite as simple as that, as did University of Auckland Professor of Biostatistics Thomas Lumley.

However, I thought there was something wrong with that 90 percent number. I seemed to recall Green MP Eugenie Sage saying in 2014 that more than 60 percent of the monitored river swimming sites were unfit for swimming.

The Clean Water package 2017 included this bar chart, which shows that the 90% 'swimmable' target (and the five new swimming quality categories from 'excellent' to 'poor') is actually expressed in a different variable: length of river measured in kilometres (not number of monitoring sites).

It also shows, in the left-most bar, that using the 'length of river' variable in place of the number of river monitoring sites produces a very different result.

On the basis of recent data, 72 percent of kilometres of rivers currently meet the 'swimmable' standard (the sum of the 'Fair', 'Good' and 'Excellent' quality categories). Expressing the results in kilometres of river length, and not in numbers of sampling sites, immediately enables a more positive spin to be put on the results.

The underlying data must be water quality sampling results from NIWA's National Rivers Water Quality Network (NRWQN) and sites operated by regional councils.

So, way back on 15 March 2017, I asked for the underlying sampling data from the water quality monitoring sites.

I felt I had expressed my official information request sufficiently clearly to get a reply in a reasonable time.

On your website on the page "Clean Water package 2017" there is a bar chart explaining the target of 90% of rivers and lakes swimmable by 2040 included in the report "Clean Water, ME 1293". The bar chart is also on page 11 of report "Clean Water, ME 1293". The bar chart shows kilometres (which I assume are lengths of segments of rivers) in each of the five 'quality' categories (Poor, Intermittent, etc) with a time variable which has three bars; "Current", "2030" and "2040".

Will you please provide me with the underlying data; which I assume must be water quality monitoring site results (and future predictions for 2030 and 2040) analysed by the five quality categories and the three time categories "Current", "2030" and "2040". Will you also please include the name or number of each monitoring site, its region and for the "Current" selection, the sampling period for the actual E Coli counts. Please provide this data either in comma separated values or Excel 2007 format via the FYI website.

However, I had to lodge a complaint with the Office of the Ombudsmen to eventually obtain the data. That only happened after the investigator from the Office of the Ombudsmen brokered a deal with the Ministry for the Environment. He rang me and said that the Ministry didn't want to give me the data in either .csv or .xls format, as I'd requested, because the data was in a special binary format, .rdata, specific to a certain statistical programming language named after the letter 'R'.

In other words, it appeared to me the Ministry was claiming that there was a 'technical' problem in providing me the data I had requested, and not an intent to frustrate the information request.

Sure, it's fair enough to take the Ministry at their word that they didn't intend to delay and frustrate my request. However, whatever the intention, it was still a delay from my point of view as the requester.

I told the investigator I would be happy to get the data in .rdata format. I also expressed the view that it would only have taken a very short 'R' script to convert the .rdata file into .csv format, and that this was a weak reason for the delay and for not providing me the data in .csv format. I observed that the Ministry's response was pretty unsatisfactory from an open data perspective. The investigator said he couldn't comment on open data issues, as we were in an official information space.

I was finally emailed the data in .rdata format by the Manager, Executive Relations, on 5 July 2017.

I used this R script;

to write the .rdata file to a .csv file.
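
For anyone wanting to do the same kind of conversion, a minimal sketch might look like the following. It is an illustration only, not necessarily the script used for this file. The helper name rdata_to_csv is hypothetical, and the sketch assumes the .rdata file holds one or more data frames, whose object names are not known in advance. It loads the file into a fresh environment and writes each data frame it finds to a .csv named after the object.

```r
# Hypothetical helper (not from the original post): write every data frame
# stored in an .rdata file out as a .csv file named after the object.
rdata_to_csv <- function(rdata_path, out_dir = dirname(rdata_path)) {
  # Load into a fresh environment so nothing in the workspace is overwritten.
  env <- new.env()
  # load() returns the names of the objects it restored.
  object_names <- load(rdata_path, envir = env)

  written <- character(0)
  for (name in object_names) {
    obj <- get(name, envir = env)
    # Only data frames are written; other object types are skipped.
    if (is.data.frame(obj)) {
      csv_path <- file.path(out_dir, paste0(name, ".csv"))
      write.csv(obj, csv_path, row.names = FALSE)
      written <- c(written, csv_path)
    }
  }
  # Return the paths of the .csv files that were written.
  written
}

# Usage, with the file name mentioned in this post:
# rdata_to_csv("WQdailymeansEcoli.rdata")
```

Note that the output files take the names of the objects inside the .rdata file, which may differ from the name of the .rdata file itself.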

The .rdata file is WQdailymeansEcoli.rdata on Google Drive.

The .csv format file is WQdailymeansEcoli.csv on Google Drive.

Now I just need to find the time to analyse the sampling sites data.
