The MoT Files: The Story Behind The Data
Dan Harrison, Editor
Fri, 11 May 2012
Our MoT data story picks up where the BBC left off.
You may recall that, back in January 2010, the BBC won an 18-month battle to force the Vehicle and Operator Services Agency (VOSA) to release details about MoT pass rates.
VOSA - the body responsible for MoTs in the UK - initially declined to supply the material, but the information commissioner ruled that disclosure was in the public interest and overturned VOSA's refusal.
Reluctantly, VOSA released a 1,200-page document that gave a breakdown of how a variety of models performed in the MoT test. It was a big story and gave the public access to previously confidential information. VOSA also agreed to release updates regularly.
There were, however, problems with the data. Although it ran to 1,200 pages and gave more information than ever before, it was incredibly vague. It told you the make, model and pass rate but not much else. It gave the impression that the car with the highest failure rate (the Renault Megane, at 28.1 per cent) was inherently unreliable. That may well be the case. But it may not be, as we had no way of telling what the cars were failing on. Was it something cheap and simple, like a blown headlight bulb, or something the owner should have replaced, like a wiper blade? There was no way to differentiate (though we now know the reasons for the Megane's failures).
It also didn't tell you where in the country the tests took place (the pass rate varies more than you might expect) or how many miles the cars had covered.
Then, in May 2010, came a change of Government.
This was a significant development as the Conservatives had promised to open up more Government data to the public and website/app developers.
Following the launch of the OpenData website, which houses all the data the Government has released, we downloaded the MoT data when it became available and set about getting it into a format that could be easily accessed. With more than 355m records, 200m MoTs (every test since the system was computerised in 2006) and 40GB of data, this wasn't an easy task.
Like the BBC, we had a few problems with the MoT data provided by the Government. Firstly, it's huge and difficult to work with. Secondly, as it's sourced from thousands of technicians, and humans make mistakes, it was littered with errors. There were plenty of cars registered in the 1800s and a few steam-powered Renault Clios to boot. We've done our best to make it as clean as possible, but with such a huge data set there may still be the odd error.
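To give a flavour of what that clean-up involves, here is a minimal sketch of the kind of plausibility check that catches a car "registered" in the 1800s or a steam-powered Clio. It is an illustration only, not the method we actually used: the `MotRecord` layout, its field names and the 1900 cut-off are all hypothetical assumptions, and the real VOSA files use their own column names.

```python
from dataclasses import dataclass


# Hypothetical record layout (assumption); the real VOSA files differ.
@dataclass
class MotRecord:
    make: str
    model: str
    first_use_year: int
    fuel_type: str
    test_year: int


# Assumed threshold: anything first used before 1900 is treated as a keying error.
EARLIEST_PLAUSIBLE_YEAR = 1900


def is_plausible(rec: MotRecord) -> bool:
    """Return False for records that cannot be right."""
    # A car "registered in the 1800s" is almost certainly a typo.
    if rec.first_use_year < EARLIEST_PLAUSIBLE_YEAR:
        return False
    # A vehicle cannot be tested before it was first used.
    if rec.first_use_year > rec.test_year:
        return False
    # Steam power on a post-1950 car points to a mis-keyed fuel code (assumed rule).
    if rec.fuel_type.upper() == "STEAM" and rec.first_use_year >= 1950:
        return False
    return True


def clean(records):
    """Keep only the records that pass every plausibility check."""
    return [r for r in records if is_plausible(r)]


records = [
    MotRecord("RENAULT", "CLIO", 2005, "PETROL", 2010),
    MotRecord("RENAULT", "CLIO", 2005, "STEAM", 2010),
    MotRecord("FORD", "FIESTA", 1850, "PETROL", 2010),
]
print(clean(records))
```

Only the first record survives. Rules like these flag the obvious mistakes; subtler errors, such as a plausible but wrong year, slip through, which is why the odd error may remain.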
Then there was the structure. One person's BMW 320 is another's BMW 3 Series. Had we taken the VOSA data as it was, we would have ended up with a huge number of separate BMW 3 Series models (and 1, 5, 6 and 7 Series models, for that matter), amongst others. Plus there were the complications of generations, bodystyles and other variants. We've done the best we can with the data we had to classify these models in a sensible, useful way.
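As a rough illustration of that classification step, the sketch below folds BMW's three-digit designations (318, 320d, 530i and so on) into a single "N Series" family and tidies up everything else. The rule and the regular expressions are hypothetical assumptions made for this example, not our actual classification scheme. A real mapping would also need to handle generations, bodystyles and models such as the M3.

```python
import re

# Hypothetical rule (assumption): a BMW code such as "320", "320D" or "530 I"
# belongs to the Series named by its first digit.
BMW_CODE = re.compile(r"^([13567])\d{2}\s*[A-Z]{0,2}$")
# Names that are already written as a Series, e.g. "3 SERIES".
BMW_SERIES = re.compile(r"^([13567])\s*SERIES$")


def normalise_model(make: str, model: str) -> str:
    """Map a free-text model name onto a single canonical name."""
    make_u = make.strip().upper()
    # Upper-case and collapse repeated spaces so variants compare equal.
    model_u = " ".join(model.strip().upper().split())
    if make_u == "BMW":
        m = BMW_CODE.match(model_u) or BMW_SERIES.match(model_u)
        if m:
            return f"{m.group(1)} Series"
    # No rule applies: pass the name through, tidied into title case.
    return model_u.title()


for name in ["320", "320d", "3 series", "530 I", "X5"]:
    print(name, "->", normalise_model("BMW", name))
```

The first four names all collapse into either "3 Series" or "5 Series", while "X5" is passed through untouched. Unmatched names fall back to a tidied copy of the original rather than being discarded, so nothing is lost when a rule is missing.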
Like any big set of data, there will be generalisations and some comparisons may not be fair. It's been a challenging data set to work with, but we think we've been able to present the information in a clear, intelligible and meaningful way. We've tried to fill in all the gaps and add significantly more detail to the work the BBC started in 2010.
So here, for the first time, is the MoT data that has been kept secret for so long. What we are publishing goes into more detail than ever, showing makes and models going back to 1980, information about tests in 118 postcode areas and how mileage affects the MoT test.
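For readers curious how figures like these are put together, here is a minimal sketch of a pass rate calculated by postcode area or by mileage band. It is an assumption-laden example, not our production code. The test dictionaries and their `postcode`, `miles` and `result` keys are hypothetical, and so is the 20,000-mile band width. The postcode area is taken as the one or two letters at the start of a UK postcode.

```python
import re
from collections import defaultdict
from typing import Optional

# A UK postcode area is the leading one or two letters, e.g. "SW" in "SW1A 1AA".
POSTCODE_AREA = re.compile(r"^([A-Z]{1,2})\d")


def postcode_area(postcode: str) -> Optional[str]:
    m = POSTCODE_AREA.match(postcode.strip().upper())
    return m.group(1) if m else None


def mileage_band(miles: int, width: int = 20_000) -> str:
    # Assumed band width of 20,000 miles, e.g. "40,000-59,999".
    lo = (miles // width) * width
    return f"{lo:,}-{lo + width - 1:,}"


def pass_rates(tests, key):
    """Percentage of tests passed for each group produced by `key`."""
    totals = defaultdict(lambda: [0, 0])  # group -> [passes, tests]
    for t in tests:
        k = key(t)
        if k is None:  # skip records whose group cannot be determined
            continue
        totals[k][1] += 1
        if t["result"] == "PASS":
            totals[k][0] += 1
    return {k: 100 * p / n for k, (p, n) in totals.items()}


# Hypothetical sample records.
tests = [
    {"postcode": "SW1A 1AA", "miles": 15_000, "result": "PASS"},
    {"postcode": "SW19 2AB", "miles": 45_000, "result": "FAIL"},
    {"postcode": "B1 1AA", "miles": 5_000, "result": "PASS"},
]
print(pass_rates(tests, lambda t: postcode_area(t["postcode"])))
print(pass_rates(tests, lambda t: mileage_band(t["miles"])))
```

Passing the grouping rule in as a function means the same counting loop serves both the regional and the mileage breakdowns.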