16K COVID-19 Cases Go Missing In UK Due To Government's Use Of Excel CSVs For Tracking
from the excel? dept
Yes, yes, you're sick of hearing about COVID-19. Me too. But the dominant force of 2020 continues to provide news, often with a technology focus. This mismanaged pandemic has already given us an explosion of esports, students gaming remote learning systems, and enough dystopia to make George Orwell vomit in his grave.
But to really get your anger bubbles gurgling, you need only turn to the myriad ways far too many governments have taken a moment that requires real leadership and forethought, and pissed it all down their legs. America appears to be trying to lead the charge in this, with our shining city on the hill mostly being illuminated by headlights of cars carrying sick passengers looking to get tested for this disease. Still, we're not alone when it comes to sheer asshattery. The UK recently managed to lose thousands of COVID-19 cases... because it was tracking them in Excel CSVs.
The issue was caused by the way the agency brought together logs produced by commercial firms paid to analyse swab tests of the public, to discover who has the virus. They filed their results in the form of text-based lists - known as CSV files - without issue.
PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards.
Public Health England (PHE) decided to put all of this information into a file using the XLS format. XLS was first introduced in 1987 and was replaced by the XLSX format over a decade ago. Putting aside the use of Excel to monitor positive COVID-19 cases in a major industrialized nation for just a moment, just the use of an antiquated format managed to lose PHE over sixteen thousand positive cases.
How? Well, XLS has restrictions as to how many rows of data it can record.
As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of. And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases.
When that total was reached, further cases were simply left off.
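To make the arithmetic in that quote concrete, here is a minimal Python sketch. The 65,536-row limit for XLS and the 1,048,576-row limit for XLSX are Excel's documented worksheet sizes; the 1,400 figure is the approximate number from the quoted report. The `append_rows` helper is purely hypothetical and is not a description of PHE's actual process. It only illustrates what refusing to overflow looks like, as opposed to silently leaving rows off.

```python
XLS_MAX_ROWS = 65_536        # worksheet row limit of the legacy .xls format
XLSX_MAX_ROWS = 1_048_576    # worksheet row limit of the newer .xlsx format
CASES_PER_TEMPLATE = 1_400   # approximate figure from the quoted report


def rows_per_case(max_rows: int = XLS_MAX_ROWS, cases: int = CASES_PER_TEMPLATE) -> float:
    """Rough rows-per-case implied by hitting the row cap at about 1,400 cases."""
    return max_rows / cases


def append_rows(sheet: list, new_rows: list, max_rows: int = XLS_MAX_ROWS) -> None:
    """Append rows to an in-memory sheet (hypothetical helper).

    Raises instead of truncating when the row cap would be exceeded.
    """
    total = len(sheet) + len(new_rows)
    if total > max_rows:
        raise OverflowError(f"sheet would hold {total} rows; limit is {max_rows}")
    sheet.extend(new_rows)
```

By this rough estimate, each case accounted for somewhere between 46 and 47 rows, which is how a 65,000-row sheet fills up after only about 1,400 cases.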
Which means the people whose positive results were left off weren't tracked for contact tracing. The government and its people didn't have a complete picture of either the total case count for the disease or its positivity rate. In other words, the agency in charge of national health failed to keep the nation informed as to its risk exposure because it didn't know how to properly use a common office application that it repurposed to record COVID-19 data.
Labour's shadow health secretary Jonathan Ashworth said lives had still been put at risk because the contact-tracing process had been delayed.
"Thousands of people [were] blissfully unaware they've been exposed to Covid, potentially spreading this deadly virus at a time when hospital admissions are increasing," he told the House of Commons. "This isn't just a shambles. It's so much worse."
The UK's Health Secretary told the House of Commons that PHE had decided to replace the use of Excel, or what he called a "legacy system", two months ago. But apparently PHE hadn't gotten around to it yet.
And still hasn't, actually. In fact, PHE's plan to temporarily fix all of this is... more Excel!
To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.
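The batching workaround described there can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PHE's implementation: `batch_cases` is a hypothetical helper, and it assumes each case arrives as a group of rows that should stay together in one template.

```python
from typing import Iterator, List, Sequence

XLS_MAX_ROWS = 65_536  # worksheet row limit of the legacy .xls format


def batch_cases(
    cases: Sequence[Sequence], max_rows: int = XLS_MAX_ROWS
) -> Iterator[List[Sequence]]:
    """Yield batches of whole cases whose combined row count stays within max_rows."""
    batch: List[Sequence] = []
    used = 0
    for case_rows in cases:
        n = len(case_rows)
        if n > max_rows:
            # A single case that cannot fit anywhere should be loud, not dropped.
            raise ValueError(f"one case has {n} rows; limit is {max_rows}")
        if used + n > max_rows:
            yield batch
            batch, used = [], 0
        batch.append(case_rows)
        used += n
    if batch:
        yield batch
```

Nothing is discarded in this sketch: every case lands in exactly one batch, and the number of batches grows with the data instead of the data being cut to fit.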
But insiders acknowledge that the current clunky system needs to be replaced by something more advanced that excludes Excel, as soon as possible.
When you hear complaints that governments are not taking this pandemic seriously, this is what they mean.
Filed Under: covid-19, excel, public health england, uk
Reader Comments
Hint for other agencies trying to avoid this mistake: SQLite files are well-documented and have size limits rather larger than Excel's.
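As a rough illustration of that hint, the sketch below loads CSV text into a SQLite table using only Python's standard library. The table name `results` and the columns `test_id` and `result` are made up for the example; nothing here reflects the real data layout.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert CSV rows into a SQLite table and return the stored row count.

    The `results` table and its columns are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (test_id TEXT PRIMARY KEY, result TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["test_id"], r["result"]) for r in reader]
    with conn:  # commit on success, roll back the whole load on any error
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Feeding it 70,000 rows, comfortably past the XLS cap, returns a count of 70,000. If any insert fails, the whole load is rolled back rather than partly applied.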
[ link to this | view in chronology ]
Re:
Bold of you to assume that they even consulted anyone who knows what a database is for this work.
[ link to this | view in chronology ]
Re:
"SQLite files are well-documented and have size limits rather larger than Excel's."
That sounds like IT. Governments hate IT.
Much easier to just use excel.
[ link to this | view in chronology ]
This is a misreading. The very next paragraph says the CSV files (presumably not having anything to do with Excel) were fine. The format to blame is XLS; or, better, the software to blame is Excel (any reasonable software would warn the user before throwing away part of their data).
Still, CSV often leads to trouble, because it's really a family of formats. RFC 4180 attempts to standardize it, but if nobody specified this as the format to use, probably at least one piece of software will be using something subtly different—subtle enough that things will appear to work at first, until there's a string containing a comma, backslash, double-quote, or newline; or the first line starts with "Sep=". And you could still be screwed if someone opens that in Excel, since it doesn't default to 4180.
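A small Python sketch of the kind of fields that trip things up. It uses the standard `csv` module with its default dialect, which doubles embedded quotes and quotes fields containing commas or newlines, roughly as RFC 4180 describes. The sample rows are invented for the example.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with the csv module's default dialect, then parse them back."""
    buf = io.StringIO(newline="")  # newline="" leaves line endings untouched
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


# Invented sample data covering the awkward cases: comma, quote, newline, backslash.
tricky_rows = [
    ["id", "note"],
    ["1", 'said "hi", then left'],
    ["2", "line one\nline two"],
    ["3", "C:\\temp"],
]
```

Here `roundtrip(tricky_rows)` gives back the same rows, whereas a naive `line.split(",")` would cut the second row's note in two at the comma. Software on the other end that assumes a different quoting or escaping convention could still read these fields differently.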
[ link to this | view in chronology ]
Re:
I think the point is rather not unrecoverable data loss, but that those cases were in fact lost with respect to anything useful which could have been done with the data in a timely fashion.
[ link to this | view in chronology ]
Re: Re:
The claim is that Excel is limited to 65535 or 65536 records in certain circumstances. One would think there would be some hard-to-ignore errors/warnings when dealing with larger datasets, e.g.:
But, apparently, Excel sometimes silently truncates the datasets. If so, that's bad design.
[ link to this | view in chronology ]
Re: Re: Re:
It's been a while since I've seen spreadsheets that large, due to thankfully being employed in areas where people are capable of picking the right tool for the job. But, I do recall having seen such errors in the past.
I'd say it's more likely that the errors you described were being displayed, but the monkeys using the spreadsheets just ignored them. I hate to side with Microsoft, but if the choice is between assuming competence on their part or on the part of people who thought that a spreadsheet was the correct tool for this job, I'll side with them.
[ link to this | view in chronology ]
Re: Re: Re: Re:
If so, those monkeys are masters of spin to be claiming it's a Microsoft problem. It seems rather unlikely to me, but I don't have a copy of Excel to check with.
Were they errors, they couldn't have just been "ignored", because Excel would've aborted the saving/loading and nobody would have received the truncated data. Warnings could be ignored, but software should make it hard to do by accident and should make the consequences obvious (e.g., it shouldn't be an OK/Cancel dialog defaulting to OK).
[ link to this | view in chronology ]
Re: Re: Re: Re:
The standard error when Excel fails reading a file says that there is an error with the file (not Excel) and would you like Excel to fix the file. Most people just click Yes and Excel throws away stuff until it can read the file.
Recent case in point: was using XLSX but kept getting errors with an outputted file with 150,000 rows. Turned out XLSX wasn't having a problem with the row count since it will let you have a million rows in the new format, but it was choking on the fact that it only allows 65,535 hyperlinks in a single sheet. And it kept saying the file had an error. No, the file was perfect, Excel just failed and blamed everyone else. How did Microsoft ever become the software giant it is today with bullshit like this?
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
Marketing, and getting its OS on the IBM PC.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
Wow. Given the option to fix a file, of course one would want Excel to try and would expect it to show an error if it were unable. Throwing away half the file is an... unconventional definition of "fixing".
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
"How did Microsoft ever become the software giant it is today with bullshit like this?"
With office suites, they indulged in a lot of deliberate sabotage that prevented competitors like Novell from having their software work properly on Windows 95, leading customers to buy Microsoft Office instead...
For some time they had a monopoly in the space, and while free alternatives do exist now, since they had to reverse engineer Microsoft's file formats to be compatible they don't always work 100% or offer the same collaboration tools, so corporate environments refuse to switch.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
"How did Microsoft ever become the software giant it is today with bullshit like this?"
Bullying. Also, swarming the market with what, for lack of a better term, they called "standards".
[ link to this | view in chronology ]
Re: Re: Re:
It does.
It pops a big box that says 'Significant loss of functionality' and then a decent blurb explaining exactly what's about to happen.
I suspect the issue is probably that someone wrote a VBA macro that may not have the same safeguards.
[ link to this | view in chronology ]
Re:
That's why they really should just use XML. It's easy to parse. There are plenty of tools available for it, so you can transform the data into whatever format you need. Plus, it's extensible, so if you find you need some additional information, you don't need to "add another column", possibly messing up existing parsers.
Also, I really suspect the name of the file where they stored the data was called "Excel.ppt".
[ link to this | view in chronology ]
Re: Re:
The problem wasn't the CSV file, but rather the import into an old excel format, and xml would not fix that problem.
[ link to this | view in chronology ]
Re: Re: Re:
It kind of would—as I recall, Excel can't import XML files. While not a CSV problem per se, it's really easy to import a CSV to Excel incorrectly. XML is harder to deal with in general, which should also make it harder for non-programmers to fuck up (although there's no shortage of opportunity for programmers to fuck it up).
[ link to this | view in chronology ]
Re:
"Dev: Have at it then, just don't come to me when it goes wrong"
It's much easier now that everyone's working remotely, but I quickly learned to routinely demand that any decisions were made via email, ticketing or IM and not verbally. It's amazing how many times that's saved my skin when some middle manager tried blaming techs for not doing something that they had explicitly ordered not to be done. There's not a company I've worked for where some penny pinching management type hasn't ignored all warnings to save a few hundred, then went into full blame mode when a preventable outage cost thousands.
[ link to this | view in chronology ]
Re: Re:
<OT>
Documenting decisions, incidents and actions helps with vermin like lawyers too. Lawyers hate opponents and individuals who document out the wazoo. They like everything to be verbal, that way everything is "he said, she said" in front of the judge. Arguing is easy-peasy.
On the other hand, document everything, and in writing ask for clarification/confirmation. That way in front of a judge, there is a document and the lawyer has to explain why he didn't clarify/explain when invited to.
Not a cure-all for the scum of law. However, fighting someone who documents thoroughly adds significantly to a lawyer's costs and often isn't worth it. Even better, on RARE occasions, proper documentation can lead to a scum-bag lawyer getting what they deserve; unfortunately not often enough due to the lawyer conspiracy of "Professional Courtesy".
Similarly, managers hate fighting documents. Documenting isn't fun, but unless you are one of those who go through life lucky, then it's better to be prepared. Documents which exist but weren't needed are a small waste of time. Documents that don't exist and are needed can make a big difference in the course of one's future.
</OT>
[ link to this | view in chronology ]
Re: Re:
Or, as /r/TalesFromTechSupport would say, "CYA" (Cover Your Ass).
[ link to this | view in chronology ]
Re: Re: Re:
Exactly. If it's not written down, it didn't happen, and there's always a middle manager trying to get out of blame for his dumb mistakes.
[ link to this | view in chronology ]
Re: Re:
"I quickly learned to routinely demand that any decisions were made via email, ticketing or IM and not verbally."
This, right there. Always make sure there's a correspondence chain detailing who asked for what and when.
Because if such correspondence doesn't exist, it'll be the fault of the grunt who did the work, not the one who ordered the work done to those specifications.
[ link to this | view in chronology ]
nuclear facepalm
Wow just doesn't seem appropriate enough.
Priceless
If it was a matter of cost, someone should introduce them to LibreOffice. The standard .ods format can handle 1,048,576 (2^20) rows and it's free.
[ link to this | view in chronology ]
Re: nuclear facepalm
Excel, like IE, has become critical to corporations due to reliance on their unique features.
[ link to this | view in chronology ]
The larger issue isn’t that so many cases went missing.
The larger issue is that they had that many cases to lose in the first place.
[ link to this | view in chronology ]
The wrong tool . . .
So they lost Access?
[ link to this | view in chronology ]
Re: The wrong tool . . .
They were a centre of Excel-lence
[ link to this | view in chronology ]
Re: The wrong tool . . .
That would explain not having a Base for their Data...
[ link to this | view in chronology ]
This one wonders...
Hmm, a file small enough to email... and as a file instead of a trackable database query? Guess no worries of... I dunno... data theft?
[ link to this | view in chronology ]
Top-rated comment on this from Ars, by deet:
[ link to this | view in chronology ]
Re:
"It's nice to be in IT and to know the right way to do everything and to have everything you need in order to do whatever you want."
...then you wake up.
[ link to this | view in chronology ]
Not too much self reflection here
Tech people on here who have worked with large orgs and government should really understand how 'new and emerging requirements' get handled in these situations.
Firstly, almost no development will be getting done in-house. Your new reporting solution will need to go to an external party who is just dying to milk time and money from your government department.
People are here talking about databases which might be part of the solution but then how are you extracting data and where's it going to? I mean, do people really think the contact tracers have SQL access/skills?
No, contact tracers are a hastily assembled call-centre group with no CRM system who probably work by having a team-leader assigning cases out of a spreadsheet (yes, a spreadsheet).
The real story here is how, in the era of ubiquitous IT, it is still almost impossibly hard for technology solutions to respond quickly to changing circumstances in a way which is reasonable for front-line workers, manages data security, is flexible and is accurate. And that's not even considering cost.
I'm not suggesting Excel is the 'right way' but there are very good reasons it gets used.
As a footnote, I notice with interest mention of the XLS 'format'. I'd be interested to know more about that; it's a common data-analyst trick to add an XLS extension to a tab-delimited text file. File associations will cause this to open in a recipient's Excel with a warning about file format. In this scenario you are not limited in the number of rows, but I wonder what happens if someone tries to save said file. I imagine it will apply the 'correct' XLS format and truncate the file.
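For anyone wanting to tell the two apart, a quick check of the first few bytes is usually enough. The sketch below assumes the usual signatures: Excel 97-2003 workbooks are normally OLE2 compound files, XLSX files are ZIP archives, and a renamed tab-delimited file is just text. The function name and its return labels are made up for the example.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (used by .xlsx)


def sniff_spreadsheet(head: bytes) -> str:
    """Classify a file by its leading bytes; the labels are hypothetical."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"             # likely a genuine binary .xls workbook
    if head.startswith(ZIP_MAGIC):
        return "zip"              # likely .xlsx or another ZIP-based format
    return "text-or-unknown"      # e.g. tab-delimited text renamed to .xls
```

Passing in the first eight bytes of a file (for instance `open(path, "rb").read(8)`) is sufficient for this check.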
[ link to this | view in chronology ]