IRS Finally Examines Backup Tapes, Recovers 30,000 'Missing' Lois Lerner Emails
from the oh,-you-mean-THESE-backup-tapes? dept
Whether or not the IRS is subjecting certain politically-affiliated groups to an unfair amount of attention remains to be seen. What is indisputable is that the agency's document retention policies are an unenforced joke. As citizens, we're required to hold onto pertinent financial records for 2-7 years just in case the IRS wants to look through them. The IRS, however, seemingly only retains records for as long as it can keep itself from inadvertently destroying them.
Emails from IRS official Lois Lerner have been sought for several months. At first, the IRS said it had them. Then it said it couldn't find them. Then it said Lerner's computer suffered a hard drive crash, taking with it a bunch of the emails being sought. Then it said more computers had crashed, taking out even more emails. Then it said it had recycled the crashed hard drives, making any data unrecoverable.
Questions were asked, most of them being "Bro, do you even back up files to a server?" Apparently, the IRS did no such thing, or was unaware of it, or didn't understand the question… and so on. The IRS admitted it told officials to print out and save emails (per internal guidance) but apparently no one took these rules very seriously, as there was no hard copy to be found either. A Justice Department official noted that there were backups, but that it was too hard to recover stuff from them, before dozing off in mid-sentence.
Now, all of a sudden (well, actually on a pre-Thanksgiving week Friday afternoon), the IRS has found the emails it claimed were lost.
Up to 30,000 missing emails sent by former Internal Revenue Service official Lois Lerner have been recovered by the IRS inspector general, five months after they were deemed lost forever.
The prodigal Lerner emails have returned! And there was much rejoicing, especially in Darrell Issa's camp, which has been applying much of the pressure over the past several months.
The U.S. Treasury Inspector General for Tax Administration (TIGTA) informed congressional staffers from several committees on Friday that the emails were found among hundreds of “disaster recovery tapes” that were used to back up the IRS email system.
It will still be some time before these emails are turned over, however. The investigators looked through 744 disaster recovery tapes holding an estimated 250 million emails, and say it will be a few weeks before the recovered emails are in a readable format. If this goes at the usual speed of government, it will be next year before the emails even make their way into the hands of the investigating committee, and longer than that before the public can take a look for itself.
The good news is that despite the IRS's internal failures, the system still mostly worked. A backup backed up files and (after much hassling) an internal investigation recovered most of what had been declared officially missing. It's almost enough to restore your faith in the IRS (and the government as a whole), except for almost everything else about the IRS (and the government as a whole).
Filed Under: backups, irs, lois lerner
Reader Comments
Just enough time...
Re: Just enough time...
I can imagine that their 'delete' key is worn the fuck out by now.
What the heck were they thinking?
Let's talk about that statement for a moment.
I happen to have access to some very large and diverse email archives. A small sample of those (363K messages) suggests an average message size of just over 8K each (including full headers). 250M such messages would occupy 2T -- well within the capacity of a single external drive, even without compression and allowing for the overhead of encryption. If we presume for a moment that their message corpus has an average size that is 500% larger, this still remains a tractable problem: buy a bunch of external drives, encrypt, copy each year's subset onto each of a set of three drives, store in diverse locations, test periodically and replace any drive that fails by cloning one of the ones that hasn't.
So why are they screwing around with hundreds of tapes? (And are those 744 tapes replicated somewhere?)
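The arithmetic above can be checked with a short sketch. The message count and the roughly 8 KB average come from the comment itself; reading "500% larger" as a 6x multiplier and counting in decimal terabytes are assumptions made here for illustration.

```python
# Back-of-the-envelope storage estimate for an email corpus.
# The 250 million message count and ~8 KB average come from the comment above;
# treating "500% larger" as a 6x multiplier is an assumption.

def corpus_bytes(message_count: int, avg_message_bytes: int) -> int:
    """Total uncompressed size of the corpus in bytes."""
    return message_count * avg_message_bytes

MESSAGES = 250_000_000
AVG_BYTES = 8_000          # "just over 8K each", rounded down
TB = 10**12                # decimal terabyte, as drive vendors count

baseline = corpus_bytes(MESSAGES, AVG_BYTES)
pessimistic = corpus_bytes(MESSAGES, AVG_BYTES * 6)  # 500% larger = 6x

print(f"baseline:    {baseline / TB:.1f} TB")
print(f"pessimistic: {pessimistic / TB:.1f} TB")
```

Even the pessimistic figure fits on a handful of external drives, which is the point the comment is making.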
What Work?
I've been reading up on DRM security for Outlook/Exchange, and it's scary how fragile the system could be. Lose the server and certificate, your emails are unreadable.
744 tapes to look through just sounds far too disorganized. I suspect you see chaotic disorganization as a side effect of a government that does not pay for good IT (despite the fact that the IRS's primary job is essentially about organizing data) plus the typical civil service / large bureaucracy problem that nobody can make a timely decision or fix a problem. Backup takes too many tapes? Backup crashes? We'll set a taskforce to study the problem and make recommendations. Need new tape system? We'll put in a request in next year's budget and hope it doesn't get cut.
Re: What the heck were they thinking?
It probably is a full server backup. That is, each backup set includes not only emails, but also things like shared file servers and databases. While email databases can be small (unless your users love file attachments), shared file servers can have years of accumulated junk, and databases can be huge even without accumulated junk.
The emails were probably backed up just as a side effect of backing up everything else.
Look, I know conspiracy theories are fun and all...
To Mr. 5-months-is-enough-to-delete-incriminating-emails - I know 30k emails SOUNDS like a lot. It is not. It's probably about 15 days of work for someone who had never done that kind of thing before, probably half that for someone who actually has some familiarity with document review.
Disaster recovery guy? It sounds like you've got an IT background, but it's possible that you're not familiar with disaster recovery: it doesn't work like a drive mirror. The things you're talking about bear no relation to the reality of how tape archiving works.
But from my point of view, the biggest problem here is that by even talking about server problems and email infrastructures, you're fundamentally buying into the idea that there's a scandal here. We've been through literally YEARS of Issa bullying everyone in his arm's reach, trying to manufacture a scandal whether there was one or not. And what's come out is that the IRS took the somewhat lazy tack of focusing on groups whose names clearly indicated that they intended to violate the laws pertaining to their nonprofit status. Which, to be honest, seems reasonable to me. But then again, profiling always SEEMS reasonable when it's against people you don't like, and I generally don't like liberal or conservative groups who pretend to nonprofit and nonpartisanship.
So it's profiling, and that's wrong, and I'm absolutely willing to let that principle stand over my own feelings of "yeah, not too bothered by this." But where's the coverup? What's the point? The scandal - such as it is - is out in the open. Occam's Razor is that this is stupidity and laziness.
But who knows? Maybe it'll come out - if Issa keeps pushing to keep himself in the news... oops, I mean, if he keeps tirelessly investigating this - that the White House ordered the IRS to use their powers to TAKE AWAY THE TAX BREAKS of a bunch of fairly insignificant progressive and tea party groups. Truly, a plan worthy of Lex Luthor.
How many of those 30,000 emails are spam?
Re: How many of those 30,000 emails are spam?
Re: Look, I know conspiracy theories are fun and all...
Conspiracy? Ehh your point is invalid here because they stated they lost the emails, yet five months later here they are. Whether it was an attempted coverup or gross incompetence is beside the point: They said they lost them, you argue that's probable, except they didn't lose them so your argument is null and void.
Re: Look, I know conspiracy theories are fun and all...
"We've been through literally YEARS of Issa bullying everyone in his arm's reach"
You've been personally affected by this? How?
What a coincidence
Re: Re: What the heck were they thinking?
If I'm tasked with creating a legally-mandated backup/archive of an email corpus in order to comply with future discovery requests, then I do that separately from routine server backups that include everything on the disks. The former only needs to contain email messages and perhaps the log files associated with them. The latter needs to contain everything including the OS and all the software, libraries, config files, etc.
But it's the former that I would retain in triplicate in order to comply with the law, not the latter. (I don't think anybody's going to come along 5 years later and ask for a recovered copy of /usr/sbin/sendmail.) The former is an archive designed to achieve compliance with records retention laws; the latter is a backup designed to defend from hardware/software/human failure or from a successful intrusion.
Re: Look, I know conspiracy theories are fun and all...
You sound dismissive, but *if* it's true it's worthy of impeachment, just like a third-rate burglary would be. And just like a certain third-rate burglary, the problem may not be so much the crime as the coverup.
Re: What a coincidence
Re: Re: Re: What the heck were they thinking?
About those tapes, of course tapes do fail. They're just as fragile as hard drives, possibly even more so. The good thing about them is they're cheaper. Tape drives fail too, and tend to take the tape with them when they do. I wonder if anyone's still making them.
I would think multi-CD or DVDs would make a better and more permanent medium, and I think somebody's even come up with a way to laser burn data onto glass which can last a heck of a long time (though even glass "flows", given enough time).
However, kudos to the IRS backup team! At least somebody was taking their responsibilities seriously. Good job. Any time data is successfully recovered from backups, it's time for a Snoopy dance.
Re: Re: Re: Re: What the heck were they thinking?
That's pretty much a myth.
Re: Re: Look, I know conspiracy theories are fun and all...
You appear to have forgotten about the left hand, right hand problem. Somebody said they lost them. That's entirely possible if they were unaware of those disaster recovery backup copies. They were lost to them, and then somebody at the back of the room raised his hand.
30,000 emails worth of time has passed.
Re: Re: Re: Look, I know conspiracy theories are fun and all...
After they found them. Which is a null and void argument to make because they found them. Left/right hand doesn't enter into it.
It's like if you lost some emails and found them six months later, then told me that you found them, and I then piped up and said, "naw dog, you probably lost them for good because of x y and z".
That's his argument in a nutshell.
Re: What the heck were they thinking?
Re: Re: Re: Re: What the heck were they thinking?
I will give you that if your tape drive fails, you're gonna have a bad time. None of the drives I have worked with were backwards compatible.
Re: Re: Re: What the heck were they thinking?
I'm pretty sure they are running an Exchange server, since most .gov types I've run across usually are, so backups for data retention and message discovery are a bit different. Usually it's either a separate appliance (ProofPoint, Barracuda, GFI, et al.) or you use an in-house mailbox and journaling, then just export the journal mailbox to a PST file for archival storage, i.e. if you only need active discovery of messages for the past year or so. In other words, it's a bit messier than your typical archival/compliance message backups, but generally with the export it's pretty much the same process.
Exchange Restore
The cheaper backup solutions simply backed up the database. OK, but how do we restore? Overwrite the (corrupt, failed) existing database. The server may not exist any more, or it's a production server and needs to be there to allow people to keep getting email. So we create a new server: additional cost, additional resources. Format it as an Exchange server. Restore the database and attach it. But this server needs to be on the domain so the user IDs are valid. Either do it with the existing domain, or duplicate a domain controller on the private network. After that restore, open the mailboxes required and run a search for the requested emails. All done? Restore tape number two and repeat. Then three... all the way to seven hundred. An email may be on one tape but not the next, if there was not enough mailbox room and it was moved to a local PST. Eliminate duplicates.
I hope nobody asks for another user's email, or they do this all over again from tape 1.
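The search-and-deduplicate part of that loop can be sketched roughly as follows. This is only an illustration under stated assumptions: each tape is modelled as an already-restored list of message records, the `mailbox` and `message_id` field names are hypothetical, and none of the actual Exchange restore steps are represented.

```python
from typing import Dict, Iterable, List

# Each "tape" is modelled as an already-restored snapshot: a list of message
# dicts. In reality every snapshot would need the restore-and-attach steps
# described above; this sketch only covers searching and deduplicating.
Message = Dict[str, str]

def collect_user_mail(tapes: Iterable[List[Message]], user: str) -> List[Message]:
    """Gather one user's messages across all snapshots, dropping duplicates."""
    seen_ids = set()
    recovered = []
    for snapshot in tapes:
        for msg in snapshot:
            if msg["mailbox"] != user:
                continue
            if msg["message_id"] in seen_ids:
                continue  # same message already found on an earlier tape
            seen_ids.add(msg["message_id"])
            recovered.append(msg)
    return recovered

# Hypothetical sample data: one message appears on both tapes.
tape_1 = [
    {"mailbox": "alice", "message_id": "<1@example>", "subject": "budget"},
    {"mailbox": "bob", "message_id": "<2@example>", "subject": "lunch"},
]
tape_2 = [
    {"mailbox": "alice", "message_id": "<1@example>", "subject": "budget"},
    {"mailbox": "alice", "message_id": "<3@example>", "subject": "follow-up"},
]

found = collect_user_mail([tape_1, tape_2], "alice")
print([m["subject"] for m in found])
```

Note that asking for a second user means another full pass over every snapshot, which matches the complaint about starting again from tape 1.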
Re: Re: Re: Re: Re: What the heck were they thinking?
I first heard about it from a PBS show where they showed that it's easily detected in cathedral stained glass windows.
Re: Exchange Restore
Re: What the heck were they thinking?
(Also, as for size, I have 9 years of email from a previous job, from 2005 to 2014, exported as PSTs, converted to mbox format and exported to timestamped individual files. The median size is 12k [ls -l | awk '{print $5}' | sort -n | sed -n "$(($(echo * | wc -w)/2))p"], but the average size is a much larger 180k [46k messages totalling 8.1G], and I know I would frequently delete mails with big attachments that I didn't want to save).
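The gap between median and average here is what you would expect when a few large attachments sit among many small messages. A minimal sketch of the same measurement in Python, assuming one file per message in a single directory; the sample sizes below are made up for illustration and are not the commenter's data.

```python
import os
import statistics
from typing import List

def file_sizes(directory: str) -> List[int]:
    """Sizes in bytes of the regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return [e.stat().st_size for e in entries if e.is_file()]

def summarize(sizes: List[int]) -> dict:
    """Median and mean of a non-empty list of sizes."""
    return {"median": statistics.median(sizes), "mean": statistics.mean(sizes)}

# Illustrative sizes only: many small messages and one large attachment.
sample = [10_000, 11_000, 12_000, 13_000, 900_000]
stats = summarize(sample)
print(stats)
```

To run it against a real export, pass the directory of per-message files to `file_sizes` and feed the result to `summarize`.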
Koskinen should be fired
This guy needs to be fired, and replaced with someone who will actually work to get to the bottom of this mess, rather than someone who is a lackey to delay as long as possible, and continue to sweep things under the rug.
I thought Koskinen's appointment would be a good move, but his actions prove he has the intelligence of a garden slug... Apologies to all of you garden slugs out there.
Re: Re: Re: Re: Re: Re: What the heck were they thinking?
https://en.wikipedia.org/wiki/Glass#Behavior_of_antique_glass
That's why I said it's "pretty much" a myth. Yes, glass can flow, but it's SO slow you'd never notice it even in centuries-old glass - or glass that's been around for the last billion years, for that matter. As the article points out, if this were true you'd expect really old telescopes (telescopes are very sensitive to any change in the lens) to be unusable, but this is not the case. And the really ancient Roman or Egyptian glass should be showing proportionally more "flow" if it was flowing, but this is also not observed.
The cathedral glass LOOKS this way because the glass was spun unevenly, and they ordinarily put the thick side down for stability.
Re: What the heck were they thinking?
QIC-120s were good enough for my Grandfather and they are good enough for me! And apparently the IRS....
Young Whippersnapper!
PACs
This whole issue is like Benghazi - parts of the situation were a tragedy but the sinister conspiracy just didn't happen the way that some people have tried to make it seem.
Things that make you go hmmmm. One thing we can say for sure. There's a bunch of bald faced liars inside the IRS.
Problem with their process...
The problem is obvious. They forgot to add, "Then fax your printed e-mails to our offsite backup where scribes will copy them to papyrus scrolls for long-term storage."
It is 2015 and the IRS is using computers (sort of) but their procedures are stuck in 1980 mode.
IRS IT
Re: Re: Re: What the heck were they thinking?
As the article says, these are from DR tapes, which likely means they are full server backup tapes.
DR backups are not designed to go "let's recover email x from the email database stored on server Z".
DR backups are designed for restoring entire servers to an operational state after some sort of disaster - ranging from someone taking a hammer to a server to a nuke destroying the data center.
At best they could restore the entire email database, then search through the restored database for emails (either by querying it directly or importing it back into an email server to do 'normal' email searches).
Not to mention they may have to actually restore multiple instances of the database, because if they are DR backups they are probably monthly full backups, so to retrieve emails that have been deleted at various times, they may have to restore several backup versions to pull out emails that may have been deleted prior to subsequent backups.
All in all, it sounds like they have a bodgy system designed to NOT be able to easily audit/version emails, with no email-specific archiving mechanism (hey, they even said the official way to 'archive' business-relevant emails is to print them out and literally file them on a paper file). Therefore it is a SIDE-EFFECT of DR backups that they are able to retrieve old emails.
Hell, if I was running the backup system and the intent (even if unofficially) was to not keep a history of business decisions, I'd only keep 2 'monthly' backups, overwriting the 3-month old backup with the current month (in addition to the daily incremental backups which would only be kept until the next full backup is verified successful). That way you COULDN'T go back more than 3 months, which is more than sufficient for DR.
Of course, that's if you are using a grandfather/father/son backup schedule, I always preferred "incremental forever" systems myself.
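A short-retention rotation like the one described can be sketched as a pruning rule. This is a simplified illustration, not a full grandfather/father/son implementation: the `prune` function, its `keep_fulls` parameter, and the sample dates are all hypothetical.

```python
from datetime import date
from typing import List, Tuple

Backup = Tuple[date, str]  # (date taken, "full" or "incremental")

def prune(backups: List[Backup], keep_fulls: int) -> List[Backup]:
    """Keep the newest `keep_fulls` full backups, plus only those
    incrementals taken after the most recent full."""
    fulls = sorted((b for b in backups if b[1] == "full"), reverse=True)
    kept_fulls = fulls[:keep_fulls]
    if not kept_fulls:
        return []  # nothing to anchor incrementals to in this sketch
    latest_full_date = kept_fulls[0][0]
    incrementals = [b for b in backups
                    if b[1] == "incremental" and b[0] > latest_full_date]
    return sorted(kept_fulls + incrementals)

# Hypothetical history: three monthly fulls with incrementals in between.
history = [
    (date(2014, 1, 1), "full"),
    (date(2014, 1, 15), "incremental"),
    (date(2014, 2, 1), "full"),
    (date(2014, 2, 10), "incremental"),
    (date(2014, 3, 1), "full"),
    (date(2014, 3, 2), "incremental"),
    (date(2014, 3, 3), "incremental"),
]

retained = prune(history, keep_fulls=2)
for taken, kind in retained:
    print(taken.isoformat(), kind)
```

With only two fulls retained, anything deleted before the older of the two is simply gone, which is why a scheme like this is adequate for disaster recovery but useless as a records archive.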
Typo
There's no "W" in the word "Hole"