Did The NSA Think The Public Can't Do Math? Attempt To Downplay Data Collection Fails Miserably
from the carry-the-one... dept
Last week we wrote about the NSA's ridiculous attempt to justify its surveillance efforts, including this really wacky callout designed to show just how "little" data the NSA collects.
[Image: Scope and Scale of NSA Collection]
This was bizarre on a number of levels, not the least of which is the wacky basketball court-to-dime scale. Next time, maybe we can play "is it bigger than a breadbox" with the NSA. As for what any of this means, that hasn't been clear at all. Since the NSA has already redefined basic English words like "collect," "target," "datamine," and "relevant," there's no telling what it means by "touch." However, some are starting to dig into the numbers, and contrary to the NSA's attempt to suggest that this is "nothing to fear," a bit of analysis suggests it's collecting quite a bit of info.
According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world's traffic in conducting their mission -- that's less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA's total collection would be represented by an area smaller than a dime on that basketball court.
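For anyone who wants to carry the one themselves, here is a quick sketch of the arithmetic in that callout. It uses only the three figures the NSA quotes; the constant and function names are made up for illustration. Taken at face value, 1.6% of 1,826 petabytes is a bit over 29 petabytes a day, and multiplying the two quoted percentages gives about four parts in a million, which is ten times the 0.00004% the callout states. The callout may be rounding or relying on inputs it doesn't spell out, so treat this as a sanity check rather than a correction.

```python
# Figures quoted in the NSA callout (names are illustrative, not official).
INTERNET_PB_PER_DAY = 1826       # petabytes the Internet carries per day
TOUCHED_SHARE = 1.6 / 100        # "NSA touches about 1.6% of that"
REVIEWED_SHARE = 0.025 / 100     # "only 0.025% is actually selected for review"
CLAIMED_NET_SHARE = 0.00004 / 100  # "analysts look at 0.00004% of the world's traffic"


def touched_pb_per_day():
    """Petabytes per day that the NSA says it 'touches'."""
    return INTERNET_PB_PER_DAY * TOUCHED_SHARE


def reviewed_fraction_of_all_traffic():
    """Share of all traffic reviewed, if the two quoted percentages simply multiply."""
    return TOUCHED_SHARE * REVIEWED_SHARE


if __name__ == "__main__":
    touched = touched_pb_per_day()
    reviewed = reviewed_fraction_of_all_traffic()
    print(f"touched per day:  {touched:.2f} PB")
    print(f"reviewed share:   {reviewed * 100:.5f}% ({reviewed * 1e6:.1f} parts per million)")
    print(f"claimed share:    {CLAIMED_NET_SHARE * 100:.5f}% ({CLAIMED_NET_SHARE * 1e6:.1f} parts per million)")
    print(f"computed/claimed: {reviewed / CLAIMED_NET_SHARE:.1f}x")
```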
First up, we have Jeff Jarvis, who highlights a bunch of important comparative datapoints including that Sandvine claims that only 2.9% of US traffic is communication traffic and 68.8% of all email is spam -- meaning that it's entirely possible that the NSA collects nearly all non-spam email and it would still be within its 1.6% number. He also points out that 62% of traffic on the internet is considered entertainment, and we can assume that the NSA doesn't need to collect every copy of Game of Thrones that people are passing around (I'm sure one or two will do the job). He similarly points out that Google itself claims to only index approximately 0.004% of traffic on the internet, suggesting that the NSA may be collecting more info than Google indexes by two orders of magnitude.
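Jarvis's comparison can be checked the same rough way. The sketch below assumes, generously, that every byte of communication traffic is email, that Sandvine's US share holds for global traffic, and that the spam share applies to bytes rather than message counts. None of those assumptions comes from the sources, and the names are illustrative. Under them, non-spam email tops out at roughly 0.9% of traffic, comfortably inside 1.6%, and the NSA's "touched" share works out to about 400 times Google's indexed share.

```python
import math

# Percentages quoted in the post (names are illustrative).
COMMUNICATION_SHARE = 2.9 / 100    # Sandvine: share of US traffic that is communication
SPAM_SHARE_OF_EMAIL = 68.8 / 100   # share of all email that is spam
NSA_TOUCHED_SHARE = 1.6 / 100      # share of traffic the NSA says it "touches"
GOOGLE_INDEXED_SHARE = 0.004 / 100 # share of traffic Google says it indexes


def non_spam_email_upper_bound():
    """Upper bound on non-spam email as a share of all traffic.

    Assumption: all communication traffic is email, which overstates email.
    """
    return COMMUNICATION_SHARE * (1 - SPAM_SHARE_OF_EMAIL)


def nsa_to_google_ratio():
    """How many times larger the NSA's touched share is than Google's indexed share."""
    return NSA_TOUCHED_SHARE / GOOGLE_INDEXED_SHARE


if __name__ == "__main__":
    bound = non_spam_email_upper_bound()
    ratio = nsa_to_google_ratio()
    print(f"non-spam email, upper bound: {bound * 100:.2f}% of traffic")
    print(f"fits inside the NSA's 1.6%:  {bound < NSA_TOUCHED_SHARE}")
    print(f"NSA touched / Google indexed: {ratio:.0f}x "
          f"({math.log10(ratio):.1f} orders of magnitude)")
```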
Meanwhile, Sean Gallagher, over at Ars Technica, digs a bit deeper into the numbers, suggesting that the NSA's data collection is closer to being on par with Google, but still greater than Google:
The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day. That means NSA is "touching" more data than Google processes every day (a mere 20 petabytes).
Gallagher also looks much more closely at the recently revealed details of the Xkeyscore program, to show how that 1.6% of "touched" internet communications can cover pretty much everything important.
As a result, if properly tuned, the packet analyzer gear at the front-end of XKeyscore (and other deep packet inspection systems) can pick out a very small fraction of the actual packets sent over the wire while still extracting a great deal of information (or metadata) about who is sending what to who. This leaves disk space for "full log data" on connections of particular interest.
In other words, while the 1.6% number was put forth by the NSA to try to make people think this is no big deal, when you look at what it means, it suggests it's a very big deal indeed. In fact, the NSA may be collecting even more information than people had believed before.
Filed Under: data, data collection, internet traffic, nsa, nsa surveillance, scale
Reader Comments
Statistics
[ link to this | view in chronology ]
Re: Statistics
― W.C. Fields
Requoted for the NSA age:
If you can't dazzle them with bullshit, baffle them with stats.
[ link to this | view in chronology ]
Well...
[ link to this | view in chronology ]
Re: No, AC it SHOWS Google is 2/3 size of NSA!
"29.21 petabytes of data a day. That means NSA is "touching" more data than Google processes every day (a mere 20 petabytes)." -- Since you obviously don't understand numbers: 20 / 29.21 is about 2/3. -- Google, ONE corporation, is doing 2/3 as much SPYING as the national-scale NSA is!
Thanks for the query! You're welcome!
[ link to this | view in chronology ]
Now, what does "index" mean? -- Spying on activities.
Now, on other hand this merits exclams: "Google processes every day (a mere 20 petabytes)"!!! Especially when just tossed in to compare with NATIONAL scale collection at NSA!!! -- That's a HEAP of data gathered EVERY DAY about you and me -- mostly you who foolishly just let it, don't do what you can to fight the Data Monster. So I say that for Google TOO "it's a very big deal indeed" that "can cover pretty much everything important."
Where Mike sez: "Any system that involves spying on the activities of users is going to be a non-starter. Creeping the hell out of people isn't a way of encouraging them to buy. It's a way of encouraging them to want nothing to do with you." -- So why doesn't that apply to The Google?
[ link to this | view in chronology ]
At least one of Sandvine's numbers is wrong
But that aside: clearly the NSA is in a rather unique position to filter all that and pluck out the non-spam: they have the largest corpus of email traffic of anybody, anytime, anywhere. That, combined with basic traffic analysis, combined with their vast computing resources, actually makes the problem somewhat tractable -- if tedious.
In other words, they're best-positioned to toss the junk and keep what's useful. And the more email they vacuum up, the better positioned they are.
[ link to this | view in chronology ]
Doesn't really matter how large it is if it's our most sensitive junk...
[ link to this | view in chronology ]
Re: Doesn't really matter how large it is if it's our most sensitive junk...
Having taken a page from the NSA's PR playbook pedophiles now claim 'In their child molestation mission, NAMBLA only touches about 1.6% of their bodies.'
*PS. Data dolls would have a better ring to it.
[ link to this | view in chronology ]
Most places send confirmation emails. Do a secure transaction on Paypal? They automatically send you a confirmation email for the amount and destination. Buy something on Amazon, or Barnes and Noble, or Staples, or pretty much any online store and they send you a confirmation email saying "this is what you bought". Often times price is included, sometimes they have the tracking number right in the email for anyone to go to the shipping company's website and plug in the number and see where the shipment is.
So even without the NSA being able to monitor https transactions, they can monitor what you buy online, where you buy it, for how much, and in some cases, when you get it. They can also monitor who you donate to using paypal, and how much.
All from just reading your email.
[ link to this | view in chronology ]
The first quote says:
---
According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day.
---
The second quote says:
---
The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day.
---
1.826 vs 29.21 PB
So how much traffic on the internet is there? and how much are they actually processing?
[ link to this | view in chronology ]
Re: Re: Re:
For even more number fun, the old-timey British "billion" (a million millions) and the US "billion" (a thousand millions) differ by three orders of magnitude. This has since been reconciled -- it's always a thousand millions now -- but sometimes it can cause confusion when reading older texts.
[ link to this | view in chronology ]
Damnit, people understanding what they are doing is exactly why they lied. Stupid truth, way to go and ruin it for everyone.
[ link to this | view in chronology ]
Yes, it was a stupid statement.
I don't care if the NSA isn't collecting, scheduling for review, or reviewing a 480MB YouTube video. I do care if they're collecting, scheduling for review, or have reviewed any of my .0000002 MB e-mails.
[ link to this | view in chronology ]
Re: Yes, it was a stupid statement.
You send emails that are less than 1 byte?
But yeah, some traffic is much more important than other traffic. Although I don't want them spying on my videos EITHER. The NSA does not need to know about ANY of my traffic, because there is no probable cause.
[ link to this | view in chronology ]
Re: Re: Re: Yes, it was a stupid statement.
Part of my premise is that of "all the data transmitted on the Internet", most of it is public: such as typical videos.
My example even excluded the fact that said YouTube video is subject to thousands of times more transmissions than the single transmission of an e-mail to a single recipient.
[ link to this | view in chronology ]
So who is worse?
But according to out_of_his_giant_blue_ass, raising these points and making articles about them is "trivial" while Google is the DEVIL! DEVIL! DEVIL!
[ link to this | view in chronology ]
Spam
I can see it now: It was V1agra that caused the embassy to be destroyed!
[ link to this | view in chronology ]
One official comes out and says this, the next one that, when compared they're both lies and misdirections. Attempts at clearing the air to the public come down to schemes on how to hide and continue.
What really worries me is we haven't heard it all, and there are more blockbusters coming, according to Glenn.
This whole thing reeks of scandal. Of government gone bat shit crazy and expecting the citizens to just accept it. I know for sure I can't trust what I am hearing coming out of Washington. There's been too many lies and half truths.
We are not to the bottom of all this yet. We are not hearing the truth at all. There is nothing in all these supposed revelations I am comfortable with given I know I'm being lied to up front.
[ link to this | view in chronology ]
Subtitles
Ooops, think I gave them something to think about. Now they'll have to collect all data!
[ link to this | view in chronology ]
Boundless Informant
[ link to this | view in chronology ]
But they are having trouble getting the keys to accept each other. Then it becomes clear why. There's a man-in-the-middle interfering with the transfers of numbers for the keys. Very shortly after they get the numbers to jibe, the originating guy gets an email from his buddy in the same room saying they've agreed...but his buddy didn't send that email.
Within a short period of time, AT&T sends an email saying he needs to update his browser. Why would AT&T do that in email? Why wouldn't it be a web browser pop up?
I don't know what to make of it. Maybe it's more tinfoil hat stuff. Maybe he was dead on the money given how little NSA likes encryption.
[ link to this | view in chronology ]
Downloading movies
[ link to this | view in chronology ]
Avg email size
And yes, the majority of the American public are idiots. They will believe practically anything the media and government tell them. "It must be true, I saw it on TV or read it in the paper," etc.
For those of you tech savvy enough to figure it out, the TOR project is a great way to keep prying eyes from your private parts. May be slow, but so far, it's secure. The more people that use TOR, the more difficult it becomes to trace what we are doing within it.
[ link to this | view in chronology ]
Cut their funding off
[ link to this | view in chronology ]
NSA could be helpful ...
[ link to this | view in chronology ]
All this is stupidity!
If you are stupid enough to use open email as a way of communicating with others about your illegal activities, you are too stupid to be allowed to circulate in society, because your very presence there (given your level of stupidity) will degrade society. I would rather have you in jail for your illegal activities than to have you drooling on my shoes (a metaphor, in case you are one of them) wherever we cross paths.
What the NSA is doing, is to just make it somewhat more tedious for "terrorists" to operate, much as the Secret Service makes it very, very difficult, though not impossible, to assassinate the President. The dangerous "terrorists" already know about encryption, and the many other ways from which they may choose to communicate covertly.
Some rambling: I would assume, they (both the NSA and Secret Service) do stumble across a drooler now and then. Only a true drooler would ever think it would be in any way beneficial to assassinate the president, or to kill a bunch of innocents, as the "terrorists" are wont to do. Though some are more technically literate than others, and would be able to function covertly. Now, more focused rambling.
Anyway, it is important that both the droolers and those who are able to operate very covertly be thwarted, and that is what the NSA is attempting to do. I don't know what some people may be buying on the internet, or putting in their emails and texts that they are fearful that "Big Brother" will discover, but the NSA uses filters much as Google does to target relevant emails etc. No one is (nor could be) reading all that stuff, and no one cares what it says, unless it trips a filter. Hint, don't buy 12 tons of ammonium nitrate, or discuss bombing an embassy. Then someone may read it.
I know that it is really the "principle" of the thing (intrusion into our privacy) that gets people riled up, but in the end, THE INTRUSION IS NOT GOING TO STOP (figger it out, DUH!), as long as there are serious "threats" out there. The NSA doesn't care about "your" emails etc., or even about your privacy, that they are actually not violating, because they aren't actually reading your emails, but they do care about "threats".
This is where it starts to get sticky, because the definition of what a threat is can change. But I see no way that it can come to include my purchase of 2 pounds of organic alfalfa seeds for sprouting, or a bathroom scale - my most recent internet purchases. That's about as insidious as it gets for me, and about as insidious as it ever gets for most people. That is not going to trip any filters I can imagine, and so, none of the stuff I do on the internet is ever going to be read by anyone, and even if it is, I don't care. I don't do things that harm people (violate laws - well, you know, generally). I just don't want bombs going off around me. If I ever do decide to blow up an embassy, I'll encrypt my emails.
If you discuss things in your emails that you would find embarrassing (cute little love notes etc.), no one at the NSA is ever going to read them. If I were to send such things I wouldn't care if they did read them, as I'm not particularly bashful. If you're a porn perv - and a pornography "obsession" is a perversion (rather, the result of a sort of mental illness) that is not good for your peace of mind, and being such, harms you - or you are an internet stalker, bent on maltreatment of others, then the filters may (and perhaps should - it bears discussion) come to include such activities. But the real question, again, is where will it stop?
Well, none of us really knows the answer to that question, except that it will not stop until the "government" wants it to. And who is the government? Ultimately, it's you. So, do keep complaining, but get real about it and quit sniveling about whether the government could potentially know what you buy on the internet, or what web sites you visit or what's in your innocent emails. Potentially - they don't read your emails, though they could. With a little effort, I could probably capture your email.
The real thing is to work on getting some sort of reality-based controls put in place, for when it gets to the point that the NSA finds that its "intrusions" are no longer worth the effort. Then "they" will allow us to put effective control measures into place, but as long as "they" believe they can "protect" the U.S. by doing what they see as necessary, the intrusions will not stop, and personally, I don't want them to. It would be very dangerous for the current level of "intrusions" on our privacy to stop just now. I don't want bombs going off around me. Do you? Devil's advocate, signing out.
[ link to this | view in chronology ]