Facebook's Distinction Between Data Breaches And Scraping Would Make A Lot More Sense If It Hadn't Argued Differently In Court
from the but-it's-important dept
A few weeks back, news broke that some hackers were offering up personal information on over 500 million Facebook accounts (there had been some earlier reports about this data's availability as well). It was even highlighted that some of the "personal information" apparently included Mark Zuckerberg's phone number (though it's unclear whether that data was confirmed to be accurate).
After there was more reporting on this data being available, some people noticed that Facebook had not alerted people about it. Various data breach laws require notifying affected users of a data breach, and so Facebook's decision raised some eyebrows. Facebook's explanation for not notifying users is that the company did not consider this to be a breach, but rather the result of scraping information that people put on their own profiles:
Scraping is a common tactic that often relies on automated software to lift public information from the internet that can end up being distributed in online forums like this. The methods used to obtain this data set were previously reported in 2019. This is another example of the ongoing, adversarial relationship technology companies have with fraudsters who intentionally break platform policies to scrape internet services. As a result of the action we took, we are confident that the specific issue that allowed them to scrape this data in 2019 no longer exists. But since there’s still confusion about this data and what we’ve done, we wanted to provide more details here.
This may feel like weaseling out (and in some ways it absolutely is), but the distinction here is actually really important. As explained in a detailed Wired article, no one "hacked" into Facebook, so in some way the data wasn't breached the way many people think of a breach. Instead, it appears that the "import contacts" feature (which is wise to avoid using on basically any service if you can!) had a vulnerability in that malicious actors could basically try to import every possible phone number, and any "match" would give them access to the information those people shared with connections (so, personal info like phone numbers, names, date of birth, email, etc.).
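To make that mechanism concrete, here is a minimal Python sketch. It is a toy in-memory model, not Facebook's actual code: the `ContactImporter` class, the `enumerate_range` helper, the made-up 555 numbers and the per-caller lookup cap are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    phone: str
    email: str


class ContactImporter:
    """Hypothetical stand-in for a contact-matching feature."""

    def __init__(self, profiles, max_lookups_per_caller=None):
        self._by_phone = {p.phone: p for p in profiles}
        self._max = max_lookups_per_caller  # None means no cap at all
        self._used = {}                     # lookups consumed per caller

    def import_contacts(self, caller, phone_numbers):
        """Return the profile behind every submitted number that matches."""
        matches = []
        for number in phone_numbers:
            used = self._used.get(caller, 0)
            if self._max is not None and used >= self._max:
                break  # cap reached: stop answering this caller
            self._used[caller] = used + 1
            profile = self._by_phone.get(number)
            if profile is not None:
                matches.append(profile)
        return matches


def enumerate_range(importer, caller, prefix, digits):
    """Submit every possible number under a prefix as if it were a contact list."""
    candidates = (f"{prefix}{n:0{digits}d}" for n in range(10 ** digits))
    return importer.import_contacts(caller, candidates)


profiles = [
    Profile("Alice Example", "555-0142", "alice@example.com"),
    Profile("Bob Example", "555-0187", "bob@example.com"),
]

# No cap: the phone number alone is enough to pull back the profile.
open_importer = ContactImporter(profiles)
leaked = enumerate_range(open_importer, "scraper", "555-0", 3)

# Assumed mitigation: a per-caller cap limits how far enumeration gets.
capped_importer = ContactImporter(profiles, max_lookups_per_caller=150)
limited = enumerate_range(capped_importer, "scraper", "555-0", 3)
```

In this toy setup, walking all thousand candidate numbers with no cap returns both profiles, while a cap of 150 lookups per caller stops the walk after the first match. A cap is only one possible mitigation; Facebook's statement does not say what specific change it made.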
It appears that data protection regulators around the globe are not happy with Facebook's response, and it would not surprise me at all to see Facebook eventually hit with another fine over this situation.
However, we should be at least somewhat wary of efforts to make data scraping a violation of privacy rules.
Admittedly, Facebook is now trying to lean heavily on the distinction to get out of trouble here. An email that someone from Facebook accidentally sent to a Dutch reporter revealed that Facebook was going to try to "normalize" data scraping as both common and different from data breaches (Facebook later confirmed to Vice that the email was legitimate):
In the long term we expect more scraping incidents and it is important to frame this as a sector problem and normalize that this happens regularly. To do this, the team proposes a follow-up post in the coming weeks that talks more broadly about our anti-scraping work and provides more transparency around the work we do here. This may reflect much of the scraping activity, we hope this helps normalize the fact that this is ongoing and avoid the criticism that we are not transparent about specific incidents.
And there is some truth to the fact that this kind of thing happens regularly. There were recent reports on similar "data scrapes" from Clubhouse and LinkedIn.
And here's where things get a bit trickier. If we demonize all data scraping, that would create much bigger problems. Data scraping is actually really important for many non-nefarious situations. Hell, search engines are giant data scrapers, and that's incredibly useful. Similarly, many academic researchers use data scraping to analyze various things online (including the activities of big companies like Facebook). Demonizing all data scraping as if it's functionally equivalent to a data breach would lead to bad results.
Indeed, in many ways, being able to scrape data can help lead to a more competitive internet. As people like Cory Doctorow are pushing for competitive compatibility, some of that may include the ability to scrape data to get it out of a silo like Facebook.
Notably, Facebook and LinkedIn -- both of which have just recently faced public reports of this kind of scraping -- have, in the past, sued companies that scraped their sites, claiming it violated the Computer Fraud and Abuse Act (CFAA) as "unauthorized access." As you may recall, Facebook won its case against Power, blocking Power's attempt to create an out-of-Facebook dashboard that users could use to post content to Facebook and other social networks without having to use Facebook directly.
LinkedIn, on the other hand, lost its very similar case, because of an important distinction that Facebook is now conveniently ignoring. In the LinkedIn case, the court found that LinkedIn couldn't use the CFAA to stop scraping because the information in question was publicly available. The court distinguished it from the Power case by saying that Power was scraping (with permission from the end users) data that was not "publicly available."
And, thus, I think that a lot of people are -- somewhat cynically perhaps -- conflating a bunch of different situations to their own advantage. People who are mocking Facebook for trying to distinguish between data breaches and data scraping are perhaps going too far. There is a difference, and that difference matters. Jumping to the extreme position that data scraping is bad would lead to really bad results. The distinction that people should be making is who has authorization to access the data. So, for example, with the recent Clubhouse data scraping, all of the information that was scraped was publicly available to anyone on Clubhouse. No secret information was revealed.
Similarly, with the LinkedIn/HiQ case, the court rightly found that HiQ should be able to scrape public data from LinkedIn. That was a good ruling because scraping public data is an important factor in how the web works.
But, now let's look at the Facebook situation. In the Power case, which I think was decided incorrectly, Facebook argued that scraping data with the permission of the end user who created that data was bad hacking that violated the law. The courts agreed, but that still seems very, very wrong. The key factor here should have been that the end user gave their authorization, and thus the data scraping should have been allowed.
But with this latest situation with the over 500 million accounts... well... now it gets more complicated. This data was available because of the sloppy way in which Facebook set up its "import contacts" feature. The people who put their own data in did not authorize it to be shared that widely (or if they technically did so through convoluted terms of service, it certainly was not with the intent that the data would then be available in a database anyone could hunt through to find their personal info). Given that this data was accessed without the end users' authorization, this feels a hell of a lot worse than what was happening with Power. Yet, in Power's case, Facebook ran to the courts to claim a "breach." But with the latest story, Facebook is trying to downplay it as not a breach.
At the very least, Facebook is being cynically hypocritical.
And that worries me. I fear that too many people will not sort through the different issues at play here, and will argue that any kind of data scraping is the equivalent of a data breach. Dangerously, that could play right into Facebook's hands, allowing it to put up even larger walls, and create more impenetrable silos, that make it that much more difficult to extract your data from Facebook and move it to alternative and competing platforms.
So, yes, Facebook is being cynical and opportunistic (and inconsistent) in arguing that this scraping situation is different from a data breach. But there is an underlying kernel of truth that shouldn't be ignored. Not all data scraping is bad. It's often quite good and important. The real issue is what data is public and what is private -- and who is given the authorization to make more private data public.
If the end result of this is for regulators to say that Facebook has to lock down more data, that would be bad, as it would continue to raise the competitive barriers for new entrants, and give Facebook even more power. That doesn't mean anyone should let Facebook off the hook for its stupidly implemented contact importer, but understanding the nuances here is important.
Filed Under: cfaa, competition, data breaches, interoperability, privacy, scraping
Companies: clubhouse, facebook, linkedin
Reader Comments
I was following until the part that explained what they did. Sure, the distinction between hacking and plain abuse of a system is valid and somewhat important, but
no one "hacked" into Facebook,
They simply exploited a vulnerability in a Facebook feature to gain access to information they were not supposed to have access to.
Even if you are splitting hairs, that's just the definition of hacking.
[ link to this | view in chronology ]
Re:
DUH!
That the whole of the site has ways to protect itself, EXCEPT 1 part?
What's fun is that the original name I had, before they demanded I change it, is the one they send to.
The latest spam I've been getting even my ISP can't stop, as it's from about 4 different servers, and changes the names of those sent.
They should get a HINT from the AMOUNTS sent to myself and everyone else.
[ link to this | view in chronology ]
There should not be a vulnerability where all the users' private data can be scraped by anyone and dumped on the web for anyone to use for fraud, ID theft or blackmail.
This is like a bank saying: we left the safe open, the code for the alarm is 1234, the server password is admin, and if someone robs your money it's not our fault. And every bank has the same password and alarm code.
Saying this is not a hack is just a technicality and a lame excuse.
[ link to this | view in chronology ]
repeat after me: (entity) is not an authenticator
There is a very basic security process called identification, authentication, and authorization. Confusing identifiers and authenticators is a rookie mistake, like asking for Joe Johnson's social security number to authorize a phone caller claiming to be Joe to transfer money out of his bank account.
From the description in the article, Facebook's process uses a phone number as both an identifier and an authenticator (or an identifier needing no authentication -- what?). In fact, a phone number by itself is at most potentially an identifier for an endpoint in the telephone network. So to proceed on to authorization (allowing the response to supply facts about an account) is to ignore security altogether.
I suppose this is motivated by Facebook's desire to be frictionless, a word in wide use when the company started, so the interconnections between their users can grow very rapidly and Facebook will make lots of money. But as "security" it's not even the equivalent of an open door swinging in the breeze.
Facebook cannot possibly be considered to implement any privacy concerns whatsoever in this process. Nor is the integrity of Facebook's data nor the availability of FB's systems threatened. Facebook has no security complaint to make that isn't caused by its own negligence. Their users are the only ones harmed.
No one who has a telephone could possibly believe that the robocallers plaguing them have the slightest concern about whether the recipients want to be their "friends." And yet that's the basis that this particular system is built on.
So, yes, I agree that screen scraping has too much utility to be outlawed altogether. I believe that part of allowing it has to be authorization by any parties whose data is exposed. But FB's system is so basically flawed that I can only weep at the idea of it being used as a test case.
[ link to this | view in chronology ]
Same same but different
What Power did was scrape the public web. Any upload contacts feature is importantly different because it scrapes my phone's contact list.
Additionally, the intent of that hoovering is to convert into public data YOUR private list of phone numbers (nb. MY personally identifying information).
I can see merit in that kind of sharing not being allowed by any app - including Facebook. That PII of yours is not mine to share & certainly not mine to put online.
[ link to this | view in chronology ]
question about gradation between public and private
For the record, I suspect that any data submitted to a large organisation will become public with probability approaching unity as time increases.
My question is whether there's any meaningful "in between" between data that is private and data that is fully public if someone cares to look it up.
[ link to this | view in chronology ]
Just say "No!"
Just say "No!" to [insert name of company or online service that uses violation of privacy and / or mass surveillance as a business model here]
Mass abandonment is the solution, here. Unfortunately it is a less viable option when government is the violator. Government's private enterprise proxies, however, can and should be kicked to the curb.
[ link to this | view in chronology ]