Court Says CFAA Isn't Meant To Prevent Access To Public Data, Orders LinkedIn To Drop Anti-Scraper Efforts
from the perverting-a-bad-law dept
Some good pushback against the CFAA (Computer Fraud and Abuse Act) has been handed down by a federal court. LinkedIn, which has frequently sued scrapers under both the CFAA and DMCA, just lost an important preliminary round to a company whose entire business model relies on LinkedIn's publicly-available data.
hiQ Labs scrapes LinkedIn data from users whose accounts are public, repackages it and sells it to third party recruiters and HR departments, allowing companies to track employee skills and get a read on which employees might be planning to jump ship.
LinkedIn didn't care much for another business piggybacking on its data (and likely cutting back ever so slightly on the number of third parties it sells this data to), so it sued hiQ, alleging the scraping of publicly-available data violated the CFAA. This has completely backfired. hiQ has obtained an injunction preventing LinkedIn from blocking its scraping efforts. [h/t Brad Heath]
In short, the court finds the hardships are all on hiQ's side: if LinkedIn blocks the scraping, the company will likely close. The decision [PDF], importantly, notes this isn't what the CFAA was put in place to guard against. It also adds that if it sided with LinkedIn's arguments, the internet itself would suffer.
In summary, the balance of hardships tips sharply in hiQ's favor. hiQ has demonstrated there are serious questions on the merits. In particular, the Court is doubtful that the Computer Fraud and Abuse Act may be invoked by LinkedIn to punish hiQ for accessing publicly available data; the broad interpretation of the CFAA advocated by LinkedIn, if adopted, could profoundly impact open access to the Internet, a result that Congress could not have intended when it enacted the CFAA over three decades ago.
And there's more bad news for LinkedIn:
Furthermore, hiQ has raised serious questions as to whether LinkedIn, in blocking hiQ's access to public data, possibly as a means of limiting competition, violates state law.
LinkedIn tried to argue continued access by hiQ would threaten its own business, mainly through supposed violations of its customers' privacy. It notes many of its users (50 million to be exact) have deployed LinkedIn's "Do Not Broadcast" option, which limits notifications about changes to accounts. Out of the 50 million users, LinkedIn claims three have alleged harm from third-party data collection. LinkedIn says hiQ's scraped determinations about poachable employees could harm users whose accounts remain public but who are using the "Do Not Broadcast" feature.
The court is not entirely unsympathetic to LinkedIn's arguments. But it is mostly unsympathetic, partially because LinkedIn appears to be vastly overstating the privacy concerns of its users...
These considerations are not without merit, but there are a number of reasons to discount to some extent the harm claimed by LinkedIn. First, LinkedIn emphasizes that the fact that 50 million users have opted into the "Do Not Broadcast" feature indicates that a vast number of its users are fearful that their employer may monitor their accounts for possible changes. But there are other potential reasons why a user may opt for that setting. For instance, users may be cognizant that their profile changes are generating a large volume of unwanted notifications broadcasted to their connections on the site. They may wish to limit annoying intrusions into their contacts.
Second, LinkedIn has presented little evidence of users' actual privacy expectation; out of its hundreds of millions of users, including 50 million using Do Not Broadcast, LinkedIn has only identified three individual complaints specifically raising concerns about data privacy related to third-party data collection. Docket No. 49-1 Exs. A-C. None actually discuss hiQ or the "Do Not Broadcast" setting.
...and partially because LinkedIn doesn't appear to care all that much about its users' privacy.
Third, LinkedIn's professed privacy concerns are somewhat undermined by the fact that LinkedIn allows other third-parties to access user data without its members' knowledge or consent. LinkedIn offers a product called "Recruiter" that allows professional recruiters to identify possible candidates for other job opportunities. LinkedIn avers that when users have selected the Do Not Broadcast option, the Recruiter product respects this choice and does not update recruiters of profile changes. However, hiQ presented marketing materials at the hearing which indicate that regardless of other privacy settings, information including profile changes are conveyed to third parties who subscribe to Recruiter. Indeed, these materials inform potential customers that when they "follow" another user, "[f]rom now on, when they update their profile or celebrate a work anniversary, you'll receive an update on your homepage. And don't worry – they don't know you're following them." LinkedIn thus trumpets its own product in a way that seems to afford little deference to the very privacy concerns it professes to be protecting in this case.
As for the alleged CFAA violations, the court finds nothing to support LinkedIn's legal theory that public information anyone can access somehow turns into unauthorized access when a company accesses it via a scraper.
A user does not "access" a computer "without authorization" by using bots, even in the face of technical countermeasures, when the data it accesses is otherwise open to the public.
But it goes further, laying down in explicit detail how ruling in LinkedIn's favor would severely damage open access on the internet.
Under LinkedIn's interpretation of the CFAA, a website would be free to revoke "authorization" with respect to any person, at any time, for any reason, and invoke the CFAA for enforcement, potentially subjecting an Internet user to criminal, as well as civil, liability. Indeed, because the Ninth Circuit has specifically rejected the argument that "the CFAA only criminalizes access where the party circumvents a technological access barrier," Nosal II, 844 F.3d at 1038, merely viewing a website in contravention of a unilateral directive from a private entity would be a crime, effectuating the digital equivalence of Medusa. The potential for such exercise of power over access to publicly viewable information by a private entity weaponized by the potential of criminal sanctions is deeply concerning...
[T]he CFAA as interpreted by LinkedIn would not leave any room for the consideration of either a website owner's reasons for denying authorization or an individual's possible justification for ignoring such a denial. Website owners could, for example, block access by individuals or groups on the basis of race or gender discrimination. Political campaigns could block selected news media, or supporters of rival candidates, from accessing their websites. Companies could prevent competitors or consumer groups from visiting their websites to learn about their products or analyze pricing. Further, in addition to criminalizing any attempt to obtain access to information otherwise viewable by the public at large, the CFAA would preempt all state and local laws that might otherwise afford a legal right of access (e.g., state law rights asserted by hiQ herein). A broad reading of the CFAA could stifle the dynamic evolution and incremental development of state and local laws addressing the delicate balance between open access to information and privacy – all in the name of a federal statute enacted in 1984 before the advent of the World Wide Web.
The case will still proceed, but the outlook isn't that bright for LinkedIn. It has been ordered to remove any anti-scraping measures it put in place within 24 hours and withdraw the cease-and-desist letters it sent to hiQ. On top of there being zero chance it will prevail on its CFAA claims, the company will now have to defend itself against state law counterclaims by hiQ. This legal effort -- probably deployed in hopes of achieving a quick settlement -- is going to add up to real dollars in legal fees alone.
Filed Under: cfaa, public data, scraping
Companies: hiq, linkedin
Reader Comments
Yeah, no surprises here
Just, again, proves the adage that with "free" services, you are the product.
[ link to this | view in chronology ]
Re: Yeah, no surprises here
This is the problem. LinkedIn have very easy tools to stop hiQ from being able to access information and ways to permanently ban them from legally accessing data. They just don't want to give up the extra traffic and other benefits that come from public accessibility. Hopefully the courts will find what I think is the correct outcome - LinkedIn are told to choose between public data and control. They can't have both.
[ link to this | view in chronology ]
There's a word for that kind of "following:" we call it stalking, and in many other contexts, it's illegal.
[ link to this | view in chronology ]
Headline overstates as usual: it's JUST an injunction.
Now, I'm no network engineer, demonologist, gastropod, or mathemagician, but let's do some ball park figgers:
50000 pages * 200000 bytes each (probably optimistic) = 10,000,000,000.
I've actually tested and looks like can get pages in 3 seconds, so:
(50000 * 3) / 3600 = 41.667 hours per loop, or 4 complete scrapes / week.
Then 10G * 4 per week * 4 weeks = 160G / month.
Calcs are just for article pages, doesn't include monitoring each account, which can run in parallel. And of course there'll be focus on newest pages, so add MANY as possible of those TOO. -- Are you okay with paying for a little extra bandwidth? Bytes and speed may be much higher in practice: I'll have to find how many requests can go in parallel. With your well-known insouciance for cost of bandwidth, I'll just take silence as yes and begin scraping tomorrow, or even tonight, it's a trivial "script" to write. Thanks.
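For anyone who wants to check the back-of-the-envelope numbers in this comment, here is a minimal Python sketch. It assumes the commenter's own figures (50,000 pages, 200,000 bytes per page, 3 seconds per page, pages fetched one at a time) and reads "G" as 10^9 bytes; the variable names are ours, not the commenter's.

```python
# Reproduces the comment's estimate under its own assumptions.
PAGES = 50_000             # article pages (commenter's figure)
BYTES_PER_PAGE = 200_000   # bytes per page (commenter's figure)
SECONDS_PER_PAGE = 3       # fetch time per page, sequential (commenter's figure)

# Total bytes transferred in one full pass over the site.
bytes_per_scrape = PAGES * BYTES_PER_PAGE

# Time for one full pass, in hours.
hours_per_scrape = PAGES * SECONDS_PER_PAGE / 3600

# Whole passes that fit in a 168-hour week.
scrapes_per_week = int(7 * 24 // hours_per_scrape)

# Monthly transfer in gigabytes, using the comment's 4-week month.
gb_per_month = bytes_per_scrape / 1e9 * scrapes_per_week * 4

print(bytes_per_scrape, round(hours_per_scrape, 3), scrapes_per_week, gb_per_month)
```

Under those assumptions the totals agree with the comment: 10,000,000,000 bytes per pass, about 41.667 hours per pass, 4 passes a week, and 160 GB a month.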
[ link to this | view in chronology ]
Re: Headline overstates as usual: it's JUST an injunction.
Really?
"Are you okay with paying for a little extra bandwidth?"
I can't speak for Techdirt, but to speak for myself if your idiotic overblown scenario is to happen:
Yes, if the benefits of having the data publicly accessible outweigh the risk of these costs. If those costs become too burdensome, I will take steps to stop you from accessing the data. I won't be running to the courts whining that the public are accessing the things I put in public.
"it's a trivial "script" to write"
Lol go ahead. If your coding is anything like your English and maths skills, this site will be perfectly safe for the time being.
[ link to this | view in chronology ]
Why an Injunction?
[ link to this | view in chronology ]
Re: Why an Injunction?
No, that's the point of .htaccess files. The point of ReCaptcha is to prevent bots from using forms.
[ link to this | view in chronology ]
Re: Re: Re: Why an Injunction?
I don't think robots.txt has any legal weight to it. Anyone wanting to do scraping could just ignore it.
[ link to this | view in chronology ]
Re: Re: Re: Re: Why an Injunction?
I'd have said "maybe CFAA", but per this ruling, nope.
Which is a good result. There are a lot of reasons why robots.txt shouldn't be legally enforceable.
[ link to this | view in chronology ]
Re: Why an Injunction?
"Using CFAA to prevent scraping seems extreme, however I’m confused about the injunction. Why can’t LinkedIn use technical measures to block scraping? Plenty of sites prevent bots from access. That is the whole point of reCaptcha."
Yeah, that's the part that confuses me about this as well. I think hiQ should be able to scrape without legal concern and I think LinkedIn should be free to try to block with technical measures, and hiQ should be free to adjust and respond. But... I'm not sure about a law demanding that LinkedIn let someone scrape.
[ link to this | view in chronology ]
Re: Why an Injunction?
They can, but won't. I believe the whole point is that they don't just want to stop hiQ in this specific instance, they want a legal precedent to get any competitor using their data shut down, including those that haven't based their entire business on scraping like hiQ seem to have done. They want to claim complete control over everything they have published, including that clearly in the public realm.
In order to do this, they have to pretend the damage is as great as possible, which means not utilising any technical barrier available to them.
[ link to this | view in chronology ]
Re: Why an Injunction?
The judge both raised that exact point and deferred on it when making this ruling.
Stated that as long as it isn't password protected it is public. Might still rule in favor of LinkedIn for technical measures like IP blocking.
[ link to this | view in chronology ]
I had problems with my site where members from other sites would register on my site and try to get my community to ditch my site for theirs. Not only did I ban their accounts but I also banned their IP addresses, email addresses and blocked their ability to access my site. I'm able to block them from not just my site's forum community software but also through my site's administration tools.
[ link to this | view in chronology ]
There is a similar ruling in favor in a case brought by Facebook. Main difference was that that was scraping password protected content. So on appeal this still has a chance of being reversed.
[ link to this | view in chronology ]
CFAA and Aaron Swartz
[ link to this | view in chronology ]
Got what they wanted... sort of
They'll probably get their quick settlement, but at this point, it'll probably go the opposite way from what they expected in the beginning.
[ link to this | view in chronology ]