Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites
from the don't-scrape-me-bro dept
Clearview continues to dominate the "Most Hated" category in the facial recognition tech games. And with Amazon tossing aside its "Rekognition" program for the time being (it's spelled with a K because the AI tried to spell "recognition" correctly and failed), Clearview has opened up what could be an insurmountable lead.
Clearview has been sued, investigated, banned by law enforcement agencies, and has suffered numerous self-inflicted wounds. Underneath Clearview's untried and untested AI lies a foundation built from the internet itself. The ~4 billion images in Clearview's database have been scraped from public posts and accounts hosted by thousands of websites and dozens of social media platforms.
There's nothing inherently wrong with scraping sites to make use of information hosted there. In fact, this often controversial power can sometimes be used for good. The last thing we need is Clearview's questionable tech convincing legislators, prosecutors, and courts that scraping sites is something only criminals do.
Clearview called out Google's apparent hypocrisy on the subject of site scraping when Google sent a cease-and-desist demanding it stop harvesting images and data from Google's online properties. But Clearview is apparently unable to recognize its own hypocrisy. While it's cool with site scraping when it can benefit from it, it frowns upon others perpetrating this "harm" on its own databases.
Eerily reminiscent of Disney's take on the public domain (good when Disney uses it, bad when Disney's copyrights are set to expire) is Clearview's take on site scraping. Its user agreement [PDF] with the Evansville, Indiana police department (obtained by MuckRock user J Ader) contains this paragraph:
The use of automated systems or software to extract the whole or any part of the Service or Website, the Information or data on or within the Service or the Website, including image search results or source code, for any purposes (including uses commonly known as “scraping”) is strictly prohibited.
Pretty sure a bunch of the sites scraped by Clearview have similar clauses in their terms of use. And if Clearview doesn't believe those terms should be honored, it shouldn't expect others to extend it the respect it refuses to give them. I don't think anyone else should necessarily be in possession of everything in Clearview's facial recognition database, but I do think someone needs to scrape the shit out of it on sheer principle.
Also bundled in this package of public records is Clearview's laughable "accuracy" test. It compares itself to Rekognition and its highly publicized failure. When Amazon's tech was tested, it misidentified several members of Congress as criminals, especially those who weren't white men.
Clearview touts its own success in this document [PDF], which covers a non-independent test of its AI performed in 2019. Here are the results:
The test compared the headshots from all three legislative bodies against Clearview’s proprietary database of 2.8 billion images (112,000 times the size of the database used by the ACLU). The Panel determined that Clearview rated 100% accurate, producing instant and accurate matches for every one of the 834 federal and state legislators in the test cohort.
LOL. This is proof of nothing. Anyone with access to a reverse image search could perform this test with the same accuracy. While Amazon's AI was tested against arrestees' mugshots, Clearview's was tested against photos and info scraped from social media profiles and public websites. Of course it was able to positively identify politicians, most of whom maintain multiple social media accounts and websites. It would only be notable if the AI had failed to perform this simple task given the wealth of information it had to work with.
In conclusion, Clearview sucks. Its tech is unproven and its policy on scraping is the apex of hypocrisy. On the other hand, the company seems to be harvesting criticism as fast as it's harvesting web content, so the prognosis on its continued survival remains refreshingly bleak.
Thank you for reading this Techdirt post. With so many things competing for everyone’s attention these days, we really appreciate you giving us your time. We work hard every day to put quality content out there for our community.
–The Techdirt Team
Filed Under: facial recognition, hypocrisy, scraping
Companies: clearview
Reader Comments
steps to the side to avoid the rain of irons from the sky
Clearly it's not working properly if it was only able to find 28 criminals in Congress.
Re:
Obviously they miswrote the search query and got the list of all the Congressmen who haven't committed a crime.
It's certainly a Clearview of hypocrisy
Re:
i... uh... what?
Clearview's scraping policy is just ok...
Scraping the internet is a completely different matter, since every content item belongs to, or is owned by, a different entity. Each owner can only sue you for peanuts and get $100 from you.
But if you scrape content from a single owner, that's huge copyright infringement. That one owner can sue you for $2 million.
That's basically the reason why scraping from a single owner is completely forbidden but scraping the internet is semi-legal.
Re: Clearview's scraping policy is just ok...
But Clearview is not the owner.
They publicly admit that they scavenge the data from across the web, in violation of all those other places' terms of service (in fact, it is a main selling point of the service), but are turning around at their own gate and claiming the moral high ground, which is silly.
Re: Re: Clearview's scraping policy is just ok...
Of course it's the owner. Someone needs to take responsibility for all the problems that their copyrighted content collection is causing, and the only way to attach that responsibility to Clearview is by giving them ownership rights to the collection. If it contains someone else's work, and there are other owners involved, then Clearview needs a process for asking those owners' permission. But responsibility cannot be attached to Clearview unless we also give them ownership rights to whatever content they're collecting.
Clearview's ACTUAL recognition rate is 0.002%. Basically worse than playing spin the bottle.
So they jury-rigged it to always "recognize" black people as criminals, because that's their belief and company aim: to put all non-whites in prison.