Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites

(Mis)Uses of Technology

from the don't-scrape-me-bro dept

Fri, Jun 25th 2021 3:34pm — Tim Cushing

Clearview continues to dominate the "Most Hated" category in the facial recognition tech games. And with Amazon tossing aside its "Rekognition" program for the time being (it's spelled with a K because the AI tried to spell "recognition" correctly and failed), Clearview has opened up what could be an insurmountable lead.

Clearview has been sued, investigated, banned by law enforcement agencies, and has suffered numerous self-inflicted wounds. Underneath Clearview's untried and untested AI lies a foundation built from the internet itself. The ~4 billion images in Clearview's database have been scraped from public posts and accounts hosted by thousands of websites and dozens of social media platforms.

There's nothing inherently wrong with scraping sites to make use of information hosted there. In fact, this often controversial power can sometimes be used for good. The last thing we need is Clearview's questionable tech convincing legislators, prosecutors, and courts that scraping sites is something only criminals do.

Clearview called out Google's apparent hypocrisy on the subject of site scraping when Google sent a cease-and-desist demanding it stop harvesting images and data from Google's online possessions. But Clearview is apparently unable to recognize its own hypocrisy. While it's cool with site scraping when it can benefit from it, it frowns upon others perpetrating this "harm" on its own databases.

Eerily reminiscent of Disney's take on the public domain (good when Disney uses it, bad when Disney's copyrights are set to expire) is Clearview's take on site scraping. Its user agreement [PDF] with the Evansville, Indiana police department (obtained by MuckRock user J Ader) contains this paragraph:

The use of automated systems or software to extract the whole or any part of the Service or Website, the Information or data on or within the Service or the Website, including image search results or source code, for any purposes (including uses commonly known as “scraping”) is strictly prohibited.

Pretty sure a bunch of the sites scraped by Clearview have similar clauses in their terms of use. And if Clearview doesn't believe those terms should be honored, it shouldn't expect anyone to give it the respect it refuses to extend to everyone else. I don't think anyone else should necessarily be in possession of everything in Clearview's facial recognition database, but I do think someone needs to scrape the shit out of it on sheer principle.

Also bundled in this package of public records is Clearview's laughable "accuracy" test, which compares Clearview to Rekognition and its highly publicized failure. When the ACLU tested Amazon's tech, it misidentified several members of Congress as criminals, especially those who weren't white and male.

Clearview touts its own success in this document [PDF], which covers a non-independent test of its AI performed in 2019. Here are the results:

The test compared the headshots from all three legislative bodies against Clearview’s proprietary database of 2.8 billion images (112,000 times the size of the database used by the ACLU). The Panel determined that Clearview rated 100% accurate, producing instant and accurate matches for every one of the 834 federal and state legislators in the test cohort.

LOL. This is proof of nothing. Anyone with access to a reverse image search could perform this test with the same accuracy. While Amazon's AI was tested against arrestees' mugshots, Clearview's was tested against photos and info scraped from social media profiles and public websites. Of course it was able to positively identify politicians, most of whom maintain multiple social media accounts and websites. It would only be notable if the AI had failed to perform this simple task given the wealth of information it had to work with.

In conclusion, Clearview sucks. Its tech is unproven and its policy on scraping is the apex of hypocrisy. On the other hand, the company seems to be harvesting criticism as fast as it's harvesting web content, so the prognosis on its continued survival remains refreshingly bleak.

Filed Under: facial recognition, hypocrisy, scraping
Companies: clearview

9 Comments

Reader Comments

That Anonymous Coward (profile), 25 Jun 2021 @ 3:48pm

steps to the side to avoid the rain of irons from the sky
Anonymous Coward, 25 Jun 2021 @ 5:48pm

Clearly it's not working properly if it was only able to find 28 criminals in Congress.
- Anonymous Coward, 26 Jun 2021 @ 5:48pm
  
  Re:
  Obviously they miswrote the search query and got the list of all the Congressmen who haven't committed a crime.
Anonymous Coward, 26 Jun 2021 @ 10:56am

It's certainly a Clearview of hypocrisy
Anonymous Coward, 26 Jun 2021 @ 3:12pm

Re:
i... uh... what?
tp (profile), 27 Jun 2021 @ 11:55pm

Clearview's scraping policy is just ok...
It's a completely different matter to scrape the internet, where every content item belongs to/is owned by a different entity. Each owner can sue you for peanuts and get $100 from you.

But if you scrape content from a single owner, that's huge copyright infringement. That one owner can sue you for $2 million.

That's basically the reason why scraping from a single owner is completely forbidden but scraping from the internet is semi-legal.
- teka, 28 Jun 2021 @ 5:13am
  
  Re: Clearview's scraping policy is just ok...
  But Clearview is not the owner.
  
  They publicly admit that they scavenge the data from across the web, in violation of all those other places' terms of service (in fact, it is a main selling point of the service), but are turning around at their own gate and claiming the moral high ground, which is silly.
  - tp (profile), 30 Jun 2021 @ 1:21am
    
    Re: Re: Clearview's scraping policy is just ok...
    
    But Clearview is not the owner.
    
    Of course it's the owner. Someone needs to take responsibility for all the problems that their copyrighted content collection is causing, and the only way to attach that responsibility to Clearview is by giving them ownership rights to the collection. If it contains someone else's work, and there are other owners involved, then Clearview needs to have a process for asking the owners' permission. But responsibility cannot be attached to Clearview unless we also give them ownership rights to whatever content they're collecting.
Anonymous Coward, 28 Jun 2021 @ 3:13pm

Clearview's ACTUAL recognition rate is 0.002%. Basically worse than playing spin the bottle.

So they jury-rigged it to always "recognize" black people as criminals, because that's their belief and company aim: to put all non-whites in prison.