Google Says Clearview's Site Scraping Is Wrong; Clearview Reminds Google It Scrapes Sites All The Time

from the twospidermans.jpg dept

Clearview's business model has resulted in some mutual finger pointing. The most infamous of facial recognition tech companies outsources its database development. Rather than seeking input from interested parties, it scrapes sites for pictures of faces and whatever personal info accompanies them. The scraped info forms the contents of its facial recognition database, putting law enforcement only a few app clicks away from accessing over 3 billion images.

The companies being scraped have claimed this is a violation of their terms of service, if not actually illegal. It's not clear that it's actually illegal, even if it does violate the restrictions placed on users of these services. Twitter has already sent a cease-and-desist to Clearview, but it will probably take a court to make this stick. Unfortunately, Clearview's actions could lead to some damaging precedent if Twitter forces the issue. Given the number of sites affected by Clearview's scraping efforts, it's probably only a matter of time before this gets litigious.

But the finger pointed by Google at Clearview hasn't obtained the reaction Google may have hoped for. As CBS News reports, Clearview has returned fire by comparing its business model to Google's business model.

Google and YouTube have sent a cease-and-desist letter to Clearview AI, a facial recognition app that scrapes images from websites and social media platforms, CBS News has learned.

[...]

[Clearview CEO Hoan] Ton-That also argued that Clearview AI is essentially a search engine for faces. "Google can pull in information from all different websites," he said. "So if it's public and it's out there and could be inside Google search engine, it can be inside ours as well."

He's not wrong. Google's bots crawl the internet non-stop, building a database for its search engine. But there is one key difference: website owners can opt out of Google's indexing.
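The opt-out in question is conventionally expressed through a site's robots.txt file, which well-behaved crawlers consult before fetching pages. As a rough illustration only, here is a minimal sketch using Python's standard-library parser. The robots.txt contents, the "ExampleBot" crawler name, and the example.com URLs are all made up for the example, and nothing here describes how Google or Clearview actually implement their crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site: Googlebot is shut out
# entirely, everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The site owner has opted out of Googlebot crawling altogether.
googlebot_photo = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/photos/1.jpg")

# A different (hypothetical) crawler falls under the wildcard rules.
other_photo = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/photos/1.jpg")
other_private = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.jpg")
```

The catch is that robots.txt is purely advisory: it only works if the crawler chooses to run a check like this and honor the answer.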

"Most websites want to be included in Google Search, and we give webmasters control over what information from their site is included in our search results, including the option to opt-out entirely. Clearview secretly collected image data of individuals without their consent, and in violation of rules explicitly forbidding them from doing so," [YouTube spokesperson Alex Thomas] said in the statement to CBS News.

There's no way to opt out of Clearview's "service," other than just not existing on the internet. Ton-That is correct in assuming there's very little legal exposure in scraping publicly-available images from the net, but these statements don't make him or his company any more sympathetic. Ton-That is serving up untested AI to as many law enforcement agencies as possible, encouraging them to test drive the app using faces of friends and family even as the company states the software should only be used for approved law enforcement purposes.

It also claims an accuracy rate of 99.6% for searches, but that number hasn't been rigorously tested. What appears to be happening is a mass rollout of untested AI to law enforcement agencies via demo/trial accounts. Clearview claims to be working with over 600 law enforcement agencies but very few agencies have stated publicly they've used Clearview to perform investigations.

Clearview's packaging of public information into a law enforcement app is unpleasant, but likely legal. The same thing goes on behind the scenes of multiple data aggregators that sell info and analytics directly to government agencies. The main difference here is Clearview hasn't been shy about its desire to pitch a cheap app/database to law enforcement even as its product remains unproven and untested. And it puts cops a lot closer to their dystopian dream of being able to demand identification from anyone they run into on the streets.

Filed Under: ai, cease and desist, facial recognition, scraping, terms of service
Companies: clearview, clearview ai, google, twitter


Reader Comments


  • Bobvious, 6 Feb 2020 @ 12:42pm

    On a Clearview you can scrape forever

  • Anonymous Coward, 6 Feb 2020 @ 12:52pm

    Fair Use?

    Thinking about it, the true difference between Google and Clearview, aside from the ethics of helping to index the internet in a mutually beneficial way vs. selling dubiously accurate facial recognition that forces the burden of false positives onto innocents, might fall under a broad concept of Fair Use.

    Google essentially "snippets" the content for display and indexing, regardless of the internal implementation. Meanwhile, Clearview takes the content and adds it to its training set. Both gather and process public data, but one is more expansive in its gathering. It may not be defined the same way legally, but it highlights how arbitrary not only the law but the underlying concepts are, especially when compared to internet archives, which essentially discard the processing, more or less.

    • anny nomouse, 6 Feb 2020 @ 1:11pm

      Re: Fair Use?

      Also, Google respects robots.txt files that can, and do, prohibit it from indexing sites... that's what it's for.

      • Anonymous Coward, 6 Feb 2020 @ 5:32pm

        Re: Re: Fair Use?

        Is there evidence that Clearview AI does not respect robots.txt? A brief search I did yielded no such report.

        • Canuck, 6 Feb 2020 @ 10:57pm

          Re: Re: Re: Fair Use?

          The fact that they haven't claimed to respect robots.txt or any other control mechanism is proof enough.

      • Dan, 7 Feb 2020 @ 10:25am

        Re: Re: Fair Use?

        That mechanism has become useless. People who don't even know such a thing exists can now put up a website in 30 minutes with the tools offered. That is, unless the GoDaddys et al. turn on 'no web crawling' as the default. It's another arms race, just like when telecoms offered robocalling and then call blocking became a thing as a result.

  • Anonymous Coward, 6 Feb 2020 @ 1:04pm

    There is another significant difference: Google offers value to other sites by sending them visitors via search results. Clearview, if anything, subtracts value from a site, as it exposes the site's users to the risk of misidentification, followed by a SWAT raid.

    • Anonymous Coward, 6 Feb 2020 @ 7:42pm

      Re:

      Any person, company or government SCRAPING up faces or whatever other part of the human anatomy is [({Fucking Creepy})].

    • Anonymous Coward, 7 Feb 2020 @ 6:02am

      Re:

      This is not accurate at all. Google scrapes information from websites which isn't wholly used for its search engine algorithms.

      Google collects, and combines other data points, and sells this information to advertisers, who then utilize the information so Google can profit because its business model is selling ads, not a search engine.

      For those who believe Google respects the "robots.txt" file, you're misinformed. Google respects the file to only omit the page from its search algorithms.

      It does not prevent its spiders from collecting information on the pages.

    • Dan, 7 Feb 2020 @ 10:30am

      Re:

      In all fairness to Clearview, SWAT raids are not their fault, nor should they be. That's another issue caused by law enforcement.

      • Anonymous Coward, 7 Feb 2020 @ 10:46am

        Re: Re:

        They are selling a product that is curiosity grade and upselling its reliability when others can't even reliably identify all of Congress. The overzealousness of the SWAT might be on the police morally but selling it without heavy disclaimers and processes to verify shows a callous disregard for human life.

  • Anonymous Coward, 6 Feb 2020 @ 1:37pm

    Just because it isn't illegal, this doesn't mean you should go out and be the biggest asshole you can, just, you know, within the law.

    ...until your behavior causes people to make new (and probably very bad) laws in response.

    Thanks, asshole!

    • Anonymous Coward, 7 Feb 2020 @ 12:26pm

      Re:

      Isn't a hundred false positives worth one arrest and conviction?

  • crade, 6 Feb 2020 @ 1:46pm

    I don't see why Google wouldn't be happy with this reply.. Having the ability to opt out isn't just a key difference it is the difference between right or wrong.. Between collecting data with consent (at least the consent of the site hosting the data, the end user is another story) and without.

    This is not accidental.. Clearview doesn't allow opting out because their "service" is not a service but is straight exploitation and every site would just opt out if they could.

  • That One Guy, 6 Feb 2020 @ 2:46pm

    'But wait, there's more!'

    And it puts cops a lot closer to their dystopian dream of being able to demand identification from anyone they run into on the streets.

    It's actually much worse, as facial recognition tech (assuming it even worked) means they don't need to demand identification. All they need to do is run your face through whatever database they're using and you've unwittingly provided your identification, with no chance to refuse.

    It's somewhat similar to the encryption debacle, being the difference between them having to get the key/password from someone who can protest and potentially fight back when it actually matters, versus them already having a key/password and the only ability to object being after the fact when it's already too late.

  • Anonymous Coward, 6 Feb 2020 @ 5:33pm

    "Clearview [secretly] collected image data of individuals without their consent, and in violation of rules explicitly forbidding them from doing so,"

    I grant the "secretly". For the rest... [citation]

  • Anonymous Coward, 6 Feb 2020 @ 8:30pm

    Look at Masnick, shilling for Google again. /sarc

  • Anonymous Coward, 7 Feb 2020 @ 9:45am

    He's not wrong. Google's bots crawl the internet non-stop, building a database for its search engine. But there is one key difference: website owners can opt out of Google's indexing.

    Why would that be a "key" difference?

    1) Opt-out systems are bullshit in general.
    2) If Clearview did obey robots.txt, users could only block them from getting pictures off of sites controlled by those users. They'd have no control of Clearview's access to sites hosted by Google, Twitter, etc.

    • Anonymous Coward, 8 Feb 2020 @ 1:06pm

      Re:

      1) True, but opt-out is still much better than no option at all.

      2) While it's true that the users couldn't directly control whether Google or Twitter allowed the access, they could note what those companies' policies are, and see that the access is not allowed.

  • Dan, 7 Feb 2020 @ 10:47am

    Shooting the wrong horse...

    I find it amusing that everyone is blaming Clearview. (Not that I'm condoning their actions. Far from it...) The underlying issue is who is getting the data...law enforcement. "Well gee, if the cops don't have this information, they can't burst into homes with a battering ram, and possibly kill people."

    In the land of sanity, people would be arguing that law enforcement shouldn't be doing that to start with. Then again, it's way easier to go after one company than to go after a much bigger organization that is not only its customer, but is also causing the actual problem.

