Content Moderation Case Study: Google's Photo App Tags Photos Of Black People As 'Gorillas' (2015)
from the automation-isn't-the-answer dept
Summary: In May 2015, Google rolled out its "Google Photos" service. This service allowed users to store their images in Google's cloud and share them with other users. Unlike some other services, Google's photo service provided unlimited storage for photos under a certain resolution, making it an attractive replacement for other paid services.
Unfortunately, it soon became apparent the rollout may have outpaced internal quality control. The built-in auto-tagging system utilizing Google's AI began tagging Black people as "gorillas," resulting in backlash from users and critics who believed Google's algorithm was racist.
Google's immediate response was to apologize to users. The Twitter user who first noticed the tagging error was contacted directly by Google, which began tackling the problem that made it out of beta unnoticed. Google's Yonatan Zunger pointed out the shortcomings of AI when auto-tagging photos, noting the company's previous problems with mis-tagging people (of all races) as dogs and struggles with less-than-ideal lighting or low picture resolution. In fact, Google's rollout misstep mirrored Flickr's own struggles with auto-tagging photos, which similarly resulted in Black people being labeled as "ape" or "animal."
Decisions to be made by Google:
- Would more diversity in product development/testing teams increase the chance issues like this might be caught before services go live?
- Can additional steps be taken to limit human biases from negatively affecting the auto-tag AI?
- Should more rigorous testing be performed in the future, given the known issues with algorithmic photo tagging?
- Does seemingly inconsequential moderation like this still demand some oversight by human moderators?
- Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?
Originally published at the Trust & Safety Foundation website.
Filed Under: ai, algorithmic bias, content moderation, google photos, racism
Companies: google
Reader Comments
The thing I still don't get about this "mistake" is that AI/ML learns by being rewarded for successes and/or punished for failures, and what counts as success or failure is determined by humans. In order for the Google Photos AI to have learned that a black face == gorilla, someone had to tag photos with "gorilla" for the software to learn from.
Where the bias/racism comes in is that whoever fed the AI photos of gorillas did not also feed it photos of black people tagged "person" so that the AI could learn the difference. Garbage in, garbage out. And a thoroughly trained AI is hard to retrain, a lot like people (ML is modeled on human learning after all).
Google never did take proper responsibility for that fuckup. I'm not sure the proper amount of shame stuck to them either. Good on you for keeping that colossal screw up alive.
I'm not even sure it's a "fuckup"
I have a lot of sympathy for Google here (or anybody else who tries this).
This is machine learning, where the machine learns what things are called based on already-tagged photos created by humans. The machine doesn't decide for itself what a "gorilla" is; it learns based on the tags it sees.
Using that approach and scanning the Internet for tagged photos, a search on "idiot" is going to pull up photos of politicians, because lots of people tag politicians they don't agree with as "idiots". That doesn't mean those politicians ARE idiots, it just means that some people call them that. (That is, some significant number more than a few random noise tags that show no pattern.)
Same for "gorilla". As we all know there are lots of racists online who tag photos of black people with names of various apes. The machine is ultimately going to learn that, and that association is going to show up in the machine's search results.
It's not the machine's fault, it's not Google's fault, it's the fault of the racists who post such tags.
It's not practically feasible to filter out such common, but racist, tags. The only reason this kind of machine learning works at all is because the machine can learn from (literally) millions of pre-existing examples online without manual intervention. If humans have to filter the training set to remove racist (or any other kind of biased) tags, the whole thing becomes impractical.
To the extent we're going to use this technology at all, we have to accept that biases in the training set are going to show up in the outputs.
Or just choose not to use the tech.
Re: I'm not even sure it's a "fuckup"
"It's not practically feasible to filter out such common, but racist, tags. "
But it is practical and feasible to sell a defective product to the unsuspecting public.
Damn the consequences, full dividends ahead.
You should adjust your expectations.
The "product" here is not "given a picture, tell me what is depicted". The product is "given a picture, tell me what word(s) people most commonly associate with pictures that have similar characteristics." That is a VASTLY different thing.
For instance, do you imagine that "black people" will label pictures of their family as "black people"? Of course not! They'll label them mom, dad, Uncle Henry, Aunt Jerome, Army Cadet, Space Marine Costume, Hollywood Star, or whatever. So tell me, where does the label "black people" that you are expecting come from?
Defective? No. Sabotaged? Yes. "This is why we can't have nice things."
Re: You should adjust your expectations.
"Defective? No."
If I buy something and it does not meet what was advertised, I consider it to be defective and return it.
If the seller tells me it was designed that way I still want my money back no matter the excuses.
Am I being unreasonable here? I don't think so.
Possibly, the advertised functionality is not technically feasible at this time, or ever. Why pretend that it is?
Re: Re: You should adjust your expectations.
Ok Karen
Re: Re: Re: You should adjust your expectations.
I do not think this is a Karen event. Is it now only a Karen that is allowed to return a defective item?
Most returns nowadays are online, and there is no Karen shouting going on, is there? idk.
Re: Re: Re: Re: You should adjust your expectations.
For some reason people like to use meme-y phrases any chance they get, even when they don't apply. Or perhaps especially when they don't apply.
Like a Karen complaining that something misrepresents Black people.
Film and digital camera industries have suffered similar problems, so the issue is unlikely to go away soon with some "AI" tweaks.
Re:
"Film and digital camera industries have suffered similar problems"
Like what for example?
Film cameras use AI?
Re: Re:
Sure they do. You may have noticed face detection and auto white balance on cameras. On the production side, using smart filters to do things like background removal means that we can get pretty good results without having to pay someone to rotoscope every frame by hand.
These sorts of things reduce the barrier to entry and mean that one person can do what once took a team months. They're invaluable tools, especially for anyone who isn't making a blockbuster movie.
As an example, if the face detector doesn't recognize a skin color or face shape then it doesn't work with some actors. Can you imagine that conversation?
"We can't cast you, our tools don't work with your skin color." "Prepare to loose a major lawsuit."
Re: Re: Re:
Film cameras .... as in Kodachrome, Ektachrome where you load the (usually 35mm) film in low light conditions and have to operate the lever to advance ... that kind of film camera?
Or film cameras referring to any capture device used in the making of films?
Re: Re:
Film, and digital cameras. Not film cameras and digital cameras. The problem isn't AI, but a bias in how the systems were tested and tuned for accurate depiction.
In addition to what Arthur Moore said: (color) film and the process of interpreting data from CCDs (or whatever the image sensor is in cameras these days) both require definition and tuning. Neither simply "captures raw reality." And there seems to be a trade-off with phone cameras, as they are mostly made for capturing faces, leaving many other things with the wrong color entirely. They also seem to have been tuned on certain skin tones, causing others to appear different than they really are, although that may or may not have improved in recent years. Color film had, and probably still has, similar issues, where the color balance works pretty well for some skin tones while washing out or overly darkening others, or leaving them with strange hues. It is entirely common, for instance, to see film photographs, especially older ones, showing black people many shades darker than their actual skin tone. And let's not even get into printing/lithography and photo "touch-up" decisions.
Missing paragraph?
This doesn't quite read right to me. Seems to need something between the intro paragraph and the one beginning "Google's immediate response".
Re: Missing paragraph?
Ah, from the original article:
Re: Re: Missing paragraph?
Crud. Somehow it got lost in translation. Fixed now. Thanks for pointing it out.
Interesting in light of this article that just days ago Google fired one of their leading AI ethics researchers, Dr. Timnit Gebru, for essentially refusing to retract, on demand, a research paper she co-authored, and for complaining about it just before taking a planned vacation.
Re:
She honestly comes across as a complete diva and drama queen in the article, posting to her own clique subgroup: "You are not worth having any conversations about this, since you are not someone whose humanity... is acknowledged or valued in this company." Talk about a martyr complex. Even if you don't believe Google's claim that this was her table-flipping over the bureaucracy for reviewing papers, there isn't any treatment cited which justifies that level of histrionics.
Re: Re:
Exactly. Unfortunately, she and her supporters are trying to conflate her leaving with racism when it is not so. Google let her go because she was insubordinate and causing problems, thinking herself to be irreplaceable. She gave Google an ultimatum that if they didn't let her do what she wanted she would leave. Google said, "Bye", so she flipped out and brought the wrath of the woke internet down on Google.
Eliminating any problematic tags seems like the best solution for me.
"Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?"
Not sure it matters. When you do this sort of thing at basically infinite scale, and your AI is never going to be perfect, you should assume you will eventually get every combination of mistaken matches out there. It doesn't matter whether there is any inherent bias or not: the AI could have no bias, be nearly perfect, and the majority of mistaken matches could be innocuous, and it still wouldn't help you any when you hit a bad one.
I think if you can identify which tags are going to be incredibly offensive and make you look terrible, removing them as options is a perfectly fine solution.
'Should we test this on non-whites?' '... nah.'
Just... how? How does something that huge make it through testing without being caught? Did they test literally zero non-white faces, or did they just get insanely lucky on the ones they did test?
Whatever the case, gotta say: having a multi-billion-dollar company unable to solve this does not exactly create confidence in similar tech offered by smaller companies and employed by cities and/or schools.
Re: 'Should we test this on non-whites?' '... nah.'
They are running a probability-based guessing game an effectively infinite number of times. Their goal is matching correctly "often," and potentially improving that percentage over time, not matching correctly always.
They should expect every possible mismatch of tags to come up. They should be looking at each of their tags and thinking "what's the worst that could happen?", because it will.
Re: 'Should we test this on non-whites?' '... nah.'
Test what, exactly?
The only way they'd find this out before rolling it out is to search on "gorilla" and discover the result.
Testing it on non-white faces probably worked fine, same as on white faces.
Unless they anticipated the racist "gorilla" result in advance, and went looking for it.
Even for racists, that's an unlikely thing to try. All the more so for a non-racist person who never would have thought of testing for that outcome in the first place.
Re: Re: 'Should we test this on non-whites?' '... nah.'
Scanning a selection of faces of various races in various lighting conditions and making sure the results come out as expected is the first thing that comes to mind, though I suppose they could have done so and it was just dumb luck that none of the photos they tested came back with the wrong tag.
Re: Re: Re: 'Should we test this on non-whites?' '... nah.'
I suppose that it is possible that their original test set did not include images of darker-complected humans labeled as gorillas. Of course, once you expose the computer to the real world, bad things happen.
Re: Re: Re: 'Should we test this on non-whites?' '... nah.'
I think this case is more one of training the AI on a public corpus that was garbage, instead of a hand curated one. If you have 5 pictures tagged "black person" and 50 tagged "gorilla", which is the AI going to choose?
Re: 'Should we test this on non-whites?' '... nah.'
There could be tens of millions of black faces correctly tagged, and dozens or hundreds or thousands with these bad tags. If the numbers are anything like that, it would be easy to get through testing without encountering any of the problem scenarios. I don't know if that's how it went down, or if testing really was inadequate, but from what I've read there's not enough information to tell which.
So about this facial recognition used by Law Enforcement
Tell me how accurate they are again?
Way to go Quality Control department!
(assuming they were tasked with testing this thing)
The general public has for some time now been the QA dept for many a corner-cutting enterprise. Get it out by Friday and let the chips fall where they may. Management may say these things, but if something actually goes wrong, look out, because it's not their fault at all just because they told you not to test it.
90% of the problem
Most of you have seen it, but another part is humans being able to SHOW the computer what to look for.
There are a few tricks with digital cameras that I don't know if you know about.
All of them use IR for the black-and-white texture (not totally sure about that anymore), and some even use UV to measure distance from reflection. Taking most of these pictures and shifting the spectrum could make it a bit easier to ID most people.
If this isn't really being done, then tell the camera makers to do it. Even if using radar/pulse waves to range a picture, take a sound picture of the face as well as record the UV and IR signals. This should give enough detail in most cases to ID most anyone, at least for police ID.
"AI" is way overused and doesn't imply what the unfamiliar think it does. A more accurate term is Machine Learning (ML).
ML works by setting up an empty "brain" (really just a database of past trials and results) and feeding it tons of pre-answered inputs. The machine tries to produce an answer and is either punished or rewarded depending on accuracy (and whatever other metrics you might wish to apply such as speed). Punishment and reward are simply scores applied to each trial.
With each successive trial the machine uses past results to influence how it tries again. With sufficient training (read: thousands and thousands of trials) the machine gets better and better at achieving a higher score with each trial.
ML's purpose is to find a pattern that can get from input to result in the fewest steps. Then you can feed it brand-new inputs without predetermined solutions and let the pattern find the answer for you. Unfortunately, the results are not as accurate as a human's, no matter how many inputs and trials we throw at the machines. There are just too many factors we use to do things like analyze photos, and machines are too hard to train at that level of detail. As a result, computer facial recognition won't achieve human accuracy for a very, very long time, and then hopefully it will do a better job, because we kinda suck at it, too.