Content Moderation Case Study: Google's Photo App Tags Photos Of Black People As 'Gorillas' (2015)
from the automation-isn't-the-answer dept
Summary: In May 2015, Google rolled out its "Google Photos" service. This service allowed users to store their images in Google's cloud and share them with other users. Unlike some other services, Google's photo service provided unlimited storage for photos under a certain resolution, making it an attractive replacement for other paid services.
Unfortunately, it soon became apparent the rollout may have outpaced internal quality control. The built-in auto-tagging system utilizing Google's AI began tagging Black people as "gorillas," resulting in backlash from users and critics who believed Google's algorithm was racist.
Google's immediate response was to apologize to users. The Twitter user who first noticed the tagging error was contacted directly by Google, which began tackling the problem that made it out of beta unnoticed. Google's Yonatan Zunger pointed out the shortcomings of AI when auto-tagging photos, noting the company's previous problems with mis-tagging people (of all races) as dogs and struggles with less-than-ideal lighting or low picture resolution. In fact, Google's rollout misstep mirrored Flickr's own struggles with auto-tagging photos, which similarly resulted in Black people being labeled as "ape" or "animal."
Decisions to be made by Google:
- Would more diversity in product development/testing teams increase the chance issues like this might be caught before services go live?
- Can additional steps be taken to limit human biases from negatively affecting the auto-tag AI?
- Should more rigorous testing be performed in the future, given the known issues with algorithmic photo tagging?
- Does seemingly inconsequential moderation like this still demand some oversight by human moderators?
- Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?
Originally published at the Trust & Safety Foundation website.
Filed Under: ai, algorithmic bias, content moderation, google photos, racism
Companies: google
Reader Comments
The thing I still don't get about this "mistake" is that AI/ML learns by being rewarded for successes and/or punished for failures, and what counts as success or failure is determined by humans. In order for the Google Photos AI to have learned that a black face == gorilla, someone had to tag photos with "gorilla" for the software to learn from.
Where the bias/racism comes in is that whoever fed the AI photos of gorillas did not also feed it photos of black people tagged "person" so that the AI could learn the difference. Garbage in, garbage out. And a thoroughly trained AI is hard to retrain, a lot like people (ML is modeled on human learning after all).
Google never did take proper responsibility for that fuckup. I'm not sure the proper amount of shame stuck to them either. Good on you for keeping that colossal screw up alive.
I'm not even sure it's a "fuckup"
I have a lot of sympathy for Google here (or anybody else who tries this).
This is machine learning, where the machine learns what things are called based on already-tagged photos created by humans. The machine doesn't decide for itself what a "gorilla" is; it learns based on the tags it sees.
Using that approach and scanning the Internet for tagged photos, a search on "idiot" is going to pull up photos of politicians, because lots of people tag politicians they don't agree with as "idiots". That doesn't mean those politicians ARE idiots, it just means that some people call them that. (That is, some significant number more than a few random noise tags that show no pattern.)
Same for "gorilla". As we all know there are lots of racists online who tag photos of black people with names of various apes. The machine is ultimately going to learn that, and that association is going to show up in the machine's search results.
It's not the machine's fault, it's not Google's fault, it's the fault of the racists who post such tags.
It's not practically feasible to filter out such common, but racist, tags. The only reason this kind of machine learning works at all is because the machine can learn from (literally) millions of pre-existing examples online without manual intervention. If humans have to filter the training set to remove racist (or any other kind of biased) tags, the whole thing becomes impractical.
To the extent we're going to use this technology at all, we have to accept that biases in the training set are going to show up in the outputs.
Or just choose not to use the tech.
Re: I'm not even sure it's a "fuckup"
"It's not practically feasible to filter out such common, but racist, tags. "
But it is practical and feasible to sell a defective product to the unsuspecting public.
Damn the consequences, full dividends ahead.
You should adjust your expectations.
The "product" here is not "given a picture, tell me what is depicted". The product is "given a picture, tell me what word(s) people most commonly associate with pictures that have similar characteristics." That is a VASTLY different thing.
For instance, do you imagine that "black people" will label pictures of their family as "black people"? Of course not! They'll label them mom, dad, Uncle Henry, Aunt Jerome, Army Cadet, Space Marine Costume, Hollywood Star, or whatever. So tell me, where does the label "black people" that you are expecting come from?
Defective? No. Sabotaged? Yes. "This is why we can't have nice things."
Re: You should adjust your expectations.
"Defective? No."
If I buy something and it does not meet what was advertised, I consider it to be defective and return it.
If the seller tells me it was designed that way I still want my money back no matter the excuses.
Am I being unreasonable here? I don't think so.
Possibly, the advertised functionality is not technically feasible at this time, or ever. Why pretend that it is?
Re: Re: You should adjust your expectations.
Ok Karen
Re: Re: Re: You should adjust your expectations.
I do not think this is a Karen event. Is it now only a Karen that is allowed to return a defective item?
Most returns nowadays are online, and there is no Karen shouting going on, is there? idk.
Re: Re: Re: Re: You should adjust your expectations.
For some reason people like to use meme-y phrases any chance they get, even when they don't apply. Or perhaps especially when they don't apply.
Like a Karen complaining that something misrepresents Black people.
Film and digital camera industries have suffered similar problems, so the issue is unlikely to go away soon with some "AI" tweaks.
Re:
"Film and digital camera industries have suffered similar problems"
Like what for example?
Film cameras use AI?
Re: Re:
Sure they do. You may have noticed face detection and auto white balance on cameras. On the production side, using smart filters to do things like background removal means that we can get pretty good results without having to pay someone to rotoscope every frame by hand.
These sorts of things reduce the barrier to entry and mean that one person can do what once took a team months. They're invaluable tools, especially for anyone who isn't making a blockbuster movie.
As an example, if the face detector doesn't recognize a skin color or face shape then it doesn't work with some actors. Can you imagine that conversation?
"We can't cast you, our tools don't work with your skin color." "Prepare to loose a major lawsuit."
Re: Re: Re:
Film cameras .... as in Kodachrome, Ektachrome where you load the (usually 35mm) film in low light conditions and have to operate the lever to advance ... that kind of film camera?
Or film cameras referring to any capture device used in the making of films?
Re: Re:
Film, and digital cameras. Not film cameras and digital cameras. The problem isn't AI, but a bias in how the systems were tested and tuned for accurate depiction.
In addition to what Arthur Moore said: (color) film and the process of interpreting data from CCDs (or whatever the image sensor is in cameras these days) both require definition and tuning. Neither simply "captures raw reality." And there seems to be a trade-off with phone cameras, as they are mostly made for capturing faces, leaving many other things with the wrong color entirely. They also seem to have been tuned on certain skin tones, causing others to appear different than they really are, although that may or may not have improved in recent years. Color film had, and probably still has, similar issues, where the color balance works pretty well for some skin tones while washing out or overly darkening others, or leaving them with strange hues. It is entirely common, for instance, to see film photographs, especially older ones, showing black people many shades darker than their actual skin tone. And let's not even get into printing/lithography and photo "touch-up" decisions.
Missing paragraph?
This doesn't quite read right to me. Seems to need something between the intro paragraph and the one beginning "Google's immediate response".
Re: Missing paragraph?
Ah, from the original article:
Re: Re: Missing paragraph?
Crud. Somehow it got lost in translation. Fixed now. Thanks for pointing it out.
Interesting in light of this article that just days ago Google fired one of their leading AI ethics researchers, Dr. Timnit Gebru, for essentially refusing to retract, on demand, a research paper she co-authored, and for complaining about it just before taking a planned vacation.
Re:
She honestly comes across as a complete diva and drama queen in the article, posting to her own clique subgroup: "You are not worth having any conversations about this, since you are not someone whose humanity... is acknowledged or valued in this company." Talk about a martyr complex. Even if you don't believe Google's claim that this was her table-flipping over the bureaucracy for reviewing papers, there isn't any treatment cited which justifies that level of histrionics.
Re: Re:
Exactly. Unfortunately, she and her supporters are trying to conflate her leaving with racism when it is not so. Google let her go because she was insubordinate and causing problems, thinking herself to be irreplaceable. She gave Google an ultimatum that if they didn't let her do what she wanted she would leave. Google said, "Bye", so she flipped out and brought the wrath of the woke internet down on Google.
Eliminating any problematic tags seems like the best solution for me.
"Will AI ever be able to surmount the inherent biases fed into it by those designing and training it?"
Not sure it matters. When you do this sort of thing at basically infinite scale, and your AI is never going to be perfect, you should assume you will eventually get every combination of mistaken matches out there. It doesn't matter whether there is any inherent bias or not: the AI could have no bias, be nearly perfect, and the majority of mistaken matches could be innocuous, and it still wouldn't help you any when you hit a bad one.
I think if you can identify which tags are going to be incredibly offensive and make you look terrible, removing them as options is a perfectly fine solution.
'Should we test this on non-whites?' '... nah.'
Just... how? How does something that huge make it through testing without being caught? Did they test literally zero non-white faces, or did they just get insanely lucky on the ones they did test?
Whatever the case, gotta say: having a multi-billion-dollar company unable to solve this does not exactly create confidence in similar tech offered by smaller companies and employed by cities and/or schools.
Re: 'Should we test this on non-whites?' '... nah.'
They are running a probability-based guessing game an effectively infinite number of times. Their goal is matching correctly "often," and potentially improving that percentage over time, not matching correctly always.
They should expect every possible mismatch of tags to come up. They should be looking at each of their tags and thinking "what's the worst that could happen?", because it will.
Re: 'Should we test this on non-whites?' '... nah.'
Test what, exactly?
The only way they'd find this out before rolling it out is to search on "gorilla" and discover the result.
Testing it on non-white faces probably worked fine, same as on white faces.
Unless they anticipated the racist "gorilla" result in advance, and went looking for it.
Even for racists, that's an unlikely thing to try. All the more so for a non-racist person who never would have thought of testing for that outcome in the first place.
Re: Re: 'Should we test this on non-whites?' '... nah.'
Scanning a selection of faces of various races in various lighting conditions and making sure the results come out as expected is the first thing that comes to mind, though I suppose they could have done so and it was just dumb luck that none of the photos they tested came back with the wrong tag.
Re: Re: Re: 'Should we test this on non-whites?' '... nah.'
I suppose that it is possible that their original test set did not include images of darker-complected humans labeled as gorillas. Of course, once you expose the computer to the real world, bad things happen.
Re: Re: Re: 'Should we test this on non-whites?' '... nah.'
I think this case is more one of training the AI on a public corpus that was garbage, instead of a hand curated one. If you have 5 pictures tagged "black person" and 50 tagged "gorilla", which is the AI going to choose?
Re: 'Should we test this on non-whites?' '... nah.'
There could be tens of millions of black faces correctly tagged, and dozens or hundreds or thousands with these bad tags. If the numbers are anything like that, it would be easy to get through testing without encountering any of the problem scenarios. I don't know if that's how it went down, or if testing really was inadequate, but from what I've read there's not enough information to tell which.
So about this facial recognition used by Law Enforcement
Tell me how accurate they are again?
Way to go Quality Control department!
(assuming they were tasked with testing this thing)
The general public has for some time now been the QA dept for many a corner-cutting enterprise. Get it out by Friday and let the chips fall where they may. Management may say these things, but if something actually goes wrong, look out, because it's not their fault at all just because they told you not to test it.
90% of the problem
Most of you have seen it, but another part is humans being able to SHOW the computer what to look for.
There are a few tricks with digital cameras that I don't know if you know about.
All of them use IR for the black-and-white texture (not totally sure about that anymore), and some even use UV to measure distance from reflection. Taking most of these pictures and shifting the spectrum could make it a bit easier to ID most people.
If this isn't really being done, then tell the camera makers to do it. Even if using radar/pulse waves to range a picture, take a sound picture of the face as well as record the UV and IR signals. This should give enough detail in most cases to ID most anyone, at least for police ID.
"AI" is way overused and doesn't imply what the unfamiliar think it does. A more accurate term is Machine Learning (ML).
ML works by setting up an empty "brain" (really just a database of past trials and results) and feeding it tons of pre-answered inputs. The machine tries to produce an answer and is either punished or rewarded depending on accuracy (and whatever other metrics you might wish to apply such as speed). Punishment and reward are simply scores applied to each trial.
With each successive trial the machine uses past results to influence how it tries again. With sufficient training (read: thousands and thousands of trials) the machine gets better and better at achieving a higher score with each trial.
ML's purpose is to find a pattern that can get from input to result in the fewest steps. Then you can feed it brand-new inputs without predetermined solutions and let the pattern find the answer for you. Unfortunately, the results are not as accurate as a human's, no matter how many inputs and trials we throw at the machines. There are just too many factors we use to do things like analyze photos, and machines are too hard to train at that level of detail. As a result, computer facial recognition won't achieve human accuracy for a very, very long time, and then hopefully it will do a better job, because we kinda suck at it, too.