No, Tech Companies Can't Easily Create A 'ContentID' For Harassment, And It Would Be A Disaster If They Did
from the not-how-it-works dept
Every so often we see a sort of "tech magical thinking" when it comes to solving big challenges -- people insisting that "those smart people in Silicon Valley could fix [big social problem X] if they just decided to do so." This sort of thinking is wrong on multiple levels, and is often based on the false suggestion that tech innovators "don't care enough" about certain problems, rather than recognizing that, perhaps, there aren't any easy solutions. A perfect example of this is a recent column from Jessica Valenti, over at the Guardian, which claims that tech companies could "end online harassment" and that they could do it "tomorrow" if they just had the will to do so. How? Well, Valenti claims, by just making a "ContentID for harassment":

If Twitter, Facebook or Google wanted to stop their users from receiving online harassment, they could do it tomorrow.

See? Just like that. Snap your fingers and boom, harassment goes away. Except, no, it doesn't. Sarah Jeong has put together a fantastic response to Valenti's magical tech thinking, pointing out both that ContentID doesn't work well and that harassment is a different problem anyway. As she notes, the only reason ContentID "works" at all (and we use the term "works" loosely) is that it's a pure fingerprinting algorithm, matching uploaded content against a database of claimed copyright-covered material. That's very different from sorting out "harassment," which involves a series of subjective determinations.
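To make that distinction concrete, here's a deliberately oversimplified sketch (in Python; the database contents and function names are invented for illustration, and real fingerprinting systems are far more sophisticated than a hash lookup) of the difference between the mechanical question a fingerprint-matching system answers and the subjective one a harassment filter would have to answer:

```python
import hashlib

# Hypothetical database of fingerprints for claimed copyright-covered works.
CLAIMED_FINGERPRINTS = {
    hashlib.sha256(b"frames of some claimed music video").hexdigest(),
}

def looks_like_claimed_work(upload: bytes) -> bool:
    """Mechanical question: does this upload match a known fingerprint?"""
    return hashlib.sha256(upload).hexdigest() in CLAIMED_FINGERPRINTS

def looks_like_harassment(message: str, sender: str, recipient: str) -> bool:
    """There is no database to match against: the answer depends on who is
    speaking, to whom, with what history and intent -- exactly the kind of
    subjective judgment a fingerprint lookup cannot make."""
    raise NotImplementedError("no fingerprint exists for 'harassment'")

print(looks_like_claimed_work(b"frames of some claimed music video"))  # True: exact match
print(looks_like_claimed_work(b"someone's original vacation video"))   # False: no match
```

The first function is a crude stand-in for what ContentID does; the second is what Valenti is asking for, and there's no table to look that answer up in.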
When money is on the line, internet companies somehow magically find ways to remove content and block repeat offenders. For instance, YouTube already runs a sophisticated Content ID program dedicated to scanning uploaded videos for copyrighted material and taking them down quickly – just try to bootleg music videos or watch unofficial versions of Daily Show clips and see how quickly they get taken down. But a look at the comments under any video and it’s clear there’s no real screening system for even the most abusive language.
If these companies are so willing to protect intellectual property, why not protect the people using your services?
Furthermore, Jeong goes into great detail about how ContentID isn't even particularly good on the copyright front, as we've highlighted for years. It produces both Type I and Type II errors: it pulls down plenty of content that isn't infringing (false positives), while still letting through plenty of content that is (false negatives). Add in the even more difficult task of determining "harassment," which is much less identifiable than probable copyright infringement, and you would undoubtedly increase both types of errors to an absurd degree -- likely shutting down many perfectly legitimate conversations, while doing little to stop actual harassment.
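To see how both error types show up even in a system far simpler than ContentID, consider a naive, purely hypothetical keyword filter of the sort a "ContentID for harassment" would effectively collapse into (the word list and messages below are made up for illustration):

```python
# Toy illustration of Type I (false positive) and Type II (false negative)
# errors in a naive keyword-based "harassment" filter. Entirely hypothetical.
FLAGGED_WORDS = {"kill", "die"}

def naive_filter(message: str) -> bool:
    """Flag a message if it contains any word on the flagged list."""
    words = message.lower().split()
    return any(word in words for word in FLAGGED_WORDS)

# Type I error: benign speech gets flagged because it shares a word.
print(naive_filter("that new stand-up set will kill tonight"))  # True -- wrongly blocked

# Type II error: a genuine threat sails through because it avoids the list.
print(naive_filter("i know where you live. watch your back."))  # False -- missed entirely
```

Make the filter more aggressive and the false positives pile up; make it more lenient and the false negatives do. No amount of tuning turns that trade-off into "ending harassment tomorrow."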
None of this is to suggest that harassment online isn't a serious problem. It is. And it's also possible that some enterprising folks will figure out interesting, unique and compelling ways of dealing with it, sometimes with technological assistance. But this sort of "magic bullet" thinking is as dangerous as it is ridiculous -- because it often leads to reframing the debate, sometimes to the point of shifting the actual liability from those actually responsible (whether copyright infringers or harassers) to the intermediaries who merely provide a platform for communication.

The more aggressive the tool, the greater the chance it will filter out communications that aren’t harassing — particularly, communications one wishes to receive. You can see this in the false positives flagged by systems like Content ID. For example, there’s the time that Content ID took down a video with birds chirping in the background, because it matched an avant-garde song that also had some birds chirping in the background. Or the time NASA’s official clips of a Mars landing got taken down by a news agency. Or the time a livestream was cut off because people began singing "Happy Birthday." Or when a live airing on UStream of the Hugo Awards was interrupted mid-broadcast as the awards ceremony aired clips from Doctor Who and other shows nominated for Hugo Awards.
In the latter case, UStream used something similar but not quite the same as Content ID—one in which blind algorithms automatically censored copyrighted content without the more sophisticated appeals process that YouTube has in place. Robots are not smart; they cannot sense context and meaning. Yet YouTube’s appeals system wouldn’t translate well to anti-harassment tools. What good is a system where you must report each and every instance of harassment and then follow through in a back-and-forth appeals system?
The idea that tech companies "don't care enough" about harassment (or, for that matter, infringement) to do the "simple things" to stop it is an argument born of ignorance. If there were some magical silver bullet that made online communications platforms more welcoming and accommodating to all, it would be a huge selling point, and one that many companies would immediately embrace. But the reality is that some social challenges can't just be solved with a dollop of JavaScript, and pretending otherwise is a dangerous distraction that only leads to misplaced attacks without taking on the underlying problems.
Filed Under: abuse, algorithms, contentid, copyright, filtering, free speech, harassment, jessica valenti, sarah jeong