DailyDirt: Computers Are Editing Our Double-Plus-Ungood Content
from the urls-we-dig-up dept
More and more digital media is being edited and prioritized in datacenters by intangible algorithms. As usual, this can be good and bad, depending on how the technology is used. On the one hand, algorithms can do laborious tasks that humans don't want to do. But at the same time, algorithms might introduce all kinds of errors or inadvertent biases on a scale that no group of humans could ever accomplish without automation. Here are just a few links on bots tinkering with online content.
- The Los Angeles Times reported on an earthquake in about 3 minutes -- thanks to an algorithm that collected data from the US Geological Survey and automagically created an article about the seismic event. This robotic reporting isn't exactly a new thing, but the robotic "scoop" can't be matched by human writers. [url]
- About half of all the edits on Wikipedia are made by bots. Algorithms keep spam links from flooding the site, and they also create whole entries based on online data, as well as perform tedious tasks such as grammar and spelling corrections. Not surprisingly, the biggest bot job on Wikipedia is detecting vandalism. [url]
- Algorithms aren't free of bias; they can actually amplify biases. Humans can also trick algorithms by gaming their inputs and biasing results, so computer-produced content isn't necessarily more objective than the writings of humans (not that anyone here would have assumed that). [url]
Thank you for reading this Techdirt post. With so many things competing for everyone’s attention these days, we really appreciate you giving us your time. We work hard every day to put quality content out there for our community.
Techdirt is one of the few remaining truly independent media outlets. We do not have a giant corporation behind us, and we rely heavily on our community to support us, in an age when advertisers are increasingly uninterested in sponsoring small, independent sites — especially a site like ours that is unwilling to pull punches in its reporting and analysis.
While other websites have resorted to paywalls, registration requirements, and increasingly annoying/intrusive advertising, we have always kept Techdirt open and available to anyone. But in order to continue doing so, we need your support. We offer a variety of ways for our readers to support us, from direct donations to special subscriptions and cool merchandise — and every little bit helps. Thank you.
–The Techdirt Team
Filed Under: algorithms, artificial intelligence, automation, bias, bots, earthquakes, journalism, labor, media, scoop, wikipedia
Reader Comments
vandalism-by-bot on Wikipedia
And the second-biggest bot "job" on Wikipedia is perpetrating vandalism.
I've seen much content, often high-quality content, deleted by bots. It seems to follow a familiar pattern: a new, unregistered person comes in and writes a substantial addition to an article, but because of a minor violation of some arbitrary rule, everything the person wrote is automatically deleted. (And oftentimes, new writers don't return to "defend" their edits.)
Here's just one example of vandalism-by-wikibot that caught my attention:
https://en.wikipedia.org/w/index.php?title=Skiptrace&diff=302349485&oldid=302349312
Re: vandalism-by-bot on Wikipedia
When editing as an unregistered user, it's common courtesy to change only a single section per edit, not to make major revisions across several sections at once and dump a bucketload of URL links at the end of the article. That behavior is extremely common among URL link spammers, and it's what caused the revert.
Adding the URLs usually belongs in a separate edit action... check the Talk page of that IP address; the reason for the revert is stated clearly there: url link dump.
Re: Re: vandalism-by-bot on Wikipedia
Bots don't argue "reasons" -- they spit out canned responses (and enforce non-negotiable blanket rules) when triggered. In this case, the trigger was the inclusion of a single word: "myspace".
This was the offending line that nuked everything:
"MySpace (http://www.myspace.com)- a "self-promotion" site where people often provide substantial details about themselves"
So because of a single line containing the URL of the home page of a highly-popular site on the 'ban' list, the entire body of work by that author was thrown out. Although well-intentioned, the bot 'crashed-and-burned' here because the bot's programmer failed to distinguish between links to personal pages on MySpace (of which there are millions) and the front page of MySpace. The bot was obviously programmed to assume that any links to MySpace (homepage or not) were put there by a spammer trying to googlebomb his personal vanity page to increase its search-engine ranking. As judge, jury and executioner, the bot pronounced that Wikipedia editor guilty of link-spamming, and as punishment, deleted not just the offending word, but all edits ever made by that person (even those that broke no "rules") going all the way back to his first appearance on Wikipedia.
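To illustrate the distinction the comment describes -- a blanket match on the domain versus a narrower check that separates the front page from individual profile pages -- here is a minimal hypothetical sketch in Python. This is not the actual Wikipedia bot's code (which isn't shown here); the function names and the exact rule are assumptions made for illustration only.

```python
import re

def blanket_rule(url: str) -> bool:
    """The behavior described above: flag ANY myspace.com link as spam,
    including the site's front page."""
    return "myspace.com" in url.lower()

def narrower_rule(url: str) -> bool:
    """A hypothetical refinement: flag only links with a non-empty path,
    which suggests an individual profile page rather than the front page."""
    m = re.match(r"https?://(?:www\.)?myspace\.com(/[^?#]*)?", url, re.IGNORECASE)
    if not m:
        return False  # not a MySpace link at all
    path = (m.group(1) or "/").rstrip("/")
    return path != ""  # empty path means the front page, so don't flag it

# The line that triggered the revert linked only the front page:
print(blanket_rule("http://www.myspace.com"))            # True: flagged anyway
print(narrower_rule("http://www.myspace.com"))           # False: front page allowed
print(narrower_rule("http://www.myspace.com/someband"))  # True: profile page flagged
```

Under the blanket rule, the front-page link in the quoted line is indistinguishable from a vanity-page googlebomb; the narrower rule would have let it through.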
That's severe overkill, based on an invalid assumption and triggered by the bot's sloppily programmed ruleset. And as a result, the Wikipedia bot vandalized -- in this case permanently -- an entire two-thirds of an article on an informative subject.
But just think about it for a moment ... the Wikipedia article "Skiptrace" is about research methods used to locate people. Doesn't it seem counterproductive that Wikipedia's search-and-destroy bots would (mis)identify the URLs of these related search engines and online research tools used by investigators for data mining -- including the website of the US Post Office -- and consider them all to be "link spam" - even when they are precisely on-topic and relevant to the subject?
I find it amusing that "Anonymous Coward" would find the Wikipedia bot's draconian enforcement action to be justified because a new user was not aware of the various customs peculiar to Wikipedia. I think this is one of the main problems with Wikipedia -- the site has become very unfriendly and unforgiving to new visitors, who are somehow expected to know a long list of esoteric rules before they ever start. Rules that are often counter-intuitive and illogical to an outsider not steeped in the "culture" of Wikipedia.
Re: Re: Re: vandalism-by-bot on Wikipedia
The biggest problem with the edit is actually that that mass of material is unsourced original research; it shouldn't stay even if it's "defended" (unless the defense is adequate sourcing, which seems unlikely).
The biggest problem with the bot program is that well-intentioned edits such as this one get reverted without much explanation, which alienates potential new editors early on. However, even with major efforts at mentoring, it's really rare to convert someone with a "link dump" mentality, such as that evinced by the reverted edit, into a good editor. They tend to have preconceived notions about how Wikipedia should be that are at odds with the views of most other editors.