DailyDirt: Computers Are Editing Our Double-Plus-Ungood Content

from the urls-we-dig-up dept

More and more digital media is being edited and prioritized in datacenters by intangible algorithms. As usual, this can be good and bad, depending on how the technology is used. On the one hand, algorithms can do laborious tasks that humans don't want to do. But at the same time, algorithms might introduce all kinds of errors or inadvertent biases on a scale that no group of humans could ever accomplish without automation. Here are just a few links on bots tinkering with online content.

Filed Under: algorithms, artificial intelligence, automation, bias, bots, earthquakes, journalism, labor, media, scoop, wikipedia


Reader Comments



  • zip, 7 Apr 2014 @ 7:47pm

    vandalism-by-bot on Wikipedia

    "the biggest bot job on Wikipedia is detecting vandalism."

    And the second-biggest bot "job" on Wikipedia is perpetrating vandalism.

    I've seen much content, often high-quality content, deleted by bots. It seems to follow a familiar pattern: a new, unregistered person comes in and writes a substantial addition to an article, but due to a minor violation of some arbitrary rule, everything the person wrote is automatically deleted. (And oftentimes, new writers don't return to "defend" their edits.)

    Here's just one example of vandalism-by-wikibot that caught my attention:

    https://en.wikipedia.org/w/index.php?title=Skiptrace&diff=302349485&oldid=302349312


    • Anonymous Coward, 8 Apr 2014 @ 12:14am

      Re: vandalism-by-bot on Wikipedia

      imho in that case that's not vandalism-by-bot but correct bucket-of-spam links removal.

      When doing edits on different sections as an unregistered user it's common courtesy to only change a single section, not to change various ones in a single edit doing major revisions and dump a bucketload of URL links at the end of the article. This behavior is extremely common for url link spammers and that caused the revert action.

      Adding the URLs usually belongs in a separate edit action... check the Talk page of that ip address, the reason for the revert is stated clearly there: url link dump.


      • zip, 8 Apr 2014 @ 3:35am

        Re: Re: vandalism-by-bot on Wikipedia

        "imho in that case that's not vandalism-by-bot but correct bucket-of-spam links removal. When doing edits on different sections as an unregistered user it's common courtesy to only change a single section, not to change various ones in a single edit doing major revisions and dump a bucketload of URL links at the end of the article. This behavior is extremely common for url link spammers and that caused the revert action. Adding the URLs usually belongs in a separate edit action... check the Talk page of that ip address, the reason for the revert is stated clearly there: url link dump."


        Bots don't argue "reasons" -- they spit out canned responses (and enforce non-negotiable blanket rules) when triggered. In this case based on the inclusion of a single word, "myspace".

        This was the offending line that nuked everything:

        "MySpace (http://www.myspace.com)- a "self-promotion" site where people often provide substantial details about themselves"

        So because of a single line containing the URL of the home page of a highly-popular site on the 'ban' list, the entire body of work by that author was thrown out. Although well-intentioned, the bot 'crashed-and-burned' here because the bot's programmer failed to distinguish between links to personal pages on MySpace (of which there are millions) and the front page of MySpace. The bot was obviously programmed to assume that any links to MySpace (homepage or not) were put there by a spammer trying to googlebomb his personal vanity page to increase its search-engine ranking. As judge, jury and executioner, the bot pronounced that Wikipedia editor guilty of link-spamming, and as punishment, deleted not just the offending word, but all edits ever made by that person (even those that broke no "rules") going all the way back to his first appearance on Wikipedia.

        That's severe overkill, based on an invalid assumption, triggered by the bot's sloppily programmed ruleset. And as a result, the Wikipedia bot vandalized --in this case permanently-- an entire two-thirds of an article about an informative subject.
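        The homepage-versus-personal-page distinction described above can be sketched in a few lines of Python. This is only an illustration, not the actual Wikipedia bot's logic, which the thread does not show. The blocklist contents and the names `BLOCKED_DOMAINS`, `classify_link`, and `links_to_flag` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the real bot's rules are not given in the thread.
BLOCKED_DOMAINS = {"myspace.com"}


def classify_link(url: str) -> str:
    """Return 'ok', 'homepage', or 'personal_page' for a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same site.
    # Other subdomains are not handled in this sketch.
    if host.startswith("www."):
        host = host[4:]
    if host not in BLOCKED_DOMAINS:
        return "ok"
    # An empty path (or just "/") with no query string is the site's
    # front page, not an individual profile.
    if parts.path.strip("/") == "" and not parts.query:
        return "homepage"
    return "personal_page"


def links_to_flag(urls):
    """Keep only links that point at individual pages on blocked sites."""
    return [u for u in urls if classify_link(u) == "personal_page"]
```

        Under these assumptions, a link to http://www.myspace.com would be classified as "homepage" and left alone, while a link to a specific profile path on the same domain would be flagged. A rule like this would flag only the individual link rather than the whole edit.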

        But just think about it for a moment ... the Wikipedia article "Skiptrace" is about research methods used to locate people. Doesn't it seem counterproductive that Wikipedia's search-and-destroy bots would (mis)identify the URLs of these related search engines and online research tools used by investigators for data mining -- including the website of the US Post Office -- and consider them all to be "link spam" - even when they are precisely on-topic and relevant to the subject?

        I find it amusing that "Anonymous Coward" would find the Wikipedia bot's draconian enforcement action to be justified because a new user was not aware of the various customs peculiar to Wikipedia. I think this is one of the main problems with Wikipedia -- the site has become very unfriendly and unforgiving to new visitors, who are somehow expected to know a long list of esoteric rules before they ever start. Rules that are often counter-intuitive and illogical to an outsider not steeped in the "culture" of Wikipedia.


        • John Fenderson, 8 Apr 2014 @ 8:11am

          Re: Re: Re: vandalism-by-bot on Wikipedia

          Myspace is still "highly popular"??


        • Anonymous Coward, 8 Apr 2014 @ 8:19am

          Re: Re: Re: vandalism-by-bot on Wikipedia

          It's a very poor quality edit that is correctly being reversed by the bot.

          The biggest problem with the edit is actually that that mass of material is unsourced original research; it shouldn't stay even if it's "defended" (unless the defense is adequate sourcing, which seems unlikely).

          The biggest problem with the bot program is that well-intentioned edits such as this one get reversed without a whole lot of deep clarification. This causes potential new editors to get alienated early. However, even with major efforts at mentoring, it's really rare to convert someone with a "link dump" mentality, such as that evinced by the reverted edit, into a good editor. They tend to have preconceived notions about how Wikipedia should be that are at odds with the views of most other editors.


  • Rekrul, 8 Apr 2014 @ 9:47am

    You left out the biggest and worst bot editor around: YouTube's Content ID filter.


