DailyDirt: Computers Are Editing Our Double-Plus-Ungood Content
from the urls-we-dig-up dept
More and more digital media is being edited and prioritized in datacenters by intangible algorithms. As usual, this can be good and bad, depending on how the technology is used. On the one hand, algorithms can do laborious tasks that humans don't want to do. But at the same time, algorithms might introduce all kinds of errors or inadvertent biases on a scale that no group of humans could ever accomplish without automation. Here are just a few links on bots tinkering with online content.
- The Los Angeles Times reported on an earthquake in about 3 minutes -- thanks to an algorithm that collected data from the US Geological Survey and automagically created an article about the seismic event. This robotic reporting isn't exactly a new thing, but the robotic "scoop" can't be matched by human writers. [url]
- About half of all the edits on Wikipedia are made by bots. Algorithms keep spam links from flooding the site, and they also create whole entries based on online data, as well as perform tedious tasks such as grammar and spelling corrections. Not surprisingly, the biggest bot job on Wikipedia is detecting vandalism. [url]
- Algorithms aren't free of bias; they can actually amplify biases. Humans can also trick algorithms by gaming their inputs and biasing results, so computer-produced content isn't necessarily more objective than the writings of humans (not that anyone here would have assumed that). [url]
Thank you for reading this Techdirt post. With so many things competing for everyone’s attention these days, we really appreciate you giving us your time. We work hard every day to put quality content out there for our community.
Techdirt is one of the few remaining truly independent media outlets. We do not have a giant corporation behind us, and we rely heavily on our community to support us, in an age when advertisers are increasingly uninterested in sponsoring small, independent sites — especially a site like ours that is unwilling to pull punches in its reporting and analysis.
While other websites have resorted to paywalls, registration requirements, and increasingly annoying/intrusive advertising, we have always kept Techdirt open and available to anyone. But in order to continue doing so, we need your support. We offer a variety of ways for our readers to support us, from direct donations to special subscriptions and cool merchandise — and every little bit helps. Thank you.
–The Techdirt Team
Filed Under: algorithms, artificial intelligence, automation, bias, bots, earthquakes, journalism, labor, media, scoop, wikipedia
Reader Comments
vandalism-by-bot on Wikipedia
And the second-biggest bot "job" on Wikipedia is perpetrating vandalism.
I've seen much content, often high-quality content, deleted by bots. It seems to follow a familiar pattern: a new, unregistered person comes in and writes a substantial addition to an article, but because of a minor violation of some arbitrary rule, everything the person wrote is automatically deleted. (And oftentimes, new writers don't return to "defend" their edits.)
Here's just one example of vandalism-by-wikibot that caught my attention:
https://en.wikipedia.org/w/index.php?title=Skiptrace&diff=302349485&oldid=302349312
Re: vandalism-by-bot on Wikipedia
When editing as an unregistered user, it's common courtesy to change only a single section per edit, not to make major revisions across several sections at once and dump a bucketload of URL links at the end of the article. That behavior is extremely common among URL link spammers, and it's what caused the revert.
Adding the URLs usually belongs in a separate edit action... check the Talk page of that IP address; the reason for the revert is stated clearly there: url link dump.
Re: Re: vandalism-by-bot on Wikipedia
Bots don't argue "reasons" -- they spit out canned responses (and enforce non-negotiable blanket rules) when triggered. In this case, the trigger was the inclusion of a single word: "myspace".
This was the offending line that nuked everything:
"MySpace (http://www.myspace.com)- a "self-promotion" site where people often provide substantial details about themselves"
So because of a single line containing the URL of the home page of a highly-popular site on the 'ban' list, the entire body of work by that author was thrown out. Although well-intentioned, the bot 'crashed-and-burned' here because the bot's programmer failed to distinguish between links to personal pages on MySpace (of which there are millions) and the front page of MySpace. The bot was obviously programmed to assume that any links to MySpace (homepage or not) were put there by a spammer trying to googlebomb his personal vanity page to increase its search-engine ranking. As judge, jury and executioner, the bot pronounced that Wikipedia editor guilty of link-spamming, and as punishment, deleted not just the offending word, but all edits ever made by that person (even those that broke no "rules") going all the way back to his first appearance on Wikipedia.
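To illustrate the distinction the comment describes -- a blanket match on the domain versus a narrower check that separates the front page from individual profile pages -- here is a minimal hypothetical sketch in Python. This is not the actual Wikipedia bot's code (which isn't shown here); the function names and the exact rule are assumptions made for illustration only.

```python
import re

def blanket_rule(url: str) -> bool:
    """The behavior described above: flag ANY myspace.com link as spam,
    including the site's front page."""
    return "myspace.com" in url.lower()

def narrower_rule(url: str) -> bool:
    """A hypothetical refinement: flag only links with a non-empty path,
    which suggests an individual profile page rather than the front page."""
    m = re.match(r"https?://(?:www\.)?myspace\.com(/[^?#]*)?", url, re.IGNORECASE)
    if not m:
        return False  # not a MySpace link at all
    path = (m.group(1) or "/").rstrip("/")
    return path != ""  # empty path means the front page, so don't flag it

# The line that triggered the revert linked only the front page:
print(blanket_rule("http://www.myspace.com"))            # True: flagged anyway
print(narrower_rule("http://www.myspace.com"))           # False: front page allowed
print(narrower_rule("http://www.myspace.com/someband"))  # True: profile page flagged
```

Under the blanket rule, the front-page link in the quoted line is indistinguishable from a vanity-page googlebomb; the narrower rule would have let it through.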
That's severe overkill, based on an invalid assumption and triggered by the bot's sloppily programmed ruleset. And as a result, the Wikipedia bot vandalized -- in this case permanently -- an entire two-thirds of an article on an informative subject.
But just think about it for a moment ... the Wikipedia article "Skiptrace" is about research methods used to locate people. Doesn't it seem counterproductive that Wikipedia's search-and-destroy bots would (mis)identify the URLs of these related search engines and online research tools used by investigators for data mining -- including the website of the US Post Office -- and consider them all to be "link spam" - even when they are precisely on-topic and relevant to the subject?
I find it amusing that "Anonymous Coward" would find the Wikipedia bot's draconian enforcement action to be justified because a new user was not aware of the various customs peculiar to Wikipedia. I think this is one of the main problems with Wikipedia -- the site has become very unfriendly and unforgiving to new visitors, who are somehow expected to know a long list of esoteric rules before they ever start. Rules that are often counter-intuitive and illogical to an outsider not steeped in the "culture" of Wikipedia.
Re: Re: Re: vandalism-by-bot on Wikipedia
The biggest problem with the edit is actually that that mass of material is unsourced original research; it shouldn't stay even if it's "defended" (unless the defense is adequate sourcing, which seems unlikely).
The biggest problem with the bot program is that well-intentioned edits such as this one get reverted without much explanation, which alienates potential new editors early on. However, even with major efforts at mentoring, it's really rare to convert someone with a "link dump" mentality, such as that evinced by the reverted edit, into a good editor. They tend to have preconceived notions about how Wikipedia should be that are at odds with the views of most other editors.