Faux Randomness Strikes Again: How Researchers Realized Research 2000's Daily Kos Data Looked Faked
from the random-ain't-so-random dept
You may have heard by now that the political website Daily Kos has come out and explained that it believes the polling firm it has used for a while, Research 2000, was faking its data. While it's nice to see a publication come right out and bluntly admit that it had relied on data that it now believes was not legit, what's fascinating if you're a stats geek is how a team of stats geeks figured out there were problems with the data.

As any good stats nerd knows, the concept of "randomness" isn't quite as random as some people think, which is why faking randomness almost always leaves tell-tale signs that the data was faked or manipulated. For example, one very, very common test is to use Benford's Law on the first digit of each number in a data set: in many naturally occurring data sets, the leading digits follow a logarithmic distribution (a 1 shows up about 30% of the time), not the even spread people expect, and invented numbers tend to miss that pattern (there's a quick sketch of such a check at the end of this post).

In this case, the three guys who had problems with the data (Mark Grebner, Michael Weissman, and Jonathan Weissman) zeroed in on just a few clues that the data was faked or manipulated. The first thing they noticed was an odd pattern in R2K polls that asked men and women how they viewed certain politicians or political parties (favorable/unfavorable): if the percentage of men rating a politician favorable or unfavorable was an even number, so was the percentage of women, and if the male percentage was odd, the female percentage was odd. Yet, as you should know, these are independent samples, not influenced by each other. That 34% of men find a particular politician favorable should have no bearing on whether the corresponding percentage of women is even or odd. In fact, the parities matched in almost every such poll that R2K did, to such a level as to suggest it being as close to impossible as you can imagine:
Common sense says that that result is highly unlikely, but it helps to do a more precise calculation. Since the odds of getting a match each time are essentially 50%, the odds of getting 776/778 matches are just like those of getting 776 heads on 778 tosses of a fair coin. Results that extreme happen less than one time in 10^228. That's one followed by 228 zeros. (The number of atoms within our cosmic horizon is something like 1 followed by 80 zeros.) For the Unf, the odds are less than one in 10^231. (Having some Undecideds makes Fav and Unf nearly independent, so these are two separate wildly unlikely events.) There is no remotely realistic way that a simple tabulation and subsequent rounding of the results for M's and F's could possibly show that detailed similarity. Therefore the numbers on these two separate groups were not generated just by independently polling them.
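You can verify the scale of that number yourself. Here's a quick back-of-the-envelope check using only Python's standard library; the 776-of-778 figure is the Fav match count from the quote above:

```python
from math import comb, log10

# If the even/odd parities of the men's and women's percentages really
# were independent, each poll is effectively a fair coin flip: match or no match.
n, k = 778, 776

# Number of outcomes with at least 776 matches out of 778 "flips"...
tail = sum(comb(n, i) for i in range(k, n + 1))

# ...divided by 2^778 total outcomes; work in log10 to avoid underflow.
log_p = log10(tail) - n * log10(2)
print(f"probability ~ 10^{log_p:.1f}")  # -> probability ~ 10^-228.7
```

That lines up with the "less than one time in 10^228" figure in the report.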
The other statistical analysis that I found fascinating was that when you looked at weekly changes in favorability ratings, the R2K data almost always changed a bit. But, if you look at other data, no change is the most common result. As they point out, if you look at, say, Gallup data, you get a nice, typical bell curve with zero change as the most common weekly result, while the R2K numbers show a suspicious hole right at zero.
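To see why real, noisy polling should pile up at zero change rather than avoid it, here's a minimal simulation sketch; the ±1-point true shifts and the Gaussian roughly ±2-point sampling noise are assumptions borrowed from the argument the authors lay out below:

```python
import random
from collections import Counter

random.seed(0)
counts = Counter()
for _ in range(100_000):
    true_change = random.choice([+1.0, -1.0])  # opinion really moved 1 point
    noise = random.gauss(0, 2.0)               # ~+/-2% week-to-week sampling error
    counts[round(true_change + noise)] += 1    # what the published change shows

for change in sorted(counts):
    print(f"{change:+d}%: {counts[change]}")
# Even though the true change is never 0%, the sampling noise smears the
# results so that 0% ends up as the single most common rounded change,
# the opposite of R2K's hole-at-zero pattern.
```

The authors make the same point more carefully: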
How do we know that the real data couldn't possibly have many changes of +1% or -1% but few changes of 0%? Let's make an imaginative leap and say that, for some inexplicable reason, the actual changes in the population's opinion were always exactly +1% or -1%, equally likely. Since real polls would have substantial sampling error (about +/-2% in the week-to-week numbers even in the first 60 weeks, more later) the distribution of weekly changes in the poll results would be smeared out, with slightly more ending up rounding to 0% than to -1% or +1%. No real results could show a sharp hole at 0%, barring yet another wildly unlikely accident.

Kos is apparently planning legal action, and so far R2K hasn't responded in much detail other than to claim that its polls were conducted properly. I'm not all that interested in that part of the discussion, however. I just find it neat how the "faux randomness" may have exposed the problems with the data.
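Finally, for anyone curious about the Benford's Law check mentioned at the top, here's a minimal sketch of a first-digit test. The data list is made up purely for illustration; a real check needs a large set of naturally occurring figures.

```python
from collections import Counter
from math import log10

def benford_expected(d: int) -> float:
    """Benford's Law: expected share of numbers whose leading digit is d."""
    return log10(1 + 1 / d)

def first_digit(x: float) -> int:
    """Leading significant digit of x (x must be nonzero)."""
    return int(str(abs(x)).lstrip("0.")[0])

# Hypothetical data, for illustration only.
data = [3421, 1250, 189, 1024, 972, 18, 2210, 1430, 860, 117]
observed = Counter(first_digit(x) for x in data)

for d in range(1, 10):
    print(f"digit {d}: expected {benford_expected(d):6.1%}, "
          f"observed {observed[d] / len(data):6.1%}")
# Fabricated numbers often show leading digits spread far more evenly
# than the logarithmic curve Benford's Law predicts.
```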
Filed Under: data, faked, random
Companies: dailykos, research 2000