DailyDirt: Correlation Is Not Causation
from the urls-we-dig-up dept
Big data is a term that's been getting some buzz as the next thing that's going to change everyone's lives (for better or worse, depending on how you look at it). Having a lot of data doesn't necessarily mean you also have a lot of useful knowledge. Garbage in, garbage out, so they say. And making correlations is easy compared to finding a direct causal relationship (a quick simulation after the list below shows just how easy). However, that hasn't stopped (so-called) journalists from writing misleading headlines. If you hate correlations being mistaken for causation, submit examples you've seen in the comments below. Here are just a few to start off.
- Likes for curly fries on Facebook might correlate with high IQ scores, but don't click that like button just yet. Maybe there are more social experiments being performed on Facebook users than can be accurately counted. [url]
- Former high school athletes seem to get higher paying jobs (at least for the self-reporting men in this study). A lot of skills correlate with various forms of success. Perhaps enjoying the things you do (learned skill or not) is a reward unto itself. [url]
- The size of various brain features can correlate with all kinds of activities, and people have been trying to measure brains for a long time... because there are instruments that can measure the size of various brain parts. The interpretation of these measurements can lead to a lot of faulty conclusions. However, you won't often see the headline: "Watching moderate amounts of porn won’t hurt your brain." [url]
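To make the "correlations are easy" point concrete, here is a minimal simulation sketch (not from the original post; numpy, the 0.5 threshold, and all variable names are my own choices) counting how often two series with no causal link at all still correlate strongly:

import numpy as np

# Two independent random walks per trial: by construction, neither
# series causes the other, yet trending data correlates easily.
rng = np.random.default_rng(42)
n, trials = 100, 1000
strong = 0
for _ in range(trials):
    a = np.cumsum(rng.normal(size=n))
    b = np.cumsum(rng.normal(size=n))
    if abs(np.corrcoef(a, b)[0, 1]) > 0.5:
        strong += 1
print(f"{strong / trials:.0%} of unrelated random-walk pairs have |r| > 0.5")

Run it with different seeds: a sizeable fraction of these causally unrelated pairs clear the bar, which is exactly why a headline-grade correlation on its own proves very little.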
Filed Under: big data, brain, causation, correlation, gigo, iq, journalism, pet peeves, success
Companies: facebook
Reader Comments
http://www.venganza.org/about/open-letter/
I got one!
Re:
The more subjects a study needs to prove a point, the less you should trust the results. Psychiatric drug field studies routinely twiddle the numbers to get the results they want: samples from beneficial (lucky?) studies get combined with cohorts that don't exhibit any positive response, so that on average, patients across the two cohorts show a positive response!
Citation: http://www.youtube.com/watch?v=A3YB59EKMKw I *think*. I'm at work now so can't confirm, but I'm pretty sure that's the one.
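A toy calculation (my own illustration; the numbers are invented and not taken from the linked video) of the pooling trick described above: average a small "lucky" cohort together with a larger cohort of non-responders, and the combined mean still comes out positive.

# Hypothetical cohorts, sized only to illustrate the arithmetic.
lucky = [3.0] * 50            # 50 patients with a strong measured improvement
nonresponders = [0.0] * 150   # 150 patients with no improvement at all
pooled = lucky + nonresponders
print(sum(pooled) / len(pooled))  # 0.75: "patients improve on average!"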
Re: Re:
"Needs to prove a point", implies this is not science, but rather a marketing ploy.
In a well constructed experiment or "study", as the sample size increases so does the precision.
Re: Re: Re:
Well, yes.
Well, no. In a well-constructed experiment, precision will remain constant regardless of the sample size, but the resolution of the findings may be different.
e.g. assuming correct randomisation and good controls across all sample sizes, a study of 20 subjects with no negative outcomes means you can confidently state a rate of "less than 1 in 20". Take it up to 20,000 subjects and you might find 100 negative outcomes relative to control, meaning you can refine your rate from "less than 0.05" to "less than 0.01" (or lower? My statistics-fu is weak).
Of course, neither study proves that the rate across the entire population isn't really 0.5, but that's what randomisation is supposed to (try to) address. Alternatively, even if the rate across the study population is accurate, it can be difficult to determine if a particular person might fit into that population or not.
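One way to put numbers on that refinement (my own sketch; the thread doesn't name a method, and the function name is invented) is an exact one-sided Clopper-Pearson upper bound on the event rate, which can be computed from the beta distribution in scipy:

from scipy.stats import beta

def upper_bound(events, n, alpha=0.05):
    # One-sided exact (Clopper-Pearson) upper confidence bound on a
    # binomial rate: any true rate above this bound would produce
    # `events` or fewer outcomes in `n` subjects with probability < alpha.
    if events >= n:
        return 1.0
    return beta.ppf(1 - alpha, events + 1, n - events)

print(upper_bound(0, 20))       # ~0.14: zero events in 20 still allows ~1 in 7
print(upper_bound(100, 20000))  # ~0.0059: 100 events in 20,000 pins it near 0.5%

Under this reading, the 20-subject study actually supports only "less than about 1 in 7" at 95% confidence rather than "less than 1 in 20", which if anything strengthens the point about small samples having coarse resolution.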
Re: Re: Re: Re:
- This is incorrect. You assume the sample size exceeds the number of possible unique results. When that is not the case, increased resolution only provides more detail about an incomplete data set.
"assuming correct randomisation"
- This is an attempt at simplifying the problem, as clearly there is no such thing as "correct randomization".
So What If Correlation Is Not Causation?
Conclusion: surveillance works, so we should do more.
2) An increase of global surveillance since 9/11 by the NSA correlates with an increase in global terrorist activity.
Conclusion: surveillance would work if we could do more.
Of course, #2's predicate might actually involve legitimate causation...