Talk:Data dredging


This is the talk page for discussing improvements to the Data dredging article.
This is not a forum for general discussion of the article's subject.


Psychology

This article is within the scope of WikiProject Psychology, a collaborative effort to improve the coverage of Psychology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.

Statistics Mid‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the importance scale.

Merge

I support the merge of this page with the page on Data Dredging. These are essentially the same concept by two different names. They should be on the same page. Maybe a disambiguation entry can be posted to differentiate these concepts. AjeetKhurana (talk) —Preceding undated comment added 13:58, 20 May 2009 (UTC).[reply]

Support merge - Data dredging is probably the best title (comment by John Quiggin, forgot to sign).

Do not merge - Bias through incorrect data-snooping is essentially different from the problem created by testing a hypothesis with the same data set. For example, data-snooping bias may occur when dealing with a highly fluctuating set of data where every removal of a datapoint results in a new extreme, and so on. (Pc100935 11:59, 18 December 2006 (UTC))[reply]

Splitting a data set into parts A and B and then using part B to test a hypothesis formulated using part A is not recommended, since the two parts can be highly correlated. Best practice is to formulate a hypothesis before looking at the data and then use the data to test the hypothesis. If a hypothesis is based on existing data, it should only be tested by collecting new, independent data. (Pc100935 11:59, 18 December 2006 (UTC))[reply]

Support. Why hasn't the merge been made already? SweetNightmares (talk) 04:45, 21 January 2010 (UTC)[reply]

Support merge - while slightly different concepts are introduced by the two articles, there is no reason why that can't be written into one more coherent article that covers both. 94.195.129.125 (talk) 20:58, 5 April 2010 (UTC)[reply]

Merge done. 16:13, 30 November 2010 (UTC)

Global tag justified?

I don't think the global tag is justified: it's applied to an example of illegitimate hypothesis formation, but the example doesn't need to be universal! Richard Pinch (talk) 18:57, 11 June 2008 (UTC)[reply]

Well, it could certainly be rephrased in a more international manner, but I think the true problem is that the sentence is too long and not very clear, and doesn't bring a clear conclusion (why would it be wrong?) Calimo (talk) 15:11, 13 January 2009 (UTC)[reply]

Circumventing the scientific approach?

"Circumventing the traditional scientific approach of conducting an experiment without a hypothesis can lead to premature conclusions."

I believe the traditional scientific approach is to form a hypothesis before conducting an experiment, so the sentence should be rewritten to say, "Circumventing the traditional scientific approach by conducting an experiment without a hypothesis can lead to premature conclusions."

Unless someone knows better and objects, I will make the change. —Blanchette (talk) 06:57, 9 May 2011 (UTC)[reply]

Done. —Blanchette (talk) 21:19, 20 May 2011 (UTC)[reply]

Topics for new articles?

p-hacking, data peeking, and the replication crisis are related topics that probably deserve articles of their own. -- The Anome (talk) 10:26, 20 May 2014 (UTC)[reply]

Apparently p-hacking was written again in 2014, and in fact I would be in favour of that. Data-dredging is a more sophisticated approach, whereas P-hacking is much easier and mindlessly done. Viguarda (talk) 11:30, 20 January 2015 (UTC)[reply]

My first thoughts on reading this are that it doesn't seem to mention, or possibly distinguish, between the intentional and accidental cases. One could search through a data set for any statistically significant event, or for specific conclusions. Note that the latter is different from cherry picking, maybe cherry tree picking would be a better analogy. One selects the tree with the best fruit, picks all the fruit, (so as not to be accused of cherry picking), and then presents the results. Gah4 (talk) 00:03, 15 October 2016 (UTC)[reply]

Here's a current example of how p-hacking is used in common discourse:

The Inside Story Of How An Ivy League Food Scientist Turned Shoddy Data Into Viral Studies — 25 February 2018; syndicated at aldaily.com

One reason for the discrepancy is "p-hacking," the taboo practice of slicing and dicing a dataset for an impressive-looking pattern. It can take various forms, from tweaking variables to show a desired result, to pretending that a finding proves an original hypothesis — in other words, uncovering an answer to a question that was only asked after the fact.

I'm not thrilled with p-hacking redirecting to data dredging, a term I have never yet seen used in a mainstream, general-audience publication. p-hacking is a form of data dredging, with the specific end result of gaming an ethical bright line to gain a prominent office or pedestal of trust, which is ultimately more destructive to the scientific venture than mere academic dishonesty. — MaxEnt 08:43, 3 March 2018 (UTC)[reply]

Vague

This article has been watered down since I last read it, apparently in an effort to cast the topic in a more "neutral" light. Just the opening paragraph, for instance, now says: "Data dredging ... is the use of data mining to uncover relationships in data." That does not seem sufficient to me at all. All data mining is used for uncovering relationships in data; the paragraph is almost tautological. The opening paragraph should instead succinctly define data dredging as it differs from other ways of using data. If I can find reasonable sources, I may go ahead and rewrite some of it.--Anders Feder (talk) 10:03, 4 September 2014 (UTC)[reply]

Spurious Correlations

I was thinking of using an image from this site as a headline image since it explains the idea really well (it's Creative Commons Attribution-licensed). Any thoughts on this, or suggestions for a particularly ridiculous one? Blythwood (talk) 22:48, 3 June 2015 (UTC)[reply]

Second Vague, also second no merge

The article appears too vague to me as well, and it combines multiple problems into one; these should be separated. More thorough mathematical derivations would be helpful in my opinion. Some core statements are known to be wrong, although they are still frequently repeated as urban legends in the social sciences (e.g., testing multiple stochastically independent hypotheses on the same data set is no problem; that distinction is not made in the page so far). I suggest a rewrite. — Preceding unsigned comment added by Timo von Oertzen (talk • contribs) 15:05, 30 January 2017 (UTC)[reply]

Drawing Conclusions from data section

I find the drawing conclusions from data section to be ridiculous and unsourced.

If it had been sourced we could maybe get to the bottom of the problems.

For example, read Deming's "Red Bead experiment" from his The New Economics and then read this section. A p-chart will tell you immediately that there is nothing there: no conclusions whatsoever can be drawn from a sample of 5 coin flips.

2601:14F:8005:B810:C823:C430:769F:E823 (talk) 16:43, 21 October 2017 (UTC)[reply]
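
The point about five coin flips can be checked directly. The following is only an illustrative sketch, assuming a fair coin under the null hypothesis and a two-sided exact test; the function names are made up for this example and come neither from the article nor from Deming.

```python
def smallest_two_sided_p(n_flips: int) -> float:
    """Smallest two-sided p-value reachable with n_flips of a fair coin.

    The most extreme outcomes are all heads or all tails, each with
    probability 0.5 ** n_flips under the null, so the two-sided p-value
    of either one is twice that (capped at 1).
    """
    return min(1.0, 2 * 0.5 ** n_flips)


def min_flips_for_significance(alpha: float = 0.05) -> int:
    """Fewest flips at which the most extreme outcome has p < alpha."""
    n = 1
    while smallest_two_sided_p(n) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_two_sided_p(5))           # 0.0625, which is above 0.05
    print(min_flips_for_significance(0.05))  # 6
```

Under these assumptions, even five heads in a row cannot reach the conventional 0.05 threshold, so no outcome of a five-flip sample is "significant" on its own.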

Easing the verbiage of the Introduction

I think that the intro defines the most egregious case of data dredging, and not data dredging in general.

"The process of data dredging involves automatically testing huge numbers of hypotheses about a single data set by exhaustively searching"

This is not necessarily true. If I test 4 hypotheses about a single data set, only one turns up significant and I report only that one hypothesis, then I have committed p-hacking/data-dredging. This is an important distinction because this involved neither huge numbers of hypotheses nor an exhaustive search.

I'm going to let this sit for a day, and if nobody has objections, I will implement the changes. Ihearthonduras (talk) 18:45, 24 January 2018 (UTC)[reply]

I agree with Ihearthonduras - it's more than just automated testing. This happens just as much by hand. Lionfish0 (talk) 08:44, 4 February 2019 (UTC)[reply]

I tried changing this but it was reverted. The reason given was 'according to whom' but the whole section is unreferenced, so it's better it's correct and unreferenced than wrong and unreferenced. Here's a reference that might do: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 — Preceding unsigned comment added by Lionfish0 (talk • contribs) 09:24, 5 February 2019 (UTC)[reply]
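
The four-hypothesis case discussed above can be put in numbers. This is a rough sketch, assuming the tests are independent and every null hypothesis is true (assumptions added here, not stated in the comments); `familywise_error_rate` and `simulate_selective_reporting` are hypothetical names chosen for illustration.

```python
import random


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n_tests independent true nulls."""
    return 1 - (1 - alpha) ** n_tests


def simulate_selective_reporting(n_tests: int = 4, alpha: float = 0.05,
                                 n_trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated studies with at least one 'significant' test.

    Assumes each p-value is uniform on [0, 1), which holds for a
    continuous test statistic when the null hypothesis is true.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        p_values = [rng.random() for _ in range(n_tests)]
        # The analyst reports only the smallest p-value.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(familywise_error_rate(4))        # analytic value for 4 tests
    print(simulate_selective_reporting())  # simulated estimate
```

With four tests at a nominal 0.05 level, the analytic chance of at least one false positive is about 18.5%, so reporting only the one test that came out significant understates the false-positive risk without any huge or exhaustive search.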

p-hacking

(Edited) The following paper had a considerable impact in experimental psychology and might be worth mentioning in the article. One of its co-authors supposedly coined the term "p-hacking" though the term itself doesn't appear in the paper. I'll try to add it sometime if nobody else does, but if someone else does it first, that's great. Citation:

Simmons, Joseph P. and Nelson, Leif D. and Simonsohn, Uri, False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant (May 23, 2011). Psychological Science, 2011. Available at SSRN: https://ssrn.com/abstract=1850704

The authors wrote an invited retrospective 5 years later:

Simmons, Joseph P. and Nelson, Leif D. and Simonsohn, Uri, False-Positive Citations (March 27, 2017). Perspectives on Psychological Science, Forthcoming. Available at SSRN: https://ssrn.com/abstract=2916240

173.228.123.121 (talk) 02:38, 5 March 2018 (UTC)[reply]

1 in 4 statisticians say they were asked to commit scientific fraud

[1] Not sure where else to park this. In most of the cases the fraud was lighter stuff like underreporting non-significant results but sometimes they were asked to actually falsify data. 173.228.123.166 (talk) 07:07, 3 November 2018 (UTC)[reply]

thus dramatically increasing and understating the risk of false positives

Regarding the sentence with much editing and discussion (in edit summaries). It seems to me that the important point is getting results with an indication of statistical significance, while intentionally (usually) disregarding the truth. What might be called in a legal sense, wanton disregard for the truth. If you do it long enough, you might (un)luckily find something actually true. Consider a politician at a campaign rally energizing his base with unsubstantiated claims, and without any interest in the truth. Very rarely, he might accidentally say something true, but that is just random luck. Gah4 (talk) 01:49, 13 November 2019 (UTC)[reply]

Controlling for gender: delete subsection "In sociology"?

The following paragraph is based on so many misunderstandings of Simonsohn et al. that it's painful to read, yet I will try to summarize the mistakes. Before analyzing the paragraph, let me state that I cannot correct it, because it is not even wrong. Since this is the only content of subsection "In sociology", I suggest deleting the entire subsection.

Another way to flatten a p-curve is to control for gender. An analysis by Simonsohn et. al. of a study by Bruns and Ioannidis (2016) demonstrates this, as when Bruns and Ioannidis dropped the gender control, this also dropped the reported t-value from 9.29 to 0.88, showing a non-causal effect where a causal one was previously recorded (3).[13]

First, "when Bruns and Ioannidis dropped the gender control": this is plain false. The association under consideration is an example concocted to counter an argument proposed by B-and-I. So B-and-I never dropped the gender control, because the whole thing is not present in their paper.

Second, "dropped the reported t-value from 9.29 to 0.88, showing a non-causal effect where a causal one was previously recorded". This is just ignorant gibberish. The whole point of Simonsohn et al. is that statistical tests (as opposed to experiment design) cannot distinguish causation (causal vs non-causal), only correlation. To quote from the paper: "No statistical tool could possibly differentiate correlational from causal relationships. Criticizing p-curve for failing to differentiate causation from correlation is like criticizing a professor for being mortal." So causality of the effect has nothing to do with the t-value being 9.29 or 0.88. This is a profound misunderstanding on the part of whoever wrote that paragraph.

Third, "Another way to flatten a p-curve is to control for gender. An analysis by Simonsohn et. al. ... demonstrates this." No, Simonsohn et al. needed an example of confounded correlation, and they decided to deliberately use one in which the confounding variable is gender. To say that "Another way to flatten a p-curve is to control for gender" is akin to saying that another way to flatten a person is to run them over with a yellow Toyota. Gender, in that example, was just a confounding variable chosen to make the example easy to understand. We are talking here about an artificial example: one of the correlated variables is the number of female sexual partners.

Fourth. The entire paragraph cites an example found in a paper by Simonsohn et al. to argue a point that is unrelated to the point made by the paper itself. The point made by Simonsohn et al. is quite sensible, and published research. Yet, using the same example to argue something else — to argue gibberish one should say in this case — is a fallacy at best. — Preceding unsigned comment added by 93.57.248.65 (talk) 11:07, 6 November 2024 (UTC)[reply]

I will have to look at this more closely to be certain (and will in the next few days), but I tend to agree, and it also looks like the last part of the paragraph ("This is an important finding because t-values are inversely proportional to p-values, meaning higher t-values (above 2.8) indicate lower p-values. By controlling for gender, one can artificially inflate the t-value, thus artificially deflating the p-value as well.") is WP:OR. If we remove that, it becomes clear that what is left is not an example of data-dredging/p-hacking, since the connection to p-hacking only exists because of the last sentence. (Note that the cited paper technically does include an illustration/example of p-hacking using social science data, but the whole thing about the p-curve has nothing to do with that and wouldn't need to be mentioned. So we would basically only keep the citation and replace everything else, but this might be preferable to deleting the section.) Felida⁹⁷ (talk) 12:01, 6 November 2024 (UTC)[reply]

I've removed the subsection. As I said, the cited paper [2] may be used as a citation for a completely rewritten subsection, and I may write that if I find the time, but the issues are bad enough that we can't leave this up until I (or someone else) get around to it. Felida⁹⁷ (talk) 22:31, 11 November 2024 (UTC)[reply]

Side note: The paragraph was added in this revision (January 1, 2023) by Eman0nymous. Felida⁹⁷ (talk) 12:04, 6 November 2024 (UTC)[reply]
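
On the removed claim that t-values are "inversely proportional" to p-values: a larger |t| does give a smaller p, but the relationship is not inverse proportionality. Here is a small sketch, assuming a two-sided test and a large-sample normal approximation to the t distribution (the degrees of freedom are not given in this thread, so that is an assumption); `two_sided_p_from_t` is a hypothetical helper, not something from the cited paper.

```python
from math import erfc, sqrt


def two_sided_p_from_t(t: float) -> float:
    """Two-sided p-value for a statistic t under a standard normal approximation.

    For Z ~ N(0, 1), P(|Z| > |t|) equals erfc(|t| / sqrt(2)).
    """
    return erfc(abs(t) / sqrt(2))


if __name__ == "__main__":
    for t in (0.88, 2.8, 9.29):
        print(f"t = {t:5.2f}  ->  p = {two_sided_p_from_t(t):.3g}")

    # If p were inversely proportional to t, the product t * p would be
    # constant. It is not: it shrinks rapidly as t grows.
    print(1.0 * two_sided_p_from_t(1.0), 2.0 * two_sided_p_from_t(2.0))
```

Under this approximation, t = 0.88 corresponds to p of roughly 0.38 while t = 9.29 corresponds to a vanishingly small p, so the two values mark a large difference in statistical significance but say nothing about causation, which is the point made earlier in this thread.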