Speaker: Xiao-Li Meng, PhD, Dean, Graduate School of Arts and Sciences, and the Whipple V.N. Jones Professor of Statistics, Harvard University
Description: This talk provides a statistical perspective on the roles the seven S’s (sins?) play in increasing the amount of irreproducible research in the medical and life sciences and beyond:
- Selections in hypotheses (e.g., subgroup analysis);
- Selections in data (e.g., deleting “outliers” or only using “complete cases”);
- Selections in methodologies (e.g., for goodness of fit);
- Selections in due diligence and debugging (e.g., triple checking only when the outcome seems undesirable);
- Selections in publication (e.g., only when p-value <0.05);
- Selections in reporting/summary (e.g., suppressing caveats);
- Selections in understanding and interpretation (e.g., our preference for deterministic, “common sense” interpretation).
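To make the "selection in publication" item concrete, here is a minimal simulation sketch. It is not taken from the talk. It assumes a toy model in which every study estimates a small true effect (0.1) from 20 draws of a unit-variance normal, and a study is "published" only when a one-sided z-test gives p < 0.05. The function name `simulate_publication_filter` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_publication_filter(true_effect=0.1, n=20, n_studies=20_000,
                                alpha=0.05, seed=1):
    """Toy model of publication selection (illustrative assumptions only).

    Each study draws n observations from N(true_effect, 1) and reports the
    sample mean. A study is 'published' only if a one-sided z-test of
    H0: effect = 0 yields p < alpha.

    Returns (mean estimate over all studies,
             mean estimate over published studies,
             fraction of studies published).
    """
    rng = random.Random(seed)        # fixed seed for reproducibility
    std_normal = NormalDist()
    se = 1 / math.sqrt(n)            # known unit variance, so SE of the mean is 1/sqrt(n)
    all_estimates, published = [], []
    for _ in range(n_studies):
        est = fmean(rng.gauss(true_effect, 1) for _ in range(n))
        p_value = 1 - std_normal.cdf(est / se)  # one-sided p-value
        all_estimates.append(est)
        if p_value < alpha:                     # the selection step
            published.append(est)
    return fmean(all_estimates), fmean(published), len(published) / n_studies


all_mean, pub_mean, frac = simulate_publication_filter()
print(f"all studies: {all_mean:.3f}  published only: {pub_mean:.3f}  "
      f"share published: {frac:.1%}")
```

Averaged over all studies, the estimate sits close to the true effect of 0.1. Averaged over the "published" studies only, it lands well above 0.1, because the p-value filter keeps mostly the studies that happened to overestimate.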
The Big Data Paradox and Simpson’s Paradox will be used to demonstrate that the problem of irreproducible research is getting BIGGER with Big Data. A cocktail treatment approach together with a selfish/blowfish test is suggested to combat this problem.
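Simpson's Paradox can be illustrated with a small table of hypothetical counts. These are not data from the talk. The sketch assumes two arms ("treated", "control") and two severity strata. The treated arm has the higher success rate within each stratum (0.90 vs 0.80 for mild cases, 0.30 vs 0.20 for severe ones). When the strata are pooled, it has the lower rate (about 0.35 vs 0.75), because most treated patients are severe cases. Choosing which level of aggregation to report is itself one of the selections listed above. The names `COUNTS`, `stratified_rates`, and `pooled_rates` are illustrative only.

```python
from fractions import Fraction

# Hypothetical counts: (successes, patients) by severity stratum and arm.
COUNTS = {
    "mild":   {"treated": (18, 20),  "control": (160, 200)},
    "severe": {"treated": (60, 200), "control": (4, 20)},
}


def stratified_rates(counts):
    """Exact success rate for each arm within each stratum."""
    return {stratum: {arm: Fraction(s, t) for arm, (s, t) in arms.items()}
            for stratum, arms in counts.items()}


def pooled_rates(counts):
    """Success rate for each arm after summing counts over all strata."""
    totals = {}
    for arms in counts.values():
        for arm, (s, t) in arms.items():
            s0, t0 = totals.get(arm, (0, 0))
            totals[arm] = (s0 + s, t0 + t)
    return {arm: Fraction(s, t) for arm, (s, t) in totals.items()}


for stratum, rates in stratified_rates(COUNTS).items():
    print(stratum, {arm: round(float(r), 3) for arm, r in rates.items()})
print("pooled", {arm: round(float(r), 3) for arm, r in pooled_rates(COUNTS).items()})
```

Exact `Fraction` arithmetic keeps the comparison free of rounding. Conversion to `float` happens only for display.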