Two Ways of Knowing: Big Data and Evidence-Based Medicine
Two Ways of Knowing: Big Data and Evidence-Based Medicine. Ann Intern Med. 2016;164:562-563. [Epub 26 January 2016]. doi:10.7326/M15-2970
Bayes is Back
Consider the recent governmental promotion of hospital ratings based on various performance measures, both objective (such as in-hospital mortality) and subjective (such as patient ratings of the quality of physician communication). Under the classical frequentist paradigm, each hospital's own mean score is taken as the best estimate of its performance. However, it has been known since the 1950s that when many units are estimated at once, the individual means are not the best estimates: estimators that borrow information across all units achieve a lower total expected squared error (1, 2). The best estimate, paradoxically, depends not just on the individual but on the performance of all the other individuals being evaluated (1-3). What sense does this make? What do my communication skills have to do with yours? Simply put, collective performance establishes the benchmark (or base rate) of what can be expected. Any deviation from that expectation reflects a combination of pure chance and true differences in performance. Consequently, every individual score should be adjusted to discount the role of chance. Each score is thus “pulled” toward the overall mean, and the size of the “pull” grows with the deviation from the expected: wild outliers are “pulled in” more, while scores close to the overall mean move only a little. As a result, the variability (the spread) of the individual scores is reduced after adjustment, and the distribution shrinks. As with “regression to the mean,” spectacular scores may not represent truly spectacular performance, and horrific scores may not indicate truly terrible performance.
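One concrete form of this adjustment is the positive-part James-Stein estimator, which follows from the Stein result cited above. What follows is a minimal sketch of it, not a rating agency's actual method. It assumes that every hospital's observed score equals its true performance plus independent normal noise with a single common, known variance. The function name `james_stein` and the example scores are hypothetical.

```python
from statistics import fmean


def james_stein(scores, noise_var):
    """Positive-part James-Stein estimator, shrinking toward the grand mean.

    Illustrative assumption: each observed score is true performance plus
    independent normal noise with a common, known variance `noise_var`.
    Shrinking toward the grand mean requires at least 4 units.
    """
    k = len(scores)
    if k < 4:
        raise ValueError("need at least 4 units to shrink toward the grand mean")
    grand_mean = fmean(scores)
    # Total squared spread of the observed scores around the grand mean.
    spread = sum((y - grand_mean) ** 2 for y in scores)
    if spread == 0:
        return [grand_mean] * k
    # Fraction of each deviation that is kept; the remainder is attributed
    # to chance. The same fraction applies to every unit, so outliers move
    # farther in absolute terms. Floored at 0 (the "positive part").
    keep = max(0.0, 1.0 - (k - 3) * noise_var / spread)
    return [grand_mean + keep * (y - grand_mean) for y in scores]


# Five hypothetical hospital scores with noise variance 25.
print(james_stein([70, 74, 78, 82, 86], noise_var=25))
# → [72.5, 75.25, 78.0, 80.75, 83.5]
```

The extreme scores (70 and 86) each move 5.5 points toward the grand mean of 78. The nearer ones (74 and 82) move only 2.75, so the adjusted scores are less spread out than the raw ones.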
This “big data” method of shrinking estimates toward the overall mean has major practical implications in a healthcare system where rewards and punishments are tied directly to these metrics. Similar logic powers a multitude of “big data” applications, from gene chips (4) to models of disease epidemics (5). At the heart of this reasoning is the Bayesian view that estimates and their certainty are determined not only by the current data sample but also by prior expectations. Bayesian analysis is a very powerful tool, but its major limitation is that priors are commonly unknown and must be either chosen arbitrarily or guessed, which can yield badly inaccurate results. When the amount of data is large, however, this limitation can be overcome: the sample itself can be thought of as containing its own prior, which can now be estimated with great accuracy.
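The sketch below shows, under stated assumptions, how a sample can supply its own prior. It is one simple empirical Bayes estimator and not necessarily the method used in the cited applications. It assumes a normal-normal model. Each hospital's score is its true performance plus noise of known variance, which would be larger, for example, for hospitals with few patients. True performances are themselves normally distributed with an unknown mean and variance, and those two prior parameters are estimated from the scores by the method of moments. The function name `empirical_bayes_shrink` and the simulated numbers are hypothetical.

```python
import random
from statistics import fmean, pvariance


def empirical_bayes_shrink(scores, noise_vars):
    """Normal-normal empirical Bayes shrinkage (illustrative sketch).

    Assumed model: scores[i] ~ Normal(theta_i, noise_vars[i]) with known,
    positive noise_vars, and theta_i ~ Normal(mu, tau2) with mu and tau2
    unknown. The prior (mu, tau2) is estimated from the scores themselves.
    """
    mu_hat = fmean(scores)
    # Method of moments: observed variance is roughly prior variance plus
    # average noise variance, so subtract the noise (floored at 0).
    tau2_hat = max(0.0, pvariance(scores) - fmean(noise_vars))
    shrunk = []
    for y, s2 in zip(scores, noise_vars):
        # Weight on the unit's own score: smaller for noisier units,
        # so they are pulled harder toward the estimated prior mean.
        weight = tau2_hat / (tau2_hat + s2)
        shrunk.append(mu_hat + weight * (y - mu_hat))
    return mu_hat, tau2_hat, shrunk


# Simulated hospitals (hypothetical numbers). The estimated prior should
# typically approach the true one (mu = 75, tau2 = 16) as units are added.
rng = random.Random(0)
TRUE_MU, TRUE_TAU2 = 75.0, 16.0
for k in (10, 100, 10_000):
    noise = [rng.choice([4.0, 36.0]) for _ in range(k)]
    theta = [rng.gauss(TRUE_MU, TRUE_TAU2 ** 0.5) for _ in range(k)]
    y = [rng.gauss(t, s2 ** 0.5) for t, s2 in zip(theta, noise)]
    mu_hat, tau2_hat, _ = empirical_bayes_shrink(y, noise)
    print(f"k={k:>6}: mu_hat={mu_hat:.2f}, tau2_hat={tau2_hat:.2f}")
```

With only a handful of hospitals, the estimated prior can be far off. With thousands, it becomes tight, which is the sense in which large data removes the need to guess the prior.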
Bayes is back.
1. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:197-206.
2. Robbins H. An empirical Bayes approach to statistics. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:157-163.
3. Efron B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press; 2010.
4. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23(1):70-86.
5. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Comput Biol. 2015;11(8):e1004382.