Ideas and Opinions
26 January 2016

Two Ways of Knowing: Big Data and Evidence-Based Medicine

Publication: Annals of Internal Medicine
Volume 164, Number 8
Evidence-based medicine (EBM) is more than 20 years old (1). Although EBM's painstaking path of careful clinical studies, critical appraisal of published evidence, and methodologically rigorous systematic reviews has been the template for knowing what works in medicine, new “big data” approaches seem to offer a powerful and tempting alternative. Big data are a distinct “cultural, technological, and scholarly phenomenon” (2) centered on the application of machine learning algorithms to diverse, large-scale data. As clinics and hospitals generate huge amounts of electronic health record (EHR) data and systems like IBM's Watson system combine genomic data, published literature, and …


References

1. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268:2420-5. [PMID: 1404801]
2. Boyd D, Crawford K. Critical questions for big data. Information, Communication & Society. 2012;15:662-79.
3. IBM Watson for Oncology. Accessed at www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html on 9 December 2015.
4. Savov V. Google signs deal to put sensors directly on your eye. The Verge. 15 July 2014. Accessed at www.theverge.com/2014/7/15/5900871/google-and-novartis-smart-contact-lens-partnership on 9 December 2015.
5. Press G. 6 Predictions for the $125 Billion Big Data Analytics Market in 2015. Forbes. 11 December 2014. Accessed at www.forbes.com/sites/gilpress/2014/12/11/6-predictions-for-the-125-billion-big-data-analytics-market-in-2015 on 9 December 2015.
6. Centre for Evidence-Based Medicine. Study Designs. Accessed at www.cebm.net/study-designs on 8 January 2016.
7. Zhao FH, Tiggelaar SM, Hu SY, Zhao N, Hong Y, Niyazi M, et al. A multi-center survey of HPV knowledge and attitudes toward HPV vaccination among women, government officials, and medical personnel in China. Asian Pac J Cancer Prev. 2012;13:2369-78. [PMID: 22901224]
8. Corley CD, Mihalcea R, Mikler AR, Sanfilippo AP. Chapter 18: Predicting individual affect of health interventions to reduce HPV prevalence. In: Arabnia HR, Tran QN, eds. Software Tools and Algorithms for Biological Systems. New York: Springer Science+Business Media; 2011:181.
9. De Choudhury M, Gamon M, Counts S, Horvitz E. Predicting Depression via Social Media. Proceedings of the Seventh International Association for the Advancement of Artificial Intelligence Conference on Weblogs and Social Media, Boston, MA, 8–10 July 2013. Palo Alto, CA: Association for the Advancement of Artificial Intelligence Pr; 2013.

Comments

Gregory Mints, M.D., F.A.C.P.; Deanna P. Jannat-Khah, DrPH, MSPH; Arthur Thomas Evans, M.D., M.P.H. 11 May 2016
Bayes is Back
We agree that "big data" not only has caused substantial disruption to statistical science but also offers promise for improving EBM and the practice of medicine. The main impact, in our opinion, lies in its challenge to conventional frequentist statistics (exemplified by p-values) and its reinvigoration of the Bayesian approach.

Consider the recent governmental promotion of hospital ratings on various objective (such as in-hospital mortality) and subjective (such as patient ratings of the quality of physician communication) performance measures. Under the classical frequentist paradigm, the individual hospital’s mean score is the best estimate of its performance. However, it has been known since the 1950s that individual mean scores are invalid as estimates of performance (1, 2). The best metric, paradoxically, depends not just on the individual, but on the performance of all other individuals being evaluated (1-3). What sense does this make? What do my communication skills have to do with yours? Simply, collective performance establishes the benchmark (or base rate) of what can be expected. Any deviation from the expected is due to the combination of pure chance and true difference in performance. Consequently, all individual scores should be adjusted to reflect the role of chance. Each score is thus “pulled” towards the overall mean, with the magnitude of the “pull” directly related to the deviation from the expected: wild outliers will be “pulled in” more; those closer to the overall mean, only a little. As a result, the variability—the spread—of the individual scores after adjustment is reduced and the distribution will be shrunken. Similar to the concept of “regression to the mean,” spectacular scores may not really represent spectacular performance, and horrific scores may not indicate truly terrible performance.
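The "pull" toward the overall mean described above can be illustrated with a small sketch. It is not the commenters' own method: it assumes a positive-part James-Stein style estimator that shrinks toward the grand mean, and it assumes every hospital's score has the same, known sampling variance. The function name `shrink_toward_grand_mean`, the parameter `sampling_var`, and the example scores are all hypothetical.

```python
import numpy as np


def shrink_toward_grand_mean(scores, sampling_var):
    """Shrink raw unit scores toward their grand mean.

    Assumption: each score is an unbiased estimate with the same known
    sampling variance `sampling_var`. Returns (adjusted_scores, shrink),
    where `shrink` is the fraction of each deviation that is removed.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.size
    if k < 4:
        raise ValueError("shrinking toward an estimated mean needs at least 4 units")

    grand_mean = scores.mean()
    deviations = scores - grand_mean
    total_ss = float(np.sum(deviations ** 2))
    if total_ss == 0.0:
        # All scores are identical, so there is nothing to shrink.
        return scores.copy(), 1.0

    # Share of the observed spread attributed to chance, capped at 1
    # (the "positive-part" rule, so scores never cross the grand mean).
    shrink = min(1.0, (k - 3) * sampling_var / total_ss)
    adjusted = grand_mean + (1.0 - shrink) * deviations
    return adjusted, shrink


# Hypothetical communication scores for seven hospitals (made-up numbers).
hospital_scores = [62.0, 70.0, 71.0, 74.0, 75.0, 78.0, 91.0]
adjusted, shrink = shrink_toward_grand_mean(hospital_scores, sampling_var=25.0)
```

Because every deviation is multiplied by the same factor, the hospital furthest from the grand mean moves the most in absolute terms, the ranking is preserved, and the spread of the adjusted scores is narrower than that of the raw scores.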

This “big data” method of shrinking predictions toward the overall mean has major practical implications in a healthcare system where rewards and penalties are tied directly to the metrics above. Similar logic powers a multitude of “big data” applications, from gene chips (4) to models of disease epidemics (5). At the heart of this reasoning is the Bayesian view that estimates and their certainty are determined not only by the current data sample but also by prior expectations. Bayesian analysis is a very powerful tool, but its major limitation is that priors are commonly unknown and must be chosen arbitrarily or guessed, often yielding wildly inaccurate results. When the amount of data is large, however, this limitation can be overcome. The sample itself can be thought of as containing its own priors, which can now be measured with great accuracy.
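One way to make "the sample contains its own priors" concrete is the textbook normal-normal model. This is a sketch under assumptions that are not stated in the comment: equal, known sampling variance $\sigma^2$, and the symbols $y_i$, $\theta_i$, $\mu$, $\tau^2$, $B$, and $S$ are introduced here purely for illustration.

```latex
\begin{align*}
  y_i \mid \theta_i &\sim N(\theta_i, \sigma^2), \qquad
  \theta_i \sim N(\mu, \tau^2), \qquad i = 1, \dots, k, \\
  E[\theta_i \mid y_i] &= \mu + (1 - B)\,(y_i - \mu), \qquad
  B = \frac{\sigma^2}{\sigma^2 + \tau^2}, \\
  y_i &\sim N(\mu, \sigma^2 + \tau^2) \quad \text{(marginally, independent across } i\text{)}, \\
  \hat{\mu} &= \bar{y}, \qquad
  S = \sum_{i=1}^{k} (y_i - \bar{y})^2, \qquad
  \hat{B} = \frac{(k - 3)\,\sigma^2}{S}, \\
  E[\hat{B}] &= B \quad \text{for } k > 3, \text{ because }
  \frac{S}{\sigma^2 + \tau^2} \sim \chi^2_{k-1}
  \text{ and } E\!\left[\frac{1}{\chi^2_{k-1}}\right] = \frac{1}{k - 3}.
\end{align*}
```

Under these assumptions the prior mean $\mu$ and the shrinkage factor $B$ are never guessed. Both are estimated from the $k$ observed scores, and the estimates become more stable as $k$ grows.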

Bayes is back.

1. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:197-206.
2. Robbins H. An empirical Bayes approach to statistics. In: Neyman J, ed. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1956:157-63.
3. Efron B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge: Cambridge University Press; 2010.
4. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology. 2002;23(1):70-86.
5. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Comput Biol. 2015;11(8):e1004382.

Information & Authors

Information

Published In

Annals of Internal Medicine
Volume 164, Number 8, 19 April 2016
Pages: 562 - 563

History

Published online: 26 January 2016
Published in issue: 19 April 2016

Authors

Affiliations

Ida Sim, MD, PhD
From University of California, San Francisco, San Francisco, California.
Presented in part at the 3rd Annual Cochrane Lecture, Vienna, Austria, 4 October 2015 (available at www.youtube.com/watch?v=RgOgcs95fRk).
Corresponding Author: Ida Sim, MD, PhD, Division of General Internal Medicine, University of California, San Francisco, 1545 Divisadero Street, Suite 308, San Francisco, CA 94143-0320; e-mail, [email protected].
Author Contributions: Conception and design: I. Sim.
Drafting of the article: I. Sim.
Critical revision of the article for important intellectual content: I. Sim.
Final approval of the article: I. Sim.
Administrative, technical, or logistic support: I. Sim.
This article was published at www.annals.org on 26 January 2016.

Ida Sim. Two Ways of Knowing: Big Data and Evidence-Based Medicine. Ann Intern Med. 2016;164:562-563. [Epub 26 January 2016]. doi:10.7326/M15-2970
