Abstract
Machine learning is used increasingly in clinical care to improve diagnosis, treatment selection, and health system efficiency. Because machine-learning models learn from historically collected data, populations that have experienced human and structural biases in the past—called protected groups—are vulnerable to harm by incorrect predictions or withholding of resources. This article describes how model design, biases in data, and the interactions of model predictions with clinicians and patients may exacerbate health care disparities. Rather than simply guarding against these harms passively, machine-learning systems should be used proactively to advance health equity. For that goal to be achieved, principles of distributive justice must be incorporated into model design, deployment, and evaluation. The article describes several technical implementations of distributive justice—specifically those that ensure equality in patient outcomes, performance, and resource allocation—and guides clinicians as to when they should prioritize each principle. Machine learning is providing increasingly sophisticated decision support and population-level monitoring, and it should encode principles of justice to ensure that models benefit all patients.
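As a rough illustration of two of the principles named above, the following sketch compares a binary classifier across patient groups. It rests on assumptions that are not stated in the abstract: "equal performance" is read here as equal sensitivity across groups, and "equal allocation" as an equal share of each group being flagged for a resource. The function names (`group_rates`, `max_gap`) and the toy data are hypothetical. Equal patient outcomes cannot be checked from predictions alone and are not covered.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group sensitivity and allocation rate for binary labels.

    y_true, y_pred: sequences of 0/1 values (true outcome, model flag).
    groups: sequence of group identifiers, one per patient.
    """
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "flagged": 0, "n": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["pos"] += truth          # patients who truly have the outcome
        c["flagged"] += pred       # patients the model flags
        c["tp"] += truth * pred    # flagged patients who truly have it

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # Sensitivity is undefined when a group has no true positives.
            "sensitivity": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "allocation_rate": c["flagged"] / c["n"],
        }
    return rates


def max_gap(rates, key):
    """Largest between-group difference for one metric."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Toy data (illustrative only): two groups of four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
print(rates)
print("sensitivity gap:", max_gap(rates, "sensitivity"))
print("allocation gap:", max_gap(rates, "allocation_rate"))
```

A gap near zero on one metric does not imply a gap near zero on the other, which is one reason the choice of principle has to be made for each clinical use case.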
Author, Article, and Disclosure Information
Alvin Rajkomar,
Google, Mountain View, California, and University of California, San Francisco, San Francisco, California (A.R.)
Google, Mountain View, California (M.H., M.D.H., G.C.)
University of Chicago, Chicago, Illinois (M.H.C.)
Acknowledgment: The authors thank Meredith Whittaker, Roxanne Pinto, Moritz Hardt, Gerardo Flores, Charina Chou, Kathryn Rough, Ashley Hayes, Jutta Williams, Katherine Chou, Andrew Smart, Alex Beutel, Valeria Espinosa, Adam Sadilek, Kaspar Molzberger, and Yoni Halpern for insightful comments on the interplay of fairness in building and deploying machine learning and for helpful comments on early versions of the manuscript. They also thank John Fahrenbach, James Williams, and their colleagues from the Chicago Center for Diabetes Translation Research for providing critical analysis of the ideas of the manuscript applied to real clinical prediction tasks.
Grant Support: Dr. Chin was supported in part by the Chicago Center for Diabetes Translation Research (grant NIDDK P30 DK092949), Robert Wood Johnson Foundation Finding Answers: Solving Disparities Through Payment and Delivery Reform Program Office, and Merck Foundation Bridging the Gap: Reducing Disparities in Diabetes Care National Program Office.
Disclosures: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M18-1990.
Corresponding Author: Alvin Rajkomar, MD, Google LLC, 1600 Amphitheatre Way, Mountain View, CA 94043; e-mail, alvinrajkomar@google.
Current Author Addresses: Drs. Rajkomar, Hardt, Howell, and Corrado: Google LLC, 1600 Amphitheatre Way, Mountain View, CA 94043.
Dr. Chin: University of Chicago, 5841 South Maryland Avenue, MC2007, Chicago, IL 60637.
Author Contributions: Conception and design: A. Rajkomar, M. Hardt.
Analysis and interpretation of the data: A. Rajkomar, M. Hardt.
Drafting of the article: A. Rajkomar, M. Hardt, M.D. Howell.
Critical revision for important intellectual content: A. Rajkomar, M. Hardt, M.D. Howell, G. Corrado, M.H. Chin.
Final approval of the article: A. Rajkomar, M. Hardt, M.D. Howell, G. Corrado, M.H. Chin.
Statistical expertise: A. Rajkomar, M. Hardt.
Administrative, technical, or logistic support: M.D. Howell.
This article was published at Annals.org on 4 December 2018.
* Drs. Rajkomar and Hardt contributed equally to this work.