To make informed decisions, patients need to understand what is likely to happen with and without treatment. It is widely accepted that natural frequencies (for example, 2 in 1000 persons) are the best way to communicate these absolute risks. Major organizations, such as the Cochrane Collaboration
(1, 2), the International Patient Decision Aid Standards Collaboration
(3), and the Medicines and Healthcare products Regulatory Agency
(4) (the United Kingdom's equivalent of the U.S. Food and Drug Administration) all recommend using natural frequencies to present absolute risks.
The evidence behind these recommendations, however, is limited and is extrapolated from studies in a very specialized context: Bayesian probability revisions in diagnostic testing. On the basis of the only randomized trial identified (which included 60 students)
(5), a 2004 systematic review recommended natural frequencies over percents
(6). A 2009 systematic review
(7), which also considered only probability revisions, did not recommend natural frequencies because it identified an additional series of randomized trials refuting their superiority
(8). Most important, the only 2 direct tests of absolute risk formats for communicating treatment effects found small differences favoring percents
Because these trials tested only simple, artificial scenarios in highly educated, self-selected convenience samples (people actively seeking information on Harvard University's “Your Cancer Risk” Web site, www.diseaseriskindex.harvard.edu/update/), the findings might not hold in more typical settings.
We conducted a randomized trial comparing comprehension of the benefits and harms of drugs when absolute risks are presented as natural frequencies, percents, or both. To test the formats among typical people facing typical decisions, we used familiar conditions, presented multiple absolute risks (because treatments have multiple benefits and harms), and recruited a nationally representative sample of U.S. adults.
Discussion
We found no evidence to support the assertion that natural frequency is the best format for communicating the benefits and harms of treatment. In fact, the percent format had slightly higher comprehension overall and at each level of numeracy and education. The combined percent plus natural frequency format was no better than the percent format alone. Comprehension of the variable frequency format was consistently lowest.
The use of natural frequencies instead of percents to communicate absolute risks has been promoted largely on intuitive and evolutionary grounds (the human mind developed the ability to learn over thousands of years by observing and counting things; in contrast, the science of probability is only a few hundred years old)
(5). However, the evidence supporting natural frequencies over percents for communicating to patients (based on 2 systematic reviews
[6, 7] and our English-language MEDLINE searches to April 2011) is limited to trials testing a specific skill: the ability to use conditional probabilities when interpreting diagnostic test results. In fact, the 2 trials testing absolute risk formats for communicating treatment effects found small differences favoring percents over natural frequencies
(9, 10). Nevertheless, even iconic “evidence-based” organizations have issued guidance promoting the use of natural frequencies over percents for communicating treatment effects.
Our findings challenge such guidance. They also refute the common assumption that percents should be avoided for expressing small probabilities (for example, <1%). We previously made this assumption after repeatedly finding that study participants had the most difficulty converting “1 in 1000” to “0.1%” in our 3-item numeracy test
(17)—a finding also observed in the current trial. The fact that comprehension of the 7 questions about low-probability events was the same in the percent group and the 2 natural frequency groups argues that the difficulty in converting between formats reflects trouble with manipulating decimal points rather than a comprehension problem.
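To illustrate (our restatement of the numeracy item, not part of the survey materials), the conversion amounts to shifting the decimal point:

\[
\frac{1}{1000} = 0.001 = 0.1\%
\]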
Our study also highlights known problems with frequency formats. People get confused when the denominator changes—for example, deciding whether 1 in 130 or 1 in 236 is a larger number
(9, 18, 19). We wondered whether limiting denominator changes to orders of magnitude (for example, 100, 1000, and 10 000) and keeping denominators constant within rows of tables would minimize confusion and enhance the ability to discriminate between varying probabilities. Unfortunately, this format was still confusing.
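To make the 1 in 130 versus 1 in 236 comparison concrete (our restatement, not a format tested in the trial), rescaling both risks to a common denominator shows which is larger:

\[
\frac{1}{130} \approx \frac{1.8}{236} > \frac{1}{236}
\]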
Variable frequency formats may be confusing because the larger number in the denominator means a smaller probability. Another reason for confusion is “denominator neglect”
(20): People tend to focus on the numerator of a frequency and ignore the denominator. This problem is best illustrated with the variable frequency format in the cholesterol drug table, where the chance of serious muscle breakdown was 4 in 10 000 and the chance of liver inflammation was 1 in 100. Only 40% of participants in that group correctly identified serious muscle breakdown as the less common event. These incorrect responses probably reflect comparison of numerators (4 vs. 1) without considering the denominators (10 000 vs. 100).
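Converted to a common scale (again our restatement, not a format shown to participants), the intended comparison is:

\[
\frac{4}{10\,000} = 0.04\% < \frac{1}{100} = 1\%
\]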
Denominator neglect may cause problems even when the denominator is held constant, as in our natural frequency format (always x in 1000), because it magnifies numerically small effects. For example, the increase in diarrhea with the heartburn drug looked bigger to the natural frequency group (presented as “40 in 1000”) than to the percent group (presented as “4%”). Heightened perception of adverse effects may explain why the natural frequency group had less enthusiasm for the heartburn drug than the percent group did. Denominator neglect matters when it leads people away from a good intervention because of a format distortion rather than a balanced weighing of benefits and harms.
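To be explicit (our arithmetic, not part of the heartburn drug table), the two presentations of the diarrhea risk are numerically identical:

\[
\frac{40}{1000} = \frac{4}{100} = 4\%
\]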
In theory, combined formats should be best because they give people options and reinforce understanding by presenting the same data in different ways. We found that combined formats generally worked better than frequency formats alone: Comprehension was higher, and there was no evidence of denominator neglect. However, they worked no better than percents alone, and they are therefore probably not worth the additional visual clutter (they triple the number of values presented).
Our trial has several important strengths. We used a rich set of comprehension questions to assess understanding of both relative and absolute differences (including small and large magnitudes) within and across rows of a complex table. There were no study dropouts (randomization occurred after potential participants agreed to complete the online survey), and item nonresponse was low (≤5%). In contrast to much of the prior research using convenience samples, our participants were recruited from a large national research panel that, by design, can be weighted so that results are nationally representative (that is, they account for the sampling strategy and panel recruitment). Weighting accounts for study nonparticipation and adjusts the demographic characteristics of the panel members to match the U.S. population on age, sex, race, education, region, and metropolitan residence. Weighted and unweighted results were nearly identical. We chose to present unweighted results to preserve the simplicity of the randomized trial. The negligible effect of weighting
does suggest that the unweighted results are nationally generalizable.
Our findings should be interpreted in light of several limitations. First, comprehension was tested in a survey rather than in the setting of actual medical decisions. In addition, there is no clear standard for the level of comprehension needed to make an informed decision. That is why we judged comprehension in 3 ways: mean score, proportion of persons who “passed” the test, and the proportion that received an “A” grade. Because setting thresholds is inherently arbitrary, we adapted a familiar external benchmark—school grades (“passing” is >70% correct, and an “A” grade is >90%), a strategy we used previously
(21).
Finally, there is room for improvement. Even with the percent format, about one third of participants failed the comprehension test. This may partly reflect the fact that participants were facing hypothetical decisions; patients facing real decisions might have been more engaged and done better. However, part of the problem undoubtedly reflects a poor understanding of numbers. Although it is tempting to conclude that none of the formats is adequate, it is important to consider the complexity of the tasks involved. Participants had to navigate complex tables to find and compare numbers. The National Assessment of Adult Literacy considers such tasks to be among the most difficult that it assesses
(22). In the 2003 survey, only 53% of respondents could use a simple table to find and compare bank interest rates (requiring either subtracting or dividing 2 numbers). The fact that pass rates in our trial increased directly with education (ranging from 62% for persons with high school education or less to 85% for those with a postgraduate degree) and with numeracy (ranging from 56% for persons with low numeracy to 92% for those with the highest numeracy) highlights that although data formats matter, the main underlying need is for better education. Fortunately, evidence indicates that even a simple educational intervention can help
(21). It is also possible that comprehension would improve over time through regular exposure to absolute risks in standardized formats. Leaders in risk communication believe that it is incumbent on policymakers to move in this direction to help the public make decisions in their own interest
(23).
People who are trying to communicate data about benefit and harm to the public, patients, physicians, and policymakers must choose a format for absolute risks. Our trial shows that they should avoid variable frequencies and that they should no longer accept the assertion that natural frequencies are the best format. On the basis of our findings, we believe that the percent format is probably best. It is more succinct (requiring one half as many numbers) and slightly better than the natural frequency format, particularly for communicating the most basic data needed to compare treatment effects: absolute differences.
Choice of Methods in Presenting Scenarios to Patients
Dr. Nardone's suggestion is intended to help people develop a sense of numbers using concrete images.
For this technique to work, though, it would be important to use images familiar to the target audience. We doubt that most Americans know how many soldiers are in a battalion (we don't), the capacity of the field for the local AAA baseball team (we don't even know if we have a AAA team), or even the number of US Senators (we knew this one).
Even with familiar images, though, there may still be problems. Changing the denominators to accommodate chances of different magnitude may undermine communication. In our trial, people had the most trouble understanding the variable frequency format, in which denominators changed by orders of magnitude (for example, 100, 1000, and 10 000).
While Dr. Nardone's approach may be useful for teaching concepts, it would not be feasible for the kinds of applications we envision: efficiently summarizing the multiple benefits and harms of medical interventions.
Steven Woloshin, MD, MS and Lisa M. Schwartz, MD, MS
VA Outcomes Group, White River Jct., VT and the Dartmouth Institute for Health Policy and Clinical Practice
Conflict of Interest:
None declared