Academia and Clinic · 16 September 2003

Standardized Reporting of Clinical Practice Guidelines: A Proposal from the Conference on Guideline Standardization


    Abstract

    Despite enormous energies invested in authoring clinical practice guidelines, the quality of individual guidelines varies considerably. The Conference on Guideline Standardization (COGS) was convened in April 2002 to define a standard for guideline reporting that would promote guideline quality and facilitate implementation. Twenty-three people with expertise and experience in guideline development, dissemination, and implementation participated. A list of candidate guideline components was assembled from the Institute of Medicine Provisional Instrument for Assessing Clinical Guidelines, the National Guideline Clearinghouse, the Guideline Elements Model, and other published guideline models. In a 2-stage modified Delphi process, panelists first rated, on a 9-point scale, their agreement with the statement "[Item name] is a necessary component of practice guidelines." An individualized report was prepared for each panelist; the report summarized the panelist's rating for each item and the median and dispersion of the ratings of all panelists. In a second round, panelists separately rated necessity for validity and necessity for practical application. Items achieving a median rating of 7 or higher on either scale, with a low disagreement index, were retained as necessary guideline components. Representatives of 22 organizations active in guideline development reviewed the proposed items and commented favorably. Closely related items were consolidated into 18 topics to create the COGS checklist. This checklist provides a framework to support more comprehensive documentation of practice guidelines. Most organizations that are active in guideline development found the component items to be comprehensive and to fit within their existing development methods.

    *For a list of participants in the COGS, see the Appendix.

    Evidence-based clinical practice guidelines can reduce the delivery of inappropriate care and support the introduction of new knowledge into clinical practice (1-3). In many cases, guidelines encapsulate the most current knowledge about best practices. Rigorously developed guidelines can translate complicated research findings into actionable recommendations for clinical care (4). Over the past decade, a plethora of guidelines has been created and published by a multitude of organizations at substantial cost.

    Despite the enormous energies invested in guideline authoring, the quality of individual guidelines varies considerably. In its landmark report, the Institute of Medicine (IOM) defined 8 desirable attributes of clinical practice guidelines: validity, reliability and reproducibility, clinical applicability, clinical flexibility, clarity, documentation, development by a multidisciplinary process, and plans for review (5).

    However, critical information that would attest to validity or would document fulfillment of the other IOM criteria is regularly absent from published guidelines. In an evaluation of 279 guidelines developed by U.S. medical specialty societies, Shaneyfelt and colleagues (6) found that guidelines published in the peer-reviewed medical literature do not adhere to established methodologic standards. Likewise, Grilli and colleagues (7) found that of 431 guidelines produced by specialty societies, 82% did not apply explicit criteria to grade the scientific evidence that supported their recommendations, 87% did not report whether a systematic literature search was performed, and 67% did not describe the type of professionals involved in guideline development. Systematic reviews of guidelines for drug therapy (8), management of depression (9), and osteoporosis (10) have confirmed marked variation in quality. Both nonadherence to methodologic standards and failure to document development activities contribute to this variation.

    We convened the Conference on Guideline Standardization (COGS) to define a standard for guideline reporting that would promote guideline quality and facilitate implementation. The proposed standard provides a checklist of components necessary for evaluation of validity and usability. The checklist is intended to minimize the quality defects that arise from failure to include essential information and to promote development of recommendation statements that are more easily implemented.

    In contrast to other instruments that have been developed for post hoc evaluation of guideline quality, the COGS checklist is intended to be used prospectively by developers to improve their product by improving documentation. The COGS panel used a systematic and rigorous process to define content of the proposed standard and to achieve consensus. The COGS panel also included a wide variety of perspectives, deliberately bringing together representatives from medical specialty societies, government agencies, and private groups that develop guidelines; journal editors and the National Guideline Clearinghouse (NGC), which disseminate guidelines; guideline implementers, including managed care representatives and informaticians; and academicians.

    Methods

    We actively sought people with diverse backgrounds from wide-ranging geographic areas to participate in the meeting. Selection criteria for participants were 1) activity in a wide variety of guidelines initiatives, 2) recognition as leaders in their field, and 3) willingness to collaborate. To maximize interaction, the number of participants in the Conference was limited.

    We set as a task for the panelists the specification and definition of a set of necessary guideline components that should be considered for reporting in all evidence-based practice guidelines. We defined necessary items as those that establish the validity of the guideline recommendations or facilitate practical application of the recommendations. We noted that many additional items might be considered appropriate components in guidelines, but sought to define a minimal set of essential elements.

    We assembled a list of candidate guideline components from the IOM Provisional Instrument for Assessing Clinical Guidelines (5), the NGC (11), and the Guideline Elements Model (12), a hierarchical model of guideline content that was created from a systematic review of published guideline descriptions. These items were supplemented with items highlighted in the literature, for example, structured abstract (13) and conflict of interest (14).

    We applied a modified Delphi approach to help focus group discussion. This approach has been widely applied for evaluation of expert opinion on appropriateness (15) and medical necessity (16), for policy development (17), and for prioritization (18). The technique has been described in detail (19). In brief, after we secured agreement to participate in the COGS panel, we gave all participants a bibliography of resources regarding guideline quality and its appraisal, guideline implementation, and the modified Delphi consensus development approach. Panelists were asked to rate their agreement with the statement "[Item name] is a necessary component of practice guidelines" on a 9-point scale. Rating an item with a 9 indicated strong agreement with the statement that this item was necessary. Rating an item with a 1 indicated strong disagreement with the statement and suggested that the item was absolutely unnecessary in a guideline. A rating of 5 indicated neutrality or indifference.

    We developed a password-protected Web site that panelists used to complete the first round of ratings online before the meeting. Online rating permitted accurate and efficient data capture and analysis. For each item, the median rating and the disagreement index (defined as the Interpercentile Range divided by the Interpercentile Range Adjusted for Symmetry) were calculated (19). The disagreement index, which can be calculated for panels of any size, describes the dispersion of ratings more effectively than the mean absolute deviation from the median. Index values greater than 1 indicate disagreement. We displayed summary statistics for each item on a form that was individualized for each panelist.
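    The disagreement index described above can be sketched in a few lines of code. This is an illustrative implementation, not the conference's software: it assumes the definitions in the RAND/UCLA Appropriateness Method User's Manual (reference 19), with the interpercentile range (IPR) taken between the 30th and 70th percentiles and the constants 2.35 and 1.5 from that manual's formula for the IPR adjusted for symmetry (IPRAS); the panel ratings shown are hypothetical.

```python
from statistics import median

def disagreement_index(ratings, lo=0.3, hi=0.7):
    """Disagreement index = IPR / IPRAS (RAND/UCLA definitions).
    Values greater than 1 indicate disagreement among panelists."""
    xs = sorted(ratings)
    n = len(xs)

    def pct(p):
        # percentile by linear interpolation between order statistics
        k = p * (n - 1)
        f = int(k)
        c = min(f + 1, n - 1)
        return xs[f] + (k - f) * (xs[c] - xs[f])

    p_lo, p_hi = pct(lo), pct(hi)
    ipr = p_hi - p_lo                    # interpercentile range
    ipr_cp = (p_lo + p_hi) / 2.0         # central point of the IPR
    ai = abs(5.0 - ipr_cp)               # asymmetry index on the 9-point scale
    ipras = 2.35 + 1.5 * ai              # constants from the RAND/UCLA manual
    return ipr / ipras

# hypothetical panel ratings for one candidate item
ratings = [8, 9, 7, 8, 9, 7, 8, 6, 9, 8, 7, 9]
print(median(ratings), round(disagreement_index(ratings), 2))
```

    A panel whose ratings cluster at one end of the scale yields an index well below 1 (agreement), while a panel split between the extremes (for example, half rating 1 and half rating 9) yields an index above 1 (disagreement).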

    The COGS meeting was held on 26 and 27 April 2002 in New Haven, Connecticut. In a colloquy facilitated by 2 of the authors, each candidate item was discussed to ensure that all participants agreed on its definition and potential contribution to the COGS checklist and to highlight empirical evidence of its value. When appropriate, additional items were added to the list. The group determined that in the second round of ratings, it would be valuable to rate each item's necessity on 2 subscales: necessity to establish validity and necessity for practical application. The participants then rated each item on these 2 dimensions.

    Analysis

    We tallied the median score and the disagreement index for each item. We retained items with median scores of 7 or higher and disagreement indexes less than 1.0 on either scale as necessary guideline components on the checklist. To interpret necessity, the 9-point scale is divided into 3 ranges: items scoring 1 to 3 are considered unnecessary, items scoring 4 to 6 are neutral, and items scoring 7 to 9 are considered necessary (20). In this study, the threshold of 7 was chosen because it represented the lowest rating at which the participants indicated that an item was necessary for inclusion in guidelines.
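    The retention rule can be expressed compactly: an item is kept if, on either subscale, its median rating falls in the "necessary" tertile (7 to 9) and its disagreement index is below 1.0. The sketch below uses hypothetical ratings and assumes the disagreement index for each subscale has already been computed.

```python
from statistics import median

def retain(item):
    """COGS retention rule: keep an item if on either subscale
    (validity or practical application) the median rating is >= 7
    and the disagreement index is < 1.0."""
    return any(median(ratings) >= 7 and di < 1.0
               for ratings, di in (item["validity"], item["application"]))

# hypothetical item: (panel ratings, precomputed disagreement index) per subscale
item = {"validity": ([8, 9, 7, 8, 7, 9, 8], 0.4),
        "application": ([5, 6, 4, 5, 6, 5, 4], 0.6)}
print(retain(item))   # median 8 on validity with low disagreement -> retained
```

    Note that the rule is disjunctive: an item judged necessary for validity is retained even if the panel was neutral about its practical application, and vice versa.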

    Pilot Review

    To field test the proposed checklist, we surveyed organizations that were active in guideline development. We identified all organizations that met the NGC Web site criteria for guideline display on 12 July 2002. From that list, we selected organizations that had developed 10 or more guidelines displayed on the NGC Web site. We excluded 1) organizations that participated in development of the COGS checklist (because of potential bias) and 2) government agencies and organizations based outside the United States (for logistic reasons). A draft COGS checklist and a brief survey were sent to the people identified as being responsible for guideline development at each eligible organization.

    Results

    All 23 panelists submitted first- and second-round ballots. During the discussion, participants suggested consideration of 10 new items in the second round of balloting and refined definitions of several items. Thirty-six discrete items were considered necessary to establish guideline validity; they received median ratings of 7 or greater and had disagreement indexes of less than 1. Twenty-four items were considered necessary for practical application of the guideline, each with a disagreement index less than 1. Several items were rated necessary on both dimensions. Overall, 44 discrete items were considered necessary.

    Closely related items were then consolidated into 18 topics to create the COGS checklist for reporting clinical guidelines (Table). Appendix Tables 1 and 2 present a complete listing of all items rated and their scores.

    Table. The COGS Checklist for Reporting Clinical Practice Guidelines
    Appendix Table 1. Items Ranked Necessary for Guideline Validity, Median Ratings, Distribution of Ratings by Tertiles, and Disagreement Index
    Appendix Table 2. Items Ranked Necessary for Guideline Usability, Median Ratings, Distribution of Ratings by Tertiles, and Disagreement Index

    Twenty-two organizations met eligibility criteria for evaluation of the draft checklist, and all completed the survey (100% response rate). Sixteen organizations (73%) responded that they believed the checklist would be helpful for creating more comprehensive practice guidelines, and an additional 2 organizations (9%) responded that it might be helpful. Nineteen respondents (86%) indicated that documenting the proposed items would fit within their organizations' guideline development methods. Fifteen (68%) stated that they would use the proposed checklist in guideline development and an additional 4 (18%) indicated that they might. Reasons for not using the COGS checklist included "We already have a system that covers these points" and "we are producing guidelines that must be useful for clinicians and for people doing systems improvement work. Thus they must be succinct and brief. These characteristics cannot be achieved simultaneously with comprehensiveness."

    Discussion

    We created the COGS checklist to promote the systematic reporting of precise details that are critical for understanding a guideline's development, its recommendation statements, and potential issues in its application. This structured system for reporting is designed to reduce the considerable variation in guideline documentation conventions; it is not intended to dictate a particular guideline development methodology or to interfere with the creative process.

    One factor that contributes to the variation of guidelines is the fact that guideline development panels are commonly composed of people who are inexperienced in guideline authoring. New committees of experts are often assembled to address each discrete clinical topic. The COGS checklist represents a common framework for guideline developers to help ensure that important information is included in the guideline documentation, for journal editors and other disseminators to determine that a guideline draft is ready for publication, and for implementers to inform the selection process and to facilitate the representation and translation of recommendations into tools that influence clinical behavior.

    The COGS checklist specifies the nature of information that should be documented about guidelines, but the specification of appropriate content that might be used to fulfill a checklist item is intentionally left to developers and those interested in quality appraisal. Despite the label of "necessary," it is entirely possible that some guidelines rightfully might not include content for every item, but they would address explicitly whether the guideline development team considered that item.

    Although many individual organizations have devised manuals and procedures for developing guidelines, we are unaware of any consensus standard that has been proposed for use prospectively to promote the development of high-quality guidelines. Graham and colleagues (21) identified 13 instruments for evaluation of clinical practice guidelines, all of which were intended for retrospective application to guidelines that had already been created and released. Inclusion criteria of the NGC (22) have defined a de facto documentation standard for organizations desiring to post their guidelines on this Internet-accessible repository. The guidelines are represented using a structured summary that describes a limited set of specific guideline attributes. In the United Kingdom, a central guideline appraisal service has been implemented to assess all guidelines funded by the National Health Service to help ensure that guidelines are sound before they are deployed (23). An evolving appraisal instrument has been in use since 1997 to help determine which guidelines should be commended for use (24). Cluzeau and Littlejohns found that guidelines were better documented since the appraisal instrument was introduced and attributed the improvement in part to the use of the tool as an aide-mémoire by guideline developers (24). Their work has formed the basis for an instrument for guideline appraisal that was further refined and tested by the AGREE (Appraisal of Guideline Research and Evaluation) Collaboration, an international partnership of researchers and policymakers (25).

    Experience with the Consolidated Standards of Reporting Trials (CONSORT) statement exemplifies the benefits of standardization of documentation (26, 27). The CONSORT statement is mandated by many prominent journals, and investigators reporting clinical trials have increasingly used the statement to improve the comprehensiveness of their reports. CONSORT's effectiveness has been demonstrated by the fact that quality scores for reports of trials have improved in journals that require use of CONSORT, and specific threats to validity have diminished (28, 29). CONSORT was not intended to be used for quality appraisal of clinical trials. Likewise, we believe COGS can be used most effectively to identify necessary components that should be documented in guidelines but should not be used (alone) to judge guideline quality or adequacy.

    A genuine concern is the additional workload that new requirements might impose on already overburdened guideline development organizations. This concern must be weighed against the substantial benefits of adopting a uniform framework for guideline documentation. In general, the organizations surveyed found the checklist to fit within their guideline development methods. However, the high approval levels indicated by the survey respondents may not be generalizable because only organizations with a high level of guideline development productivity were surveyed. Organizations that have produced fewer guidelines may find greater conflicts between their approaches and the documentation requirements proposed by COGS, and therefore, they might find more difficulty using the checklist.

    Another difficulty with using a standardized documentation instrument is that a "one size fits all" approach may not work well for all guideline developers because of their development process or the topic area on which the guideline focuses. Some guideline developers are moving toward establishing iterative and ongoing guideline development, where the guideline is a living document, constantly fine-tuned on the basis of evaluation of patient outcomes or performance measures. Other guideline developers may be concerned with very narrow, circumscribed areas (for example, oncology, radiology, or pathology) where several of the COGS items are not relevant. In addition, other countries may have differences that limit the applicability of the COGS checklist to their guideline development and reporting processes. Finally, the real value of the checklist, in terms of better-quality guidelines and easier implementation, has not yet been demonstrated.

    Like the CONSORT statement, the COGS checklist will probably require maintenance revisions. Members of the group have expressed an interest in meeting at intervals to consider the experience in application of COGS. Comments from users of the COGS checklist will be collected on the COGS Web site (ycmi.med.yale.edu/COGS). We plan to measure the effect of the COGS statement on guideline documentation using a before–after evaluation.

    An important goal of the COGS meeting was to begin a conversation among guideline developers, disseminators, and implementers about how guideline statements should be written to facilitate their implementation. Specifically, one of the purposes was to address problems with implementing guidelines electronically. Guideline authors strive to make recommendations that accurately reflect the scientific evidence, sometimes intentionally introducing ambiguity into the recommendations to reflect their uncertainties. Often, developers use terms that are not clearly defined, thereby presenting difficulties when recommendations are integrated into clinical decision support systems (30). Implementers must resolve these ambiguities in order to devise pragmatic strategies that will effectively influence clinicians' behavior. A sustained and productive discussion among guideline developers, disseminators, implementers, and knowledge managers about critical guideline items and clear statement of decidable and executable recommendations will help to overcome major impediments to guideline use. That work is ongoing and will be reported in the future.

    Appendix: Participants in the Conference on Guideline Standardization

    Abha Agrawal, MD (Lown Cardiovascular Institute, Brookline, Massachusetts); David Bates, MD (Partners Healthcare System, Boston, Massachusetts); Stephen M. Downs, MD, MS (Indiana University School of Medicine, Indianapolis, Indiana); Robert A. Greenes, MD, PhD (Harvard Medical School, Boston, Massachusetts); Jeremy Grimshaw, MB, ChB (Ottawa Health Research Institute, Ottawa, Ontario, Canada); Carla Herrerias, MPH (American Academy of Pediatrics, Elk Grove Village, Illinois); Charles J. Homer, MD, MPH (National Initiative for Children's Healthcare Quality, Boston, Massachusetts); Bryant T. Karras, MD (University of Washington, Seattle, Washington); Christine Laine, MD, MPH (Annals of Internal Medicine, Philadelphia, Pennsylvania); Mauricio Leon, MD (IDX Systems Corporation, Seattle, Washington); Blackford Middleton, MD, MPH, MSc (Partners Healthcare Corporation, Boston, Massachusetts); Christel Mottur-Pilson, PhD (American College of Physicians, Philadelphia, Pennsylvania); J. Marc Overhage, MD, PhD (Indiana University School of Medicine, Indianapolis, Indiana); Ian Purves, MD (Sowerby Center for Health Informatics, Newcastle, United Kingdom); Paul G. Shekelle, MD, PhD (Southern California Evidence-based Practice Center, Los Angeles, California); Richard N. Shiffman, MD, MCIS (Yale University School of Medicine, New Haven, Connecticut); Jean Slutsky, PA, MSPH (Agency for Healthcare Research and Quality, Rockville, Maryland); Sharon Smart, MD (Sowerby Center for Health Informatics, Newcastle, United Kingdom); Leif I. Solberg, MD (HealthPartners Medical Group, Minneapolis, Minnesota); Harold Sox, MD (Annals of Internal Medicine, Philadelphia, Pennsylvania); Samson Tu, MS (Stanford University School of Medicine, Stanford, California); Steven H. Woolf, MD, MPH (Virginia Commonwealth University, Richmond, Virginia); and Stephanie Zaza, MD, MPH (Centers for Disease Control and Prevention, Atlanta, Georgia). Not all panelists voted to include all items on the checklist.

    References

    • 1. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet. 1993;342:1317-22. [PMID: 7901634]
    • 2. Merritt TA, Palmer D, Bergman DA, Shiono PH. Clinical practice guidelines in pediatric and newborn medicine: implications for their use in practice. Pediatrics. 1997;99:100-14. [PMID: 8989346]
    • 3. Woolf SH, Grol R, Hutchinson A, Eccles M, Grimshaw J. Clinical guidelines: potential benefits, limitations, and harms of clinical guidelines. BMJ. 1999;318:527-30. [PMID: 10024268]
    • 4. Haines A, Jones R. Implementing findings of research. BMJ. 1994;308:1488-92. [PMID: 8019284]
    • 5. Institute of Medicine. Guidelines for Clinical Practice: From Development to Use. Washington, DC: National Academy Pr; 1992.
    • 6. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA. 1999;281:1900-5. [PMID: 10349893]
    • 7. Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet. 2000;355:103-6. [PMID: 10675167]
    • 8. Graham ID, Beardall S, Carter AO, Glennie J, Hebert PC, Tetroe JM. What is the quality of drug therapy clinical practice guidelines in Canada? CMAJ. 2001;165:157-63. [PMID: 11501454]
    • 9. Littlejohns P, Cluzeau F, Bale R, Grimshaw J, Feder G, Moran S. The quantity and quality of clinical practice guidelines for the management of depression in primary care in the UK. Br J Gen Pract. 1999;49:205-10. [PMID: 10343424]
    • 10. Cranney A, Waldegger L, Graham I, Man-Son-Hing M, Byszewski A, Ooi D. Systematic assessment of the quality of osteoporosis guidelines. BMC Musculoskelet Disord. 2002;3:20. [PMID: 12174195]
    • 11. Guideline Summary Sheet. National Guideline Clearinghouse. Accessed at www.guideline.gov/about/GuidelineSummarySheet.aspx on 20 July 2003.
    • 12. Shiffman RN, Karras BT, Agrawal A, Chen R, Marenco L, Nath S. GEM: a proposal for a more comprehensive guideline document model using XML. J Am Med Inform Assoc. 2000;7:488-98. [PMID: 10984468]
    • 13. Hayward RS, Wilson MC, Tunis SR, Bass EB, Rubin HR, Haynes RB. More informative abstracts of articles describing clinical practice guidelines. Ann Intern Med. 1993;118:731-7. [PMID: 8460861]
    • 14. Choudhry NK, Stelfox HT, Detsky AS. Relationships between authors of clinical practice guidelines and the pharmaceutical industry. JAMA. 2002;287:612-7. [PMID: 11829700]
    • 15. Park RE, Fink A, Brook RH, Chassin MR, Kahn KL, Merrick NJ. Physician ratings of appropriate indications for six medical and surgical procedures. Am J Public Health. 1986;76:766-72. [PMID: 3521341]
    • 16. Kahan JP, Bernstein SJ, Leape LL, Hilborne LH, Park RE, Parker L. Measuring the necessity of medical procedures. Med Care. 1994;32:357-65. [PMID: 8139300]
    • 17. Lavis JN, Anderson GM. Appropriateness in health care delivery: definitions, measurement and policy implications. CMAJ. 1996;154:321-8. [PMID: 8564901]
    • 18. Brook RH, Kamberg CJ. Appropriateness of the use of cardiovascular procedures: a method and results of this application. Schweiz Med Wochenschr. 1993;123:249-53. [PMID: 8446857]
    • 19. Fitch K, Bernstein SJ, Aguilar MD, Burnand B, LaCalle JR, Lazaro P. The RAND/UCLA Appropriateness Method User's Manual. Santa Monica, CA: RAND; 2001.
    • 20. Leape LL, Hilborne LH, Kahan JP, Stason WB, Park RE, Kamberg CJ. Coronary artery bypass graft: a literature review and ratings of appropriateness and necessity. Report No. JRA-02. Santa Monica, CA: RAND; 1991.
    • 21. Graham ID, Calder LA, Hebert PC, Carter AO, Tetroe JM. A comparison of clinical practice guideline appraisal instruments. Int J Technol Assess Health Care. 2000;16:1024-38. [PMID: 11155826]
    • 22. National Guideline Clearinghouse Inclusion Criteria. Accessed at www.guideline.gov/contact/coninclusion.aspx on 20 July 2003.
    • 23. Cluzeau FA, Littlejohns P. Appraising clinical practice guidelines in England and Wales: the development of a methodologic framework and its application to policy. Jt Comm J Qual Improv. 1999;25:514-21. [PMID: 10522232]
    • 24. Cluzeau FA, Littlejohns P, Grimshaw JM, Feder G, Moran SE. Development and application of a generic methodology to assess the quality of clinical guidelines. Int J Qual Health Care. 1999;11:21-8. [PMID: 10411286]
    • 25. AGREE (Appraisal of Guideline Research and Evaluation) Collaboration. Accessed at www.agreecollaboration.org on 21 August 2002.
    • 26. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med. 2001;134:657-62. [PMID: 11304106]
    • 27. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663-94. [PMID: 11304107]
    • 28. Egger M, Jüni P, Bartlett C. Value of flow diagrams in reports of randomized, controlled trials. JAMA. 2001;285:1996-9. [PMID: 11308437]
    • 29. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA. 2001;285:1992-5. [PMID: 11308436]
    • 30. Tierney WM, Overhage JM, Takesue BY, Harris LE, Murray MD, Vargo DL. Computerizing guidelines to improve care and patient outcomes: the example of heart failure. J Am Med Inform Assoc. 1995;2:316-22. [PMID: 7496881]
