This comparative effectiveness review summarizes the benefits and harms associated with commercially available, FDA-approved FGAs and SGAs. Broad inclusion criteria were used for comparisons among FGAs and SGAs, patients, and study outcomes to address the diversity of previously published reviews.
Methods
We followed an open process for this review with input from various stakeholders, including the public (20), and a protocol that followed standards for systematic reviews (21–23). A full technical report with detailed search strategies, methods, and evidence tables is available from the Agency for Healthcare Research and Quality (21).
Literature Search
We conducted comprehensive searches in MEDLINE (Appendix Table 2), EMBASE, PsycINFO, International Pharmaceutical Abstracts, CINAHL, ProQuest Dissertations and Theses Full Text, the Cochrane Central Register of Controlled Trials, and Scopus for studies published from 1950 to March 2012. For adverse events, we also searched the U.S. National Library of Medicine's TOXLINE and the MedEffect Canada Adverse Reaction Database.
We hand-searched proceedings from the annual meetings of the American Psychiatric Association (2008–2010) and the International College of Neuropsychopharmacology (2008–2010). We searched clinical trial registries and contacted experts in the field and authors of relevant studies. We retrieved new drug applications for each of the included interventions from the FDA Web site. We reviewed the reference lists of reviews, guidelines, and new drug applications and searched for articles citing relevant studies using Scopus Citation Tracker.
Study Selection
Two reviewers independently screened titles and abstracts. We retrieved the full text of potentially relevant studies. Two reviewers independently reviewed each article using a standardized form with a priori eligibility criteria (Appendix Table 3). We resolved discrepancies through discussion or third-party adjudication. We included studies if they were randomized, controlled trials (RCTs); were nonrandomized, controlled trials (non-RCTs); were cohort studies with a minimum follow-up of 2 years; included adults aged 18 to 64 years with schizophrenia or related psychoses; compared a commercially available FDA-approved FGA with an FDA-approved SGA; and provided data on illness symptoms (Appendix Table 4) or the following adverse events: diabetes mellitus, death, tardive dyskinesia, or a major metabolic syndrome.
Quality Assessment and Rating the Body of Evidence
Two reviewers independently assessed the methodological quality of included studies and resolved disagreements through discussion. We assessed RCTs and non-RCTs using the Cochrane Risk of Bias Tool (22) and cohort studies using the Newcastle–Ottawa Scale (24).
Two reviewers independently evaluated strength of evidence using the Grading of Recommendations Assessment, Development and Evaluation approach of the Evidence-based Practice Center Program and resolved discrepancies through discussion (25). We examined 4 domains: risk of bias, consistency, directness, and precision. Within the grading system, randomized trials always begin with a “high” strength of evidence that can be downgraded on the basis of shortcomings in the body of evidence (for example, overall risk of bias, inconsistency between study results, indirectness of the measured outcomes, and imprecision of the pooled estimate). In contrast, observational studies (for example, cohort studies) begin with a “low” strength of evidence that can be further downgraded (similar to randomized trials) but can also, in rare cases, be upgraded. We assigned an overall grade of “high,” “moderate,” “low,” or “insufficient” strength of evidence. We graded core illness symptoms in the categories of positive symptoms, negative symptoms, general psychopathology, and global ratings or total scores (typically a compilation of positive and negative symptoms or general psychopathology, which included these symptoms plus mood states). We provided a grade for each scale that was reported in the relevant studies. We also graded the adverse events listed in the previous section.
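The grading logic described above can be illustrated with a deliberately simplified Python sketch. It assumes that each shortcoming moves the grade down exactly one level and each upgrade moves it up one level; in practice these are reviewer judgments, not a mechanical count. The function name `grade_strength` and its parameters are hypothetical and do not come from the review or from any grading software.

```python
# Ordered from weakest to strongest, using the four grades named in the text.
LEVELS = ["insufficient", "low", "moderate", "high"]

def grade_strength(design, downgrades=0, upgrades=0):
    """Return a strength-of-evidence grade under a one-level-per-step assumption.

    design: "rct" (starts at "high") or "observational" (starts at "low").
    downgrades: number of domains with serious shortcomings (assumed one level each).
    upgrades: only applied to observational evidence, as described in the text.
    """
    start = {"rct": "high", "observational": "low"}[design]
    index = LEVELS.index(start) - downgrades
    if design == "observational":
        index += upgrades
    # Clamp to the available grades so the result is always a valid level.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]
```

For example, under these assumptions `grade_strength("rct", downgrades=2)` yields "low", which mirrors how a body of randomized trials with risk-of-bias and imprecision concerns can end up with a low grade.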
Data Extraction
Two reviewers independently extracted data using standardized forms and resolved discrepancies by referring to the original report. We extracted information on study characteristics, populations, interventions, outcomes, and results. Primary outcomes were improved core symptoms of illness (positive and negative symptoms and general psychopathology) and 4 adverse events specified a priori. Secondary outcomes included functional outcomes; health care system use; response, remission, and relapse rates and medication adherence; health-related quality of life; other patient-oriented outcomes (for example, patient satisfaction); and general and specific measures of other adverse events (for example, extrapyramidal symptoms and weight gain).
When studies incorporated multiple relevant treatment groups or multiple follow-up periods, we extracted data from all groups for the longest follow-up period. In cases of multiple reports of the same study, we referenced the primary, or most relevant, study and extracted additional data from companion reports.
Data Analysis
We conducted meta-analyses in RevMan, version 5.01 (The Cochrane Collaboration, Nordic Cochrane Centre, Copenhagen, Denmark), using a random-effects model (26) when studies were sufficiently similar in terms of design, population, interventions, and outcomes. We combined risk ratios for dichotomous outcomes using the DerSimonian and Laird random-effects model and combined continuous outcomes using mean differences with 95% CIs. We quantified statistical heterogeneity using the I² statistic. For trials with multiple study groups, we pooled the data for all relevant groups in the same trial before including the study in any meta-analysis so that the same groups were never represented more than once in any given meta-analysis. Where measures of variance were not reported in the studies, we imputed the variance from the largest reported SD in the given meta-analysis.
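As an illustration of the pooling described above, the following minimal Python sketch applies the DerSimonian and Laird estimator to log risk ratios and computes the heterogeneity statistic. It is not the RevMan implementation; the function names are hypothetical, and trials with zero events in a group (which would need a continuity correction) are not handled.

```python
import math

def log_risk_ratio(events_a, total_a, events_b, total_b):
    """Return (log risk ratio, variance of the log risk ratio) for one two-group trial."""
    log_rr = math.log((events_a / total_a) / (events_b / total_b))
    var = 1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    return log_rr, var

def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian and Laird random-effects estimator."""
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    # I-squared: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}
```

A pooled risk ratio and its 95% CI would then be obtained by exponentiating `pooled` and `pooled ± 1.96 * se`. Pooling on the log scale is the conventional choice because the sampling distribution of the log risk ratio is closer to normal than that of the risk ratio itself.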
We conducted subgroup and sensitivity analyses for illness or disorder subtypes, sex, age group (18 to 35 years, 36 to 54 years, and 55 to 64 years), race, comorbid conditions, drug dosage, follow-up period, previous exposure to antipsychotics, treatment of a first episode versus prior episodes, and treatment resistance. Details of these analyses are presented in the appendices to the full technical report. We report subgroup and sensitivity analyses if there was substantial heterogeneity (I² ≥ 50%). For comparisons with at least 10 studies, we assessed publication bias using funnel plots and statistical tests (27–29). For our primary outcome of core symptoms, we considered a difference of 20% to be clinically important (7, 30). We calculated absolute differences (that is, risk differences) for adverse events to enhance interpretation of results.
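The risk difference mentioned above can be sketched in a few lines of Python. The source does not state how confidence intervals for risk differences were computed, so the Wald interval used here is an assumption, and the function name `risk_difference` is hypothetical.

```python
import math

def risk_difference(events_a, total_a, events_b, total_b, z=1.96):
    """Return (risk difference, lower bound, upper bound) for group A minus group B.

    Uses a Wald interval, which is an assumption of this sketch and can be
    inaccurate when event counts are very small.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rd = risk_a - risk_b
    se = math.sqrt(risk_a * (1 - risk_a) / total_a + risk_b * (1 - risk_b) / total_b)
    return rd, rd - z * se, rd + z * se
```

An absolute difference complements a risk ratio because the same ratio can correspond to very different numbers of affected patients depending on how common the adverse event is.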
Role of the Funding Source
The Agency for Healthcare Research and Quality suggested the initial questions and approved copyright assertion for the manuscript but did not participate in the literature search, data analysis, or interpretation of the results.
Discussion
Despite FGAs and SGAs being a mainstay in the treatment of schizophrenia in adults, questions remain about whether and how the various commercially available medications differ in efficacy and safety profiles (1–6). This review provides a comprehensive synthesis of the evidence on the comparative benefits and harms of FDA-approved FGAs and SGAs. We used a broad approach to inclusion criteria for comparisons, patients, and study outcomes to bring together the diversity of previously published reviews and provide a broader perspective on evidence in the field (1, 7–19).
We identified a large number of relevant studies (114 studies and 22 different comparisons), the majority of which were efficacy trials (146). The most frequent comparisons involved haloperidol and risperidone (40 studies) or olanzapine (35 studies); however, the number of studies available for each comparison and outcome was often limited.
Overall, we found few differences of clinical importance between the active drugs; however, this does not imply that they are equivalent. The strength of evidence from these studies was generally low or insufficient, with considerable variation in scales and subscales used to measure symptoms. This heterogeneity, coupled with the small number of studies within specific comparisons, suggests that insufficient statistical power may explain some of the negative findings, and it precludes the firm conclusions that are needed for front-line clinical decision making.
At this time, evidence supporting the use of SGAs for negative symptoms is stronger than that supporting their use for positive symptoms; olanzapine and risperidone were found to be more efficacious than haloperidol in reducing such symptoms as blunted affect and withdrawal. This effect, however, was not observed for improving overall (global) functioning and general psychopathology. Contrary to recent reviews (7, 8), we found no evidence of benefit in improving symptoms with clozapine compared with haloperidol, although moderate-strength evidence showed benefits for clozapine compared with chlorpromazine. Differences in study inclusion criteria between our review and previously published reviews probably account for the different outcomes, with our review including more studies from which to base conclusions. In light of the totality of evidence in this review, the ample low-quality evidence showing no difference between haloperidol and various SGAs in improving symptoms provides an inadequate evidence base to advocate for one medication over another.
The data for adverse events were of low to insufficient strength, suggesting the need for a more focused evaluation of drug safety. Despite our efforts to identify long-term safety data from observational studies, only 2 retrospective cohort studies provided follow-up data at least 2 years in duration. Short-term efficacy trials, which are accepted by the regulatory authorities, may not identify time-dependent adverse events, such as tardive dyskinesia, diabetes mellitus, the metabolic syndrome, or death. Although few studies measured mortality, some evidence suggests that mortality does not differ between FGAs and SGAs after immediate use (within 24 hours) or long-term use (>12 months). The strength of evidence for other mortality-related outcomes (such as suicide-related behaviors, which are a risk in this clinical population) (147–149) was insufficient to draw conclusions.
We found low-strength evidence for an increased incidence of the metabolic syndrome with use of olanzapine. In general, most studies showed no difference between FGAs and SGAs in terms of increased risk for the metabolic syndrome or diabetes mellitus; however, the strength of evidence was usually insufficient. The methodological and reporting limitations of these studies make conclusions about these outcomes premature (150). Several reviews have identified clozapine and olanzapine as contributing to greater weight gain (7, 151–153), but this may not necessarily translate into increased risk for more severe outcomes. Further study of this trajectory is warranted with higher-quality longitudinal studies.
Our results are consistent with those of CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness) (2), a widely cited trial in this field. CATIE was designed to evaluate whether FGAs were inferior to SGAs in efficacy and safety. Findings from CATIE suggested that the FGA perphenazine and various SGAs (olanzapine, quetiapine, risperidone, and ziprasidone) differed more in their adverse effect profiles than in their therapeutic effect profiles. The study, like this review, also showed that effectiveness across medications varied and that the difference was clinically important in some cases.
Our results are also similar to those of a recent systematic review of SGAs versus FGAs, although our review is broader in scope in terms of medications included, patient populations, and outcomes (1). There were several methodological differences between the previous review and this one: The previous review included non–FDA-approved antipsychotics, restricted the analysis to only double-blind trials, included only studies examining optimum SGA dosage and oral route of administration, pooled data across efficacy outcome measures, and pooled different FGAs. The different methodologies may have led to slightly different conclusions about individual SGAs.
One of the unique features of our review is the strength-of-evidence assessments, which provide information on the level of confidence one can place in the results of existing studies. In most cases, the strength of evidence was insufficient or low, highlighting the likelihood that future research may change the estimates of effect and the need for a stronger evidence base to inform clinical practice. Current treatment guidelines from the American Psychiatric Association for patients with schizophrenia provide specific recommendations on medication timing (for example, acute phase or first episode) but broad variables for medication options (154). This approach may reflect the current state of evidence for FGAs and SGAs, and as stronger evidence emerges, the guidelines may come to include more specific recommendations for prescribing physicians.
There were limitations in the design and quality of the primary studies. Most studies were short-term RCTs, often with an a priori hypothesis that the SGA would be more efficacious (155). Most trials did not sufficiently report methods to prevent selection and performance bias. Few trials reported blinding study investigators and participants; single-blinded and open-label trials in this field have been found to favor SGAs over FGAs (1). Furthermore, the individual studies and, in many cases, the pooled results may not have sufficient power to detect equivalence or noninferiority between drugs.
Most studies in this review were industry-funded (69%), which can increase the chance of proindustry findings (156). Funding was not disclosed for 19% of studies, highlighting the need for transparency in reporting the nature and extent of financial support. The choice of medication comparisons, dosages, and outcomes in the studies included in this review may have been driven by the funder's interests and priorities. Publication and reporting of select comparisons and outcomes are other potential limitations of this body of evidence.
Few studies provided evidence for comparable patient populations. We found notable heterogeneity across studies for disorder subtypes, comorbid drug or alcohol use, treatment resistance, and number of previous episodes, which result in differential response to treatment. Furthermore, many studies were highly selective in patient enrollment, which may increase the likelihood of drug benefit and decrease the likelihood of adverse events. Detailed subgroup analyses are reported elsewhere (21). Characteristics of the research, including drug dosages (for example, lower doses of FGAs in more recent studies) and patient populations (for example, fewer patients already exposed to FGAs or proven treatment resistance to FGAs in recent studies), also changed over time. Finally, differences in medication comparisons and dosage and outcome measurement limited our synthesis, and outcomes that are important for understanding medication adherence and persistence (a common clinical encounter in this patient population), such as sedation and restlessness, were rarely reported.
More longitudinal research is needed on the long-term safety of FGAs versus SGAs. Despite our efforts to identify long-term safety data from observational studies, only 2 retrospective cohort studies were identified. Consensus is needed on the most important comparisons between FGAs and SGAs for future studies. Short- and long-term evaluations with patient subpopulations, including those with medical and neurologic comorbid conditions, are needed. There is a need for studies investigating the influence of dose, age, and other factors, such as comorbid conditions, on serious adverse events, which would help estimate possible risks in specific patient populations. Future studies should also examine functional outcomes that are important to patients, including health-related quality of life, relationships, academic and occupational performance, and legal interactions.
Existing studies on the comparative effectiveness of individual FGAs and SGAs preclude drawing firm conclusions because of sparse data and imprecise effect estimates. There were relatively few differences of clinical importance among 114 studies. The current evidence base is inadequate for clinicians and patients to make informed decisions about treatment. Outcomes potentially important to patients were rarely assessed. Data on long-term safety are lacking and urgently needed.
Sponsorship of Research Articles Can Affect Findings
To the Editor:
In this (1) and in other contexts, the sponsorship of research articles has been shown to affect the findings. Can the authors say anything about the effects of Pharma sponsorship on the outcomes of the trials? This might be a particular problem with these vendors, as many have been paying billions of dollars in fines recently for a variety of behaviors related to promoting their products.
Thomas E. Finucane, MD
Johns Hopkins Bayview Medical Center
Reference
1. Lisa Hartling, PhD; Ahmed M. Abou-Setta, MD, PhD; Serdar Dursun, MD, PhD; Shima S. Mousavi, MD; Dion Pasichnyk, BSc; and Amanda S. Newton, RN, PhD. Antipsychotics in Adults With Schizophrenia: Comparative Effectiveness of First-Generation Versus Second-Generation Medications: A Systematic Review and Meta-analysis. Ann Intern Med. 2 October 2012;157(7):498-511.
Sponsorship of Research Articles Can Affect Findings: RESPONSE FROM AUTHORS
There is a strong body of evidence to support Dr. Finucane’s observation that sponsorship of research articles has been shown to affect reported results (e.g., 1). In particular, there is evidence showing that industry-sponsored studies may show results supporting the industry’s product. An important task for systematic reviewers is to explore variables that may affect the validity of the reported results, including the source of funding. In our systematic review comparing first-generation with second-generation antipsychotics (2,3), we found that 70% of the included studies were industry-funded. We also noted that funding source was not disclosed for 19% of studies.
In our review we discussed that the choice of medication comparisons, dosages and outcomes may be driven by the funder’s interests and priorities (3). We conducted extensive subgroup and sensitivity analyses to examine variables that could result in differential treatment effects. These analyses included source of funding, risk of bias, patient characteristics (e.g., race, treatment resistance), treatment characteristics (e.g., dose), and study methods (e.g., duration of follow-up).
We found some differences when studies were grouped according to whether or not they were supported by industry funding. The patterns were not consistent and in some cases few studies were available for the “no industry” subgroup; therefore, it was difficult to discern whether the absence of differences within this subgroup was due to the funding source or simply to inadequate statistical power to detect differences. These analyses are detailed in an appendix to our full technical report (3).
The results of our review should be interpreted within the context that most studies contributing to this body of evidence were industry-funded. Our inability to draw firm clinical conclusions stemmed from inconsistencies across studies in terms of treatment comparisons, the outcomes assessed, how outcomes were measured, and patient populations. Within this field, stakeholders need to reach consensus on the most important comparisons that will be most informative and provide the most valid and accurate information to inform clinical decisions. Further, decisions involving patients, their families and their physicians need to be made regarding the most important outcomes and these should be the focus of future research. As we have mentioned, longer-term studies are needed as well as examination of important patient subpopulations, such as those with medical and neurological comorbidities. As with all research, full disclosure of the source of funding and role of the funder in the design, analysis and reporting of results is essential to interpreting the evidence.
References
1. Sismondo S. Pharmaceutical company funding and its consequences: a qualitative systematic review. Contemp Clin Trials 2008;29(2):109-13. PMID:17919992.
2. Hartling L, Abou-Setta AM, Dursun S, Mousavi SS, Pasichnyk D, Newton AS. Antipsychotics in adults with schizophrenia: comparative effectiveness of first-generation versus second-generation medications: a systematic review and meta-analysis. Ann Intern Med 2012 Aug 14. doi: 10.7326/0003-4819-157-7-201210020-00525. [Epub ahead of print]
3. Abou-Setta AM, Mousavi SS, Spooner C, Schouten JR, Pasichnyk D, Armijo-Olivo S, Beaith A, Seida JC, Dursun S, Newton AS, Hartling L. First-Generation Versus Second-Generation Antipsychotics in Adults: Comparative Effectiveness [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Aug.
Disclosures: Dr. Hartling: Contract (money to institution): AHRQ. Dr. Abou-Setta: Grant (money to institution): AHRQ. Dr. Dursun: Grants/grants pending (money to institution): CIHR-Canada, Norlien Foundation; Patents (planned, pending, or issued): sodium nitroprusside for the treatment of schizophrenia, in partnership with the University of Alberta, TEC Edmonton Office. Dr. Newton: Grant (money to institution): AHRQ; Other (money paid to author): University of Alberta Evidence-based Practice Center.