Reviews, 7 January 2020

Efficacy and Safety of Testosterone Treatment in Men: An Evidence Report for a Clinical Practice Guideline by the American College of Physicians




    Background:

    Testosterone treatment rates in adult men have increased in the United States over the past 2 decades.


    Purpose:

    To assess the benefits and harms of testosterone treatment for men without underlying organic causes of hypogonadism.

    Data Sources:

    English-language searches of multiple electronic databases (January 1980 to May 2019) and reference lists from systematic reviews.

    Study Selection:

    38 randomized controlled trials (RCTs) of at least 6 months' duration that evaluated transdermal or intramuscular testosterone therapies versus placebo or no treatment and reported prespecified patient-centered outcomes, as well as 20 long-term observational studies, U.S. Food and Drug Administration review data, and product labels that reported harms information.

    Data Extraction:

    Data extraction by a single investigator was confirmed by a second; 2 investigators assessed risk of bias, and evidence certainty was determined by consensus.

    Data Synthesis:

    Studies enrolled mostly older men who varied in age, symptoms, and testosterone eligibility criteria. Testosterone therapy improved sexual functioning and quality of life in men with low testosterone levels, although effect sizes were small (low- to moderate-certainty evidence). Testosterone therapy had little to no effect on physical functioning, depressive symptoms, energy and vitality, or cognition. Harms evidence reported in trials was judged to be insufficient or of low certainty for most harm outcomes. No trials were powered to assess cardiovascular events or prostate cancer, and trials often excluded men at increased risk for these conditions. Observational studies were limited by confounding by indication and contraindication.


    Limitation:

    Few trials exceeded a 1-year duration, minimum important outcome differences were often not established or reported, RCTs were not powered to assess important harms, few data were available in men aged 18 to 50 years, definitions of low testosterone varied, and study entry criteria varied.


    Conclusion:

    In older men with low testosterone levels without well-established medical conditions known to cause hypogonadism, testosterone therapy may provide small improvements in sexual functioning and quality of life but little to no benefit for other common symptoms of aging. Long-term efficacy and safety are unknown.

    Primary Funding Source:

    American College of Physicians. (PROSPERO: CRD42018096585)

    Testosterone treatment is approved by the U.S. Food and Drug Administration (FDA) for replacement therapy for men with primary or secondary hypogonadism caused by disorders of the hypothalamus, pituitary gland, or testes, often classified as organic or classical hypogonadism (1). Testosterone treatment in these conditions is considered standard care for the development or maintenance of secondary sexual characteristics.

    Testosterone use in the United States, which has tripled in recent years, exceeds that in other countries (2–5). Much of the increase is in men with nonspecific symptoms, such as decreased energy, sexual function, and mobility, who have serum testosterone concentrations below the normal range or in the low-normal range for healthy young men for no apparent reason other than older age or comorbid conditions, such as obesity. Substantial proportions of U.S. men who receive testosterone therapy do not have testosterone levels tested before initiation of therapy (3, 6). The level of baseline testosterone that prompts initiation of such therapy varies widely: In 1 study (3), approximately 20% of men who had their testosterone level measured before initiating therapy had a level above 10.41 nmol/L (300 ng/dL).

    We evaluated the efficacy and harms of testosterone treatment in men without established conditions that cause permanent testicular or hypothalamic–pituitary dysfunction (for example, Klinefelter syndrome, orchitis, testicular trauma or radiation, or hypothalamic or pituitary tumors). This systematic review, conducted by the Evidence Synthesis Core at the Minneapolis VA Center for Care Delivery and Outcomes Research, served as the evidence base for a clinical practice guideline from the American College of Physicians. We did not address the appropriate diagnosis and evaluation of hypogonadism. Guidelines by the Endocrine Society (7) and American Urological Association (8) recommend that clinicians measure fasting morning concentrations of total testosterone on 2 occasions to diagnose hypogonadism but vary in their recommendations regarding the threshold at which to categorize a total testosterone level as low.


    Methods

    Our protocol was registered in PROSPERO (CRD42018096585).

    Data Sources and Searches

    We searched MEDLINE, Embase, and the Cochrane Library for peer-reviewed randomized controlled trials (RCTs), cohort studies, and case–control studies published in English and indexed from January 1980 to May 2019. We also searched reference lists from relevant systematic reviews. Search terms included MeSH (Medical Subject Headings) terms and keywords pertaining to testosterone replacement, deficiency, and treatment (Appendix Table 1).

    Appendix Table 1. Search Strategies

    Study Selection

    Two investigators independently reviewed each study's abstract and full text to determine eligibility. Conflicts were resolved through discussion, with consultation of a third member if necessary.

    We included RCTs that assessed transdermal or intramuscular formulations of testosterone in adult men with at least 6 months of active treatment. Studies were included if they had a control group with placebo or no testosterone treatment, reported our efficacy outcomes of interest (sexual function, physical function, quality of life, mood [depression], fractures, energy or vitality, and cognition), and provided measurements of baseline total testosterone. We included 1 trial of 24 weeks' duration, TOM (Testosterone in Older Men with Mobility Limitations) (9, 10), because of its significance with respect to cardiovascular outcomes. This trial was stopped early because of concern regarding excess cardiovascular adverse events in the testosterone group. To assess serious but infrequent harms, we included observational studies with at least 1 year of follow-up and 500 participants. We excluded studies limited to men with inherited or acquired conditions known to cause permanent hypothalamic, pituitary, or testicular dysfunction and studies in which endogenous testosterone was artificially suppressed. We excluded studies evaluating oral testosterone because its use is contraindicated in men without structural or genetic causes of hypogonadism.
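    The RCT eligibility rules above can be read as a screening checklist. The sketch below encodes them as one predicate over a hypothetical `TrialRecord`; the record type and its field names are illustrative assumptions, and in the review these judgments were made by 2 investigators, not by code. A 24-week trial fails the 6-month rule, which is why TOM had to be admitted as an explicit exception.

```python
from dataclasses import dataclass


# Hypothetical record of one screened trial; field names are illustrative.
@dataclass
class TrialRecord:
    design: str                      # e.g. "RCT"
    route: str                       # "transdermal", "intramuscular", "oral", ...
    treatment_months: float          # months of active treatment
    has_control: bool                # placebo or no-testosterone arm
    reports_efficacy_outcome: bool   # at least 1 prespecified efficacy outcome
    reports_baseline_total_t: bool   # baseline total testosterone measured
    organic_hypogonadism_only: bool  # limited to men with organic causes
    endogenous_t_suppressed: bool    # endogenous testosterone artificially suppressed


ELIGIBLE_ROUTES = {"transdermal", "intramuscular"}


def rct_is_eligible(trial: TrialRecord) -> bool:
    """Return True if a trial meets the RCT criteria described in the text."""
    return (
        trial.design == "RCT"
        and trial.route in ELIGIBLE_ROUTES
        and trial.treatment_months >= 6
        and trial.has_control
        and trial.reports_efficacy_outcome
        and trial.reports_baseline_total_t
        and not trial.organic_hypogonadism_only
        and not trial.endogenous_t_suppressed
    )


# Usage with hypothetical values: a 24-week trial (about 5.5 months)
# fails the duration rule even if it meets every other criterion.
short_trial = TrialRecord("RCT", "transdermal", 24 * 7 / 30.44, True, True, True, False, False)
print(rct_is_eligible(short_trial))  # False
```

    Keeping every criterion in one boolean expression makes it easy to see that any single failure excludes a trial; documented exceptions such as TOM sit outside the rule.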

    Outcome Classification

    We classified the reported scales, domains, and questions into patient-centered constructs to aid in clinical interpretation. For sexual function, we focused on self-reported overall sexual function and erectile function. For physical function, we focused on subjective self-reported physical function as assessed by established instruments and objective physical function as measured by gait speed because of its well-established association with important health outcomes (11, 12). For quality of life, energy or vitality, and depression, we limited outcomes to those assessed using established instruments. Appendix Table 2 summarizes the instruments most commonly used to measure outcomes in the included RCTs. For cognition, we focused on measures of overall cognitive function and 6 established domains (attention, language, verbal memory, visuospatial memory, visuospatial function, and executive function) (Appendix Table 3).

    Appendix Table 2. Description of Commonly Reported Outcome Measures
    Appendix Table 2—Continued
    Appendix Table 3. Cognitive Outcomes

    Our primary harm outcomes were serious adverse events; a composite of adverse cardiovascular events, defined as cardiovascular death, nonfatal myocardial infarction, acute coronary syndrome, nonfatal stroke, revascularization, or heart failure exacerbation; prostate cancer; and mortality. We also assessed deep venous thrombosis or pulmonary embolism, worsening lower urinary tract symptoms, worsening sleep apnea, and withdrawals due to adverse events. We did not include intermediate measures (such as body composition, metabolic variables, hemoglobin levels, blood pressure, and prostate-specific antigen levels).

    Data Extraction and Quality Assessment

    Data extraction was completed by 1 reviewer and verified by a second. We extracted baseline characteristics of each study and population, including study design, age, race, comorbid conditions, and testosterone levels. Two investigators assessed risk of bias of RCTs using a modified Cochrane approach based on the following elements: sequence generation, allocation concealment, blinding, incomplete outcome data (attrition), and selective reporting (13). A high-risk study would have critical flaws in these elements or very high attrition (≥30%). Two investigators assessed risk of bias of observational studies; using a modification of the Agency for Healthcare Research and Quality approach, they reviewed studies for elements of selection, detection, attrition, and reporting bias (14).

    Data Synthesis and Analysis

    We organized tables by study design and testosterone formulation and sorted trials by risk-of-bias assessment. We pooled results from trials deemed to have low or medium risk of bias if populations and outcome measures were clinically comparable. Data were analyzed in R (R Foundation) (15, 16). Data for continuous efficacy outcomes were pooled using the Hartung–Knapp–Sidik–Jonkman method for random-effects models to calculate standardized mean differences (SMDs) with corresponding 95% CIs (17). However, this method has been shown to underestimate uncertainty compared with the fixed-effects model when the number of trials is small (particularly fewer than 5) and when no between-study variance exists (τ² = 0) (17); in such cases, we meta-analyzed results with the fixed-effects model. We interpreted SMDs on the basis of Cohen's definitions of small (0.2), medium (0.5), and large (0.8) effects (18). Scores from scales with opposite directionality were adjusted so that all scales pointed in the same direction when pooled. Data for categorical harm outcomes were pooled using the Peto odds ratio (OR) method. Harms data were stratified by duration of follow-up (less than 12 months, 12 to less than 24 months, and 24 months or more). Absolute event rates and 95% CIs for the primary harm outcomes were pooled for each study group using the Freeman–Tukey double arcsine transformation (19). For all pooled analyses, we assessed the magnitude of statistical heterogeneity with the I² statistic (I² > 75% may indicate substantial heterogeneity) (20). Results were also stratified by baseline testosterone level and formulation. In sensitivity analyses, we included trials rated as having high risk of bias or inadequate blinding.
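    The SMD and heterogeneity calculations named above can be sketched in a few lines. This is an illustrative implementation, not the authors' R code: it assumes each trial reports mean change, SD of change, and sample size per arm, and it uses Hedges' small-sample correction for the SMD (an assumption; the review does not name its SMD variant). It shows the inverse-variance fixed-effects pool with Cochran's Q and I², the fallback used when τ² = 0; the trial data are hypothetical.

```python
import math
from statistics import NormalDist


def hedges_g(m1, sd1, n1, m2, sd2, n2, flip=False):
    """Bias-corrected SMD (Hedges' g) and its variance for one trial.

    m, sd, n: mean change, SD of change, and size of the testosterone (1)
    and placebo (2) arms. flip=True reverses the sign for a scale whose
    direction is opposite to the others being pooled.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    g = j * d
    return (-g if flip else g), j**2 * var_d


def fixed_effect(effects, variances, level=0.95):
    """Inverse-variance fixed-effect pool with Cochran's Q and I^2 (%)."""
    w = [1 / v for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return est, (est - z * se, est + z * se), q, i2


# Hypothetical trials: (mean change, SD, n) for testosterone, then placebo,
# plus a flag for a scale on which lower scores mean improvement.
trials = [
    ((2.0, 4.0, 100), (0.8, 4.2, 98), False),
    ((1.5, 3.5, 60), (0.4, 3.8, 62), False),
    ((-3.0, 5.0, 150), (-1.2, 5.2, 145), True),
]
effects, variances = [], []
for (m1, s1, n1), (m2, s2, n2), lower_is_better in trials:
    g, v = hedges_g(m1, s1, n1, m2, s2, n2, flip=lower_is_better)
    effects.append(g)
    variances.append(v)
est, (lo, hi), q, i2 = fixed_effect(effects, variances)
print(f"SMD {est:.2f} (95% CI {lo:.2f} to {hi:.2f}); I² = {i2:.0f}%")
```

    The `flip` argument is one simple way to implement the direction adjustment described above: the third hypothetical scale improves downward, so its sign is reversed before pooling.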

    Assessment of evidence certainty for our primary efficacy and harm outcomes was based on methods developed by the GRADE (Grading of Recommendations Assessment, Development and Evaluation) Working Group (21, 22). Two trained research associates graded certainty of evidence for each outcome as high, moderate, low, or very low by evaluating 4 critical domains (risk of bias, consistency, directness, and precision). Discrepancies in ratings of risk of bias and certainty of evidence were resolved by discussion, and a final determination was made through a consensus that included the principal investigator.
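    The certainty ratings follow GRADE's start-and-downgrade logic. The sketch below is a deliberately simplified, hypothetical encoding: it assumes randomized evidence starts at "high" and observational evidence at "low", and that each of the 4 domains named above removes 0, 1, or 2 levels. It omits upgrading and publication bias, and in the review the ratings were reached by judgment and consensus rather than by formula.

```python
from __future__ import annotations

LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = {"risk_of_bias", "consistency", "directness", "precision"}


def grade_certainty(start: str, downgrades: dict[str, int]) -> str:
    """Simplified GRADE rating for one outcome.

    start: "high" for randomized evidence, "low" for observational evidence.
    downgrades: levels removed per domain for serious (1) or very serious (2)
    concerns, e.g. {"risk_of_bias": 1, "precision": 1}.
    """
    unknown = set(downgrades) - DOMAINS
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    if any(n not in (0, 1, 2) for n in downgrades.values()):
        raise ValueError("each domain removes 0, 1, or 2 levels")
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # certainty cannot fall below "very low"


# Hypothetical example: RCT evidence with concerns about bias and precision.
print(grade_certainty("high", {"risk_of_bias": 1, "precision": 1}))  # low
```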

    Role of the Funding Source

    This review was funded by a contract with the American College of Physicians. The American College of Physicians Clinical Guidelines Committee assisted in the development of key questions, study inclusion criteria, and outcome measures of interest but was not involved in data collection, analysis, or manuscript preparation.


    Results

    We identified 38 RCTs (in 62 articles [9, 10, 23–82]) and 20 observational studies (in 21 articles [83–103]) that met inclusion criteria (Figure 1). Baseline characteristics are reported in Supplement Table 1 (RCTs) and Supplement Table 2 (observational studies). We evaluated efficacy using results from the RCTs and safety using results from the RCTs and observational studies.

    Figure 1. Evidence search and selection.

    PAD = peripheral artery disease; RCT = randomized controlled trial.


    Study Characteristics

    Sample sizes ranged from 10 (78) to 790 (32, 66, 71, 74, 82), and 15 trials enrolled at least 100 men. Most trials had less than 12 months of follow-up (k = 24); 3 studies reported follow-up to 36 months (25, 27, 49, 70, 72, 73, 81). Sixteen trials were from the United States, 14 were from Europe, and 8 were from Australia or Asia. Twenty-four studies reported at least partial industry sponsorship.

    Trials varied in their inclusion criteria. Most required a testosterone level below a stated threshold (k = 34); 29 based enrollment on total testosterone level, 2 on free testosterone level, and 3 on bioavailable testosterone, with variability in the method used to measure testosterone. Seven trials enrolled participants on the basis of a total testosterone level of at most 10.41 nmol/L (300 ng/dL), the threshold recommended by the American Urological Association to categorize a man as hypogonadal. Trials also varied in required symptoms for inclusion: Some required the presence of specific symptoms attributed to hypogonadism, whereas others did not. Two specifically required sexual symptoms (32, 60, 66, 71, 74, 82), 4 required physical or mobility limitations (9, 10, 32, 53, 65, 66, 71, 74, 75, 82), and others either required no specific symptoms or required the presence of at least 1 symptom attributed to hypogonadism.

    Mean baseline total testosterone levels were 10.41 nmol/L (300 ng/dL) or less in 20 studies and less than 9.54 nmol/L (275 ng/dL) in 11 trials (26, 30, 32, 41–46, 48, 50, 51, 58, 60, 63, 64, 66, 71, 74, 77, 79, 80, 82). Two reported only free testosterone levels (55, 67). Baseline testosterone level was greater than 13.88 nmol/L (400 ng/dL) in 5 studies (23, 24, 34, 40, 57, 76), 3 of which were restricted to men with specific underlying medical conditions and 1 of which was rated as having high risk of bias. Only 13 trials required 2 fasting morning testosterone levels, and of these only 2 (29, 30, 32, 66, 71, 74, 82) required 2 morning specimens with a testosterone level of 10.41 nmol/L (300 ng/dL) or less.
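    Each paired value in this section follows from a single conversion factor for total testosterone. The sketch below assumes 1 ng/dL ≈ 0.0347 nmol/L (from a molar mass of about 288.4 g/mol); the factor is not stated in the review, but it reproduces every pair the review reports: 275 ng/dL = 9.54 nmol/L, 300 ng/dL = 10.41 nmol/L, 320 ng/dL = 11.10 nmol/L, and 400 ng/dL = 13.88 nmol/L.

```python
# Assumed conversion factor for total testosterone: 1 ng/dL ~ 0.0347 nmol/L
# (molar mass ~288.4 g/mol, rounded to 4 decimals).
NMOL_PER_L_PER_NG_PER_DL = 0.0347


def ng_dl_to_nmol_l(ng_dl: float) -> float:
    """Convert total testosterone from ng/dL to nmol/L (2 decimals)."""
    return round(ng_dl * NMOL_PER_L_PER_NG_PER_DL, 2)


def nmol_l_to_ng_dl(nmol_l: float) -> int:
    """Convert total testosterone from nmol/L to ng/dL (nearest unit)."""
    return round(nmol_l / NMOL_PER_L_PER_NG_PER_DL)


for ng_dl in (275, 300, 320, 400):
    print(f"{ng_dl} ng/dL = {ng_dl_to_nmol_l(ng_dl):.2f} nmol/L")
```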

    The mean age across trials was 66 years. Eight studies restricted age to 65 years or older, and none were limited to (or predominantly enrolled) men younger than 50 years. Only 5 trials had an age threshold as low as 18 years. Race/ethnicity, comorbid conditions, and functional measures were infrequently reported (Supplement Table 1).

    Doses and formulations of testosterone varied: 19 trials used a transdermal formulation, whereas 19 used intramuscular injections. Nine adjusted dosing to achieve a targeted testosterone level (9, 10, 27, 32, 33, 35, 36, 42–47, 49, 66, 70–75, 82), and 29 used a fixed dose.

    Risk of bias was high in 11 trials, medium in 19, and low in 8 (Supplement Table 1). Many trials categorized as having high risk of bias had very high attrition rates (range, 30% to 56%). Two were open-label trials with no control group (55, 67), and another used a placebo gel while the active intervention was administered intramuscularly (26).

    Ten RCTs were limited to men with specific underlying conditions: heart failure (56, 62); chronic stable angina (59); chronic kidney disease requiring dialysis (31); cirrhosis (69); chronic obstructive pulmonary disease (76); respiratory, immune, or inflammatory disease requiring long-term glucocorticoid therapy (34); Alzheimer disease (78); mild cognitive impairment (33); and low circulating levels of insulin-like growth factor (40).

    Efficacy Outcomes

    We report overall findings for each outcome, excluding the 10 studies in special populations as well as those trials judged to have high risk of bias or inadequate blinding. We describe the results for analyses that pool data from trials of both intramuscular and transdermal preparations of testosterone. Supplement Table 3 shows results by formulation and for trials with a mean baseline testosterone level less than 10.41 nmol/L (300 ng/dL).

    The Table summarizes certainty of evidence for the primary efficacy outcomes; Supplement Table 4 shows these assessments by testosterone formulation.

    Table. Certainty of Evidence: Testosterone Therapy in Men

    The effect of testosterone did not differ significantly by testosterone formulation (intramuscular vs. transdermal), although some outcomes had few or no studies available for these indirect comparisons. When analyses were restricted to trials with a mean baseline testosterone level less than 10.41 nmol/L (300 ng/dL), results were similar to those of the primary analyses. Data for efficacy were limited in men with higher baseline testosterone levels, and only 5 trials in the primary analyses had mean baseline levels of at least 10.41 nmol/L (300 ng/dL); of these, 3 had mean baseline levels less than 11.10 nmol/L (320 ng/dL). Results did not differ in analyses that included trials judged to have high risk of bias or inadequate blinding (Supplement Table 3).

    Sexual Function.

    Supplement Table 5 summarizes sexual function outcomes. Studies varied in required level of total or free testosterone and baseline symptoms: Some required low libido or other sexual symptoms, some required a range of other symptoms attributable to hypogonadism, and some did not require any symptoms. In studies that reported baseline sexual functioning, participants on average did report clinically significant sexual dysfunction at baseline, as assessed by various measures (Supplement Table 1). The most commonly reported sexual function outcome was score on the sexual function subscale of the Aging Males' Symptoms (AMS) scale. For erectile function, the 5-item International Index of Erectile Function questionnaire and its erectile function domain were most commonly used.

    Overall or Global Sexual Function. We pooled 7 RCTs (n = 1140) that evaluated the effect of testosterone treatment on global measures of sexual function (Figure 2 [top] and Supplement Table 5). Compared with placebo, testosterone treatment was associated with an overall small improvement in global sexual function (SMD, 0.35 [95% CI, 0.23 to 0.46]; I² = 0%; moderate-certainty evidence). In the Testosterone Trials, the primary sexual function outcome (sexual activity, as assessed by question 4 of the Psychosexual Daily Questionnaire) increased more with testosterone treatment than with placebo (effect size, 0.45 [CI, 0.30 to 0.60]) (71). Sexual desire, as measured by the sexual desire score of the Derogatis Interview for Sexual Functioning in Men–II, also increased more in men treated with testosterone (effect size, 0.44 [CI, 0.32 to 0.56]). The effect sizes for these outcomes in the Testosterone Trials were consistent with a clinically meaningful improvement in sexual desire and activity for men treated with testosterone (82).

    Figure 2. Primary efficacy outcomes for testosterone treatment vs. placebo (SMDs in mean change from baseline).

    IM = intramuscular; SMD = standardized mean difference; T = testosterone level; TD = transdermal.

    Erectile Function. Testosterone treatment improved erectile function compared with placebo, although the pooled effect was small (7 trials; n = 1299; SMD, 0.27 [CI, 0.09 to 0.44]; I² = 13%; low-certainty evidence) (Table and Supplement Figure 1).
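    The pooled SMDs in this section come from the Hartung–Knapp–Sidik–Jonkman (HKSJ) random-effects procedure described in the Methods. The sketch below shows the idea using only the standard library; it assumes a DerSimonian–Laird estimate of τ² (the review does not name its τ² estimator), computes the t quantile numerically, and runs on hypothetical inputs. Its second usage line reproduces the caveat noted in the Methods: when trial estimates agree exactly, τ² = 0 and the HKSJ interval collapses to zero width.

```python
import math


def t_cdf(x, df, steps=2000):
    """Student t CDF for x >= 0, by Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile (p > 0.5) of the t distribution, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hksj_random_effects(effects, variances, level=0.95):
    """Random-effects pool with a Hartung-Knapp-Sidik-Jonkman CI.

    tau^2 uses the DerSimonian-Laird moment estimator (an assumption).
    Returns (pooled estimate, (lower, upper), tau^2).
    """
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    # HKSJ: rescale the variance by the weighted residual spread, use t(k - 1).
    scale = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, effects)) / (k - 1)
    se = math.sqrt(scale / sum(w_re))
    t = t_quantile(0.5 + level / 2, k - 1)
    return mu, (mu - t * se, mu + t * se), tau2


# Hypothetical SMDs and within-trial variances from 4 trials.
mu, (lo, hi), tau2 = hksj_random_effects([0.10, 0.50, 0.30, 0.45], [0.02, 0.03, 0.01, 0.05])
print(f"SMD {mu:.2f} (95% CI {lo:.2f} to {hi:.2f}); tau^2 = {tau2:.4f}")

# Identical estimates give tau^2 = 0 and a zero-width HKSJ interval.
print(hksj_random_effects([0.3, 0.3, 0.3], [0.01, 0.02, 0.04]))
```

    The degenerate second case is why the authors switched to the fixed-effects model when few trials were available and τ² = 0.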

    Physical Function.

    Study populations included men with mobility limitations (9, 10, 71) or frailty (75), but most trials did not specify physical function limitations as entry criteria. Self-reported measures of physical function included the Short Form-36 Health Survey physical function subscale and the Physical Activity Scale for the Elderly. Gait speed was most commonly measured by the 6-minute walk test.

    Subjective or Self-Reported Physical Function. Five RCTs (n = 1029) provided self-reported measures of physical function (Figure 2 [middle] and Supplement Table 6) that could be pooled. Testosterone treatment did not improve subjective physical function compared with placebo (SMD, 0.15 [CI, −0.19 to 0.50]; I² = 61%; low-certainty evidence) (Table).

    Objective Physical Function. Seven trials (n = 1063) evaluated gait speed (Supplement Table 7). Testosterone treatment was associated with a less-than-small improvement in objective physical function as measured by gait speed (SMD, 0.14 [CI, 0.02 to 0.27]; I² = 0%; low-certainty evidence) (Table and Supplement Figure 2).

    The Physical Function Trial of the Testosterone Trials was limited to men who reported difficulty walking or climbing stairs and had a baseline gait speed less than 1.2 m/s on the 6-minute walk test (71). The primary outcome was the proportion of men who increased the distance walked in 6 minutes by at least 50 m. At 12 months, 20% of men assigned to testosterone achieved this threshold, versus 12% of men assigned to placebo (adjusted OR, 1.42 [CI, 0.83 to 2.45]). In the overall Testosterone Trials, including all men regardless of baseline gait speed, 21% and 13% of men in the testosterone and placebo groups, respectively, had increased the distance they walked in 6 minutes by at least 50 m at 12 months (adjusted OR, 1.76 [CI, 1.21 to 2.57]).
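    The effect-size wording used in these sections ("small", "less-than-small") follows Cohen's thresholds given in the Methods. The sketch below applies them to the pooled SMDs reported in this review. It labels magnitude only and ignores the confidence interval, so self-reported physical function lands in the less-than-small band even though its CI crosses zero.

```python
def cohen_label(smd: float) -> str:
    """Magnitude label for an SMD using Cohen's 0.2 / 0.5 / 0.8 thresholds."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "less than small"


# Pooled SMDs as reported in this review; negative values come from scales
# on which lower scores mean improvement (AMS, depression instruments).
reported = {
    "global sexual function": 0.35,
    "erectile function": 0.27,
    "self-reported physical function": 0.15,
    "gait speed": 0.14,
    "quality of life (AMS)": -0.33,
    "vitality or fatigue": 0.17,
    "depressive symptoms": -0.19,
}
for outcome, smd in reported.items():
    print(f"{outcome}: {smd:+.2f} ({cohen_label(smd)})")
```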

    Quality of Life.

    We pooled 7 RCTs (n = 1043) that reported quality of life as an outcome using the AMS scale (Supplement Tables 8 and 9). The weighted mean total score at baseline on the AMS scale for the 6 trials that reported mean baseline AMS score was 43 points (scale, 17 to 85 points), indicating moderate severity of symptoms. Testosterone treatment was associated with a small improvement in quality of life, as measured by the AMS scale (SMD, −0.33 [CI, −0.50 to −0.16]; I² = 6%; low-certainty evidence) (Table and Figure 2 [bottom]).

    The weighted mean decrease in AMS score from baseline was 7.0 points in the testosterone group, compared with 3.6 points in the placebo group (weighted mean difference, −3.3 [CI, −5.2 to −1.3]; I² = 32%). Thus, men allocated to testosterone on average moved from moderate to mild symptom severity, whereas the symptoms of men allocated to placebo on average remained moderately severe.
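    The severity interpretation in this paragraph can be checked with simple arithmetic. The sketch below assumes the commonly published AMS total-score bands (17 to 26 none or little, 27 to 36 mild, 37 to 49 moderate, 50 or more severe), which the review does not state, and applies each group's mean decrease to the pooled baseline mean of 43 points. That is a simplification, since individual trials had different baselines.

```python
def ams_severity(total: float) -> str:
    """Severity band for an Aging Males' Symptoms total score (17-85).

    The cutoffs are assumed from commonly published AMS bands; they are
    not stated in this review.
    """
    if not 17 <= total <= 85:
        raise ValueError("AMS total score must lie between 17 and 85")
    if total <= 26:
        return "none or little"
    if total <= 36:
        return "mild"
    if total <= 49:
        return "moderate"
    return "severe"


baseline = 43.0  # weighted mean baseline AMS score reported in the review
for arm, decrease in (("testosterone", 7.0), ("placebo", 3.6)):
    follow_up = baseline - decrease
    print(f"{arm}: {baseline:.1f} -> {follow_up:.1f} ({ams_severity(follow_up)})")
```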

    Vitality or Fatigue.

    Three RCTs (n = 665) reported vitality and fatigue outcomes that could be pooled (Supplement Table 10). Testosterone was associated with a less-than-small improvement in self-reported fatigue or vitality (SMD, 0.17 [CI, 0.01 to 0.32]; I² = 0%) (Supplement Figure 3).


    Depression.

    We pooled 5 RCTs (n = 872) that evaluated the effect of testosterone on measures of depressive symptoms (Supplement Table 11). None required depression or depressive symptoms for enrollment, and in general, most men enrolled did not have significant depressive symptoms at baseline (29, 30, 41–44, 51, 71, 77, 81). Trials measured depressive symptoms with various instruments (Beck Depression Inventory, Geriatric Depression Scale, Hospital Anxiety and Depression Scale, and Patient Health Questionnaire-9). Testosterone treatment was associated with a less-than-small improvement in depressive symptoms (SMD, −0.19 [CI, −0.32 to −0.05]; I² = 0%) (Supplement Figure 4).


    Fractures.

    The RCTs rarely reported fractures (Supplement Table 7). Overall, 6 fractures were reported in testosterone groups and 8 in placebo groups during treatment periods ranging from 6 to 24 months (31, 48, 61, 65, 71, 74, 75, 79, 80). During follow-up periods of 6 to 12 months, an additional 3 fractures were reported in the testosterone groups and 5 in the placebo groups.

    Cognitive Function.

    Nine studies reported cognitive outcomes (Supplement Tables 12 and 13). Because of variability in patients, scale scores, and domains assessed, we did not pool results. Follow-up periods ranged from 6 to 36 months.

    One study enrolled persons with Alzheimer disease (78), and another enrolled those with mild cognitive impairment (33). The former was a pilot study that randomly assigned 10 participants to testosterone or placebo; the authors described the findings as preliminary but encouraging, noting the small sample size (78). The study of mild cognitive impairment reported 10 measures of verbal memory, visuospatial memory, visuospatial function, and language; many of the measures had multiple outcomes (a total of 18 outcomes). The pattern of change in these outcomes over the course of the study differed by treatment group for only 1 outcome, and the authors noted that the study may have been underpowered (33).

    Of the 7 remaining studies, 5 included fewer than 45 completers and were underpowered for cognitive outcomes (30, 52, 68, 77, 81). A study enrolling 280 cognitively normal men reported that long-term treatment with testosterone did not improve cognitive function (verbal memory, visuospatial memory, language, and executive function domains) (49). The Cognitive Function Trial (66), part of the Testosterone Trials (71), enrolled 493 men with age-associated memory impairment and reported no association between testosterone treatment and improved cognitive outcomes.

    Harm Outcomes

    The Table summarizes certainty of evidence for the primary harm outcomes. Supplement Table 14 provides results by formulation limited to trials with a mean baseline testosterone level less than 10.41 nmol/L (300 ng/dL). Supplement Table 4 shows certainty of evidence for the primary harm outcomes by testosterone formulation. Results did not vary significantly by follow-up duration (<12 months, 12 to <24 months, and ≥24 months) (Supplement Table 15).

    Adverse Cardiovascular Events.

    Fourteen trials (n = 2415) reported cardiovascular events as adverse events (Supplement Table 16). Most of the trials excluded men with advanced heart failure or a recent history of myocardial infarction or stroke, and none were designed to adequately assess cardiovascular risk of testosterone therapy. Cardiovascular event definitions varied between studies and were often not prespecified; event adjudication was rarely described or done. Pooled risk for adverse cardiovascular outcomes did not differ between groups (Peto OR, 1.22 [CI, 0.66 to 2.23]; I² = 18%; low-certainty evidence) (Table and Figure 3 [top]). Incidence of cardiovascular events was 2.3% (CI, 0.9% to 4.1%) in the testosterone group, compared with 1.5% (CI, 0.8% to 2.5%) in the placebo group (Supplement Figure 5).

    Figure 3. Primary harm outcomes from RCTs for testosterone treatment vs. placebo.

    IM = intramuscular; OR = odds ratio; RCT = randomized controlled trial; T = testosterone level; TD = transdermal.

    The TOM trial (9, 10) was stopped early because of excess cardiovascular adverse events in the testosterone group. When events were limited to our cardiovascular events of interest, 7 (7%) occurred in the testosterone group versus 1 (1%) in the placebo group. This trial was not included in our pooled analyses because of high risk of bias based on attrition rate.
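    The Peto method used for these harm outcomes pools observed-minus-expected event counts across trials and is commonly chosen for rare events. A minimal sketch follows; the 2×2 counts are hypothetical, not data from the included trials. Trials with no events in either arm carry no information about the odds ratio and drop out of the calculation.

```python
import math
from statistics import NormalDist


def peto_or(tables, level=0.95):
    """Pooled Peto odds ratio and CI.

    tables: iterable of (events_t, n_t, events_c, n_c), testosterone arm first.
    """
    o_minus_e = 0.0
    v_total = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        m = a + c  # total events in this trial
        if m == 0 or m == n:
            continue  # no information about the odds ratio (variance 0)
        o_minus_e += a - n1 * m / n  # observed minus expected in the T arm
        v_total += n1 * n2 * m * (n - m) / (n**2 * (n - 1))
    if v_total == 0:
        raise ValueError("no trial contributes information")
    log_or = o_minus_e / v_total
    se = 1 / math.sqrt(v_total)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), (math.exp(log_or - z * se), math.exp(log_or + z * se))


# Hypothetical trials: (events, n) testosterone then (events, n) placebo.
tables = [(3, 150, 2, 148), (1, 60, 0, 62), (5, 400, 4, 395), (0, 80, 0, 79)]
or_, (lo, hi) = peto_or(tables)
print(f"Peto OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```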

    Serious Adverse Events and Withdrawals Due to Adverse Events.

    Eight trials (n = 2268) reported serious adverse events (Supplement Table 17). Ascertainment, definition, and adjudication of these events were highly variable. Incidence was similar between groups: 13.2% (CI, 3.6% to 27.4%) of men assigned to testosterone and 12.8% (CI, 4.7% to 23.9%) of those assigned to placebo had a serious adverse event (Supplement Figure 6). The Peto OR was 0.94 (CI, 0.73 to 1.21) (I² = 0%; moderate-certainty evidence) (Figure 3 [bottom] and Table).

    Similarly, withdrawals due to adverse events did not differ between men assigned to testosterone treatment and those assigned to placebo (5.1% vs. 5.3%; Peto OR, 0.92 [CI, 0.65 to 1.28]; I² = 0%) (Supplement Figure 7).
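    Group-level incidences such as those in this subsection were pooled on the Freeman–Tukey double arcsine scale. The sketch below is one way to do that, under assumptions the review does not specify: inverse-variance fixed-effect weights (a random-effects weight could be substituted), the variance approximation 1/(n + 0.5), and Miller's back-transformation using the harmonic mean of the group sizes, with values outside the transformable range clamped to 0 or 1. Inputs are hypothetical.

```python
import math
from statistics import NormalDist, harmonic_mean


def ft_transform(events, n):
    """Freeman-Tukey double arcsine of events/n and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1 / (n + 0.5)


def ft_back(t, n):
    """Miller's inverse of the double arcsine for a (harmonic-mean) size n."""
    if t <= ft_transform(0, n)[0]:
        return 0.0  # at or below the value corresponding to 0 events
    if t >= ft_transform(n, n)[0]:
        return 1.0  # at or above the value corresponding to n events
    s = math.sin(t)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    root = math.sqrt(max(inner, 0.0))
    return 0.5 * (1 - math.copysign(root, math.cos(t)))


def pooled_proportion(groups, level=0.95):
    """Fixed-effect pooled proportion and CI on the Freeman-Tukey scale.

    groups: list of (events, n) pairs for one treatment arm across trials.
    Returns (estimate, lower, upper) as proportions.
    """
    ts, ws = [], []
    for events, n in groups:
        t, var = ft_transform(events, n)
        ts.append(t)
        ws.append(1 / var)
    est = sum(w * t for w, t in zip(ws, ts)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_h = harmonic_mean([n for _, n in groups])
    return tuple(ft_back(x, n_h) for x in (est, est - z * se, est + z * se))


# Hypothetical arm-level counts (events, n) from 3 trials.
p, lo, hi = pooled_proportion([(12, 150), (20, 210), (9, 95)])
print(f"pooled incidence {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

    Working on the double arcsine scale stabilizes the variance and keeps groups with zero events in the pool, which matters for sparse harm outcomes.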

    Venous Thromboembolism.

    Few venous thromboembolism events were reported in any trial (0.6% in testosterone groups and 0.5% in placebo groups) (Supplement Table 16).

    Prostate Cancer and Lower Urinary Tract Symptoms.

    Trials typically excluded men with a history of prostate cancer or a prostate-specific antigen value above a predetermined level (most commonly >4.0 µg/L). Ten studies (n = 2143) reported cases of prostate cancer (Supplement Table 16). Prostate cancer ascertainment varied, and no studies were powered to detect a difference in rate between treatment groups. Prostate cancer incidence was less than 1% during the trials in all men regardless of treatment group (Peto OR, 0.97 [CI, 0.35 to 2.69]; I² = 1%; very-low-certainty evidence) (Table and Supplement Figures 8 and 9).

    Three trials reported the number of men who developed clinically significant lower urinary tract symptoms, defined by study authors as a score above 19 (71) or above 21 (27, 49) on the International Prostate Symptom Score or as a score of at least 20 on the American Urological Association symptom scale (47) (Supplement Table 16). Clinically significant lower urinary tract symptoms were reported for 42 of 645 men (6.5%) assigned to testosterone and 30 of 592 men (5.1%) assigned to placebo (Peto OR, 1.36 [CI, 0.35 to 5.30]; I² = 78%) (Supplement Figure 10).


    Mortality.

    Twelve studies (n = 2727) reported mortality (Supplement Table 16). Duration of follow-up varied from 24 weeks to 3 years. None were powered to detect a difference in mortality between treatment groups, and studies typically excluded persons at the highest risk for death. Incidence of death was 0.4% (CI, 0.07% to 0.99%) in men treated with testosterone, compared with 1.5% (CI, 0.48% to 2.89%) in the placebo group (Peto OR, 0.47 [CI, 0.25 to 0.89]; I² = 0%; low-certainty evidence) (Table and Supplement Figures 11 and 12).

    Special Populations

    Two trials were limited to men with obesity; these trials were included in the overall analyses. One (63, 64) reported similar effects on quality of life and sexual function to those in our pooled analyses. An older, smaller trial (57) reported improvement in well-being and energy using nonstandard questionnaires.

    Five trials included only men with diabetes mellitus or metabolic syndrome (26, 37–39, 41–46, 50, 51); 2 were rated as having high risk of bias and were not included in the overall pooled analyses. For outcomes reported in these studies of diabetes or metabolic syndrome, results were similar to the overall pooled effects.

    No trials of testosterone replacement in opioid users met our inclusion criteria.

    The 10 trials done in populations with specific disease conditions were not included in our pooled analyses (31, 33, 34, 40, 56, 59, 62, 69, 76, 78). Three trials were judged to be at high risk of bias (31, 56, 69). Few reported any significant improvements in patient-centered outcomes (Supplement Tables 5 to 13, 16, and 17).

    Observational Studies

    Twenty observational studies (15 retrospective cohorts, 1 prospective cohort, 2 case–control studies, and 2 prospective registry studies) reported harms as outcomes of interest. Two had low, 14 medium, and 4 high risk of bias. Most reported cardiovascular end points or mortality; few reported thromboembolic disease or prostate cancer. Studies were heterogeneous in their inclusion criteria and patient populations, including baseline levels of testosterone and comorbid conditions. Duration and type of testosterone treatment, duration of follow-up, and adjustment for potential confounders also varied (Supplement Table 2). Mean or median follow-up ranged from 0.73 years (93) to 10.3 years (90), although 1 study assessed outcomes only during postoperative hospitalization (83). Five studies reported at least partial industry sponsorship.

    Most studies did not identify an increased risk for death or cardiovascular events associated with testosterone, and some reported a decreased risk (92, 97). One retrospective cohort study of veterans who had undergone coronary angiography and had a testosterone level below 10.41 nmol/L (300 ng/dL) found that those treated with testosterone had an increased risk for a combined end point of all-cause mortality, myocardial infarction, and ischemic stroke compared with those not treated, after adjustment for potential confounders (hazard ratio, 1.29 [CI, 1.05 to 1.58]) (101).

    No study reported an increased risk for prostate cancer. A single retrospective cohort study (87) reported an increased risk for obstructive sleep apnea in men treated with testosterone. In the few observational studies that reported venous thromboembolic outcomes (83, 87, 88, 95, 98), testosterone was not associated with an increased risk for pulmonary embolism or deep venous thrombosis.

    Among the 8 observational studies that reported results by formulation, results were mixed, with no consistent evidence of greater risk for harm with one formulation than with another.

    Discussion

    Our analysis of data from 38 RCTs found that intramuscular or transdermal testosterone therapy resulted in small improvements in sexual functioning and quality of life but had little to no effect on physical functioning, depressive symptoms, energy and vitality, or cognition. Quality of life, when reported as an outcome, was typically measured using the AMS scale; the observed effect may have been driven primarily by effects on its sexual subscale. These findings are limited to men without underlying medical conditions recognized to cause hypogonadism, such as testicular failure or hypothalamic–pituitary injury. Evidence for most outcomes was of low or moderate certainty. None of the trials had adequate power to assess risk for adverse cardiovascular events, prostate cancer, thromboembolic disease, or death, which are the harms of greatest clinical interest. The observational studies were limited by likely unmeasured confounding by indication or contraindication and must be interpreted with caution. Few trials were longer than 1 year, limiting conclusions about the benefits or harms of longer-term treatment. Participants were typically aged 60 years or older and white, and had no recent cardiovascular events, history of prostate cancer, or elevated prostate-specific antigen levels. This limits the generalizability of our findings.

    We included RCTs of at least 6 months' duration and prioritized patient-centered outcomes over intermediate measures, such as body composition or metabolic variables. Trial entry criteria were importantly heterogeneous, including testosterone levels, presence of symptoms attributable to hypogonadism, comorbid conditions, and extent of evaluation for underlying causes of low testosterone levels. However, we believe the resulting mix of patients reflects clinical practice, especially among primary care physicians, who provide most evaluation and treatment of these patients. The largest randomized placebo-controlled trial of testosterone therapy in older men, the Testosterone Trials, enrolled men on the basis of 2 fasting morning total testosterone levels averaging less than 9.54 nmol/L (275 ng/dL), no underlying conditions known to cause hypogonadism, and the presence of symptoms associated with hypogonadism; the results of the coordinated trials that make up the Testosterone Trials are generally consistent with ours.
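    Testosterone thresholds in this report are given in both nmol/L and ng/dL (for example, 10.41 nmol/L [300 ng/dL] and 9.54 nmol/L [275 ng/dL]), and varying definitions of low testosterone are easier to compare in a single unit. The sketch below converts between the two units, assuming a factor of 0.0347 nmol/L per ng/dL. This factor follows from the molar mass of testosterone (about 288.4 g/mol; 10 dL per L divided by 288.4 g/mol is about 0.0347) and is consistent with the paired values quoted here, but individual laboratories may round differently. The function and constant names are hypothetical.

```python
# Assumed conversion factor: 1 ng/dL of total testosterone is about 0.0347 nmol/L
# (10 dL/L divided by a molar mass of about 288.4 g/mol). Hypothetical constant name.
NG_DL_TO_NMOL_L = 0.0347


def ng_dl_to_nmol_l(ng_dl: float) -> float:
    """Convert a total testosterone concentration from ng/dL to nmol/L."""
    return ng_dl * NG_DL_TO_NMOL_L


def nmol_l_to_ng_dl(nmol_l: float) -> float:
    """Convert a total testosterone concentration from nmol/L to ng/dL."""
    return nmol_l / NG_DL_TO_NMOL_L


# Thresholds quoted in this report, expressed in ng/dL.
for threshold in (300, 275):
    print(f"{threshold} ng/dL is about {ng_dl_to_nmol_l(threshold):.2f} nmol/L")
```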

    We did not find differences in reported outcomes by testosterone formulation (intramuscular vs. transdermal). However, patient characteristics, testosterone levels, and outcome reporting varied in studies of intramuscular versus transdermal formulations. No clinical trial that met our other entry criteria directly compared an intramuscular versus a transdermal formulation of testosterone. Furthermore, we did not systematically search for or include studies that directly compared different formulations without a control group.

    The FDA recently approved an oral formulation of testosterone undecanoate for use in the United States (104). Our review did not evaluate the evidence for efficacy or harms of oral testosterone. The FDA specifically stated that the oral formulation is contraindicated in “[m]en with hypogonadal conditions, such as ‘age-related hypogonadism,’ that are not associated with structural or genetic etiologies,” citing demonstrated increases in blood pressure and lack of established efficacy (105).

    We limited our review of harm outcomes to serious adverse events, major adverse cardiovascular events, withdrawals due to adverse events, prostate cancer, worsening lower urinary tract symptoms, venous thromboembolism, worsening or development of sleep apnea, and mortality. Other potential risks of testosterone therapy are recognized but were outside the scope of this review. These include, but are not limited to, polycythemia, elevated prostate-specific antigen levels, increased blood pressure, gynecomastia, skin reactions to transdermal products, testicular atrophy, infertility or azoospermia, and fluid retention, as well as secondary exposure of others to transdermal preparations, which poses a risk for virilization in women or children (106). Because of concern about inadequate data on the harms of testosterone treatment in older men with age-related hypogonadism, the FDA has required manufacturers of these products to conduct a controlled clinical trial to evaluate the effects of testosterone therapy on cardiovascular outcomes (1). This trial, TRAVERSE (Testosterone Replacement Therapy for Assessment of Long-term Vascular Events and Efficacy ResponSE in Hypogonadal Men), began enrollment in May 2018 and will follow participants for up to 5 years for cardiovascular and prostate safety, as well as efficacy outcomes (107).

    Because deaths were few and RCT entry criteria excluded persons at highest risk for death, we cannot draw definitive conclusions about testosterone's effect on mortality. However, our findings do not suggest an increased risk for death with testosterone treatment.

    Our findings are generally consistent with those of other systematic reviews (108–125), with occasional exceptions (for example, the review by Huo and colleagues [126]), despite differences in inclusion and exclusion criteria, outcomes assessed, and methods. Most found low- to moderate-certainty evidence of small beneficial effects on sexual function, little to no evidence of benefit for other clinical efficacy outcomes, and inadequate evidence to draw definitive conclusions about cardiovascular and other long-term harms.

    In conclusion, intramuscular or transdermal testosterone treatment in men with low testosterone levels not associated with specific, well-established medical conditions known to cause primary or secondary hypogonadism may result in small improvements in sexual function and self-reported quality of life but little to no benefit for other common symptoms of aging, including fatigue or decreased energy, reduced physical function, and reduced cognition. Evidence is inadequate about the long-term benefits or serious harms of testosterone treatment.
