Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration
Abstract
Diagnostic and Prognostic Prediction Models
Development, Validation, and Updating of Prediction Models
Incomplete and Inaccurate Reporting
The TRIPOD Statement
Development of TRIPOD
The TRIPOD Statement: Explanation and Elaboration
Aim and Outline of This Document
Use of Examples
Use of TRIPOD
The TRIPOD Checklist
Title and Abstract
Title
Item 1. Identify the study as developing and/or validating a multivariable prediction model, the target population, and the outcome to be predicted. [D;V]
Development and validation of a clinical score to estimate the probability of coronary artery disease in men and women presenting with suspected coronary disease (115). [Diagnosis; Development; Validation]
Development and external validation of prognostic model for 2-year survival of non-small cell lung cancer patients treated with chemoradiotherapy (116). [Prognosis; Development; Validation]
Predicting the 10 year risk of cardiovascular disease in the United Kingdom: independent and external validation of an updated version of QRISK2 (117). [Prognosis; Validation]
Development of a prediction model for 10-year risk of hepatocellular carcinoma in middle-aged Japanese: the Japan Public Health Center-based Prospective Study Cohort II (118). [Prognosis; Development]
Development and validation of a logistic regression-derived algorithm for estimating the incremental probability of coronary artery disease before and after exercise testing (119). [Diagnosis; Development; Validation]
Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation (120). [Prognosis; Validation]
External validation of the SAPS II, APACHE II and APACHE III prognostic models in South England: a multicentre study (121). [Prognosis; Validation]
Abstract
Item 2. Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions. [D;V]
OBJECTIVE: To develop and validate a prognostic model for early death in patients with traumatic bleeding.
DESIGN: Multivariable logistic regression of a large international cohort of trauma patients.
SETTING: 274 hospitals in 40 high, medium, and low income countries.
PARTICIPANTS: Prognostic model development: 20,127 trauma patients with, or at risk of, significant bleeding, within 8 hours of injury in the Clinical Randomisation of an Antifibrinolytic in Significant Haemorrhage (CRASH-2) trial. External validation: 14,220 selected trauma patients from the Trauma Audit and Research Network (TARN), which included mainly patients from the UK.
OUTCOMES: In-hospital death within 4 weeks of injury.
RESULTS: 3076 (15%) patients died in the CRASH-2 trial and 1765 (12%) in the TARN dataset. Glasgow coma score, age, and systolic blood pressure were the strongest predictors of mortality. Other predictors included in the final model were geographical region (low, middle, or high income country), heart rate, time since injury, and type of injury. Discrimination and calibration were satisfactory, with C statistics above 0.80 in both CRASH-2 and TARN. A simple chart was constructed to readily provide the probability of death at the point of care, and a web-based calculator is available for a more detailed risk assessment (http://crash2.lshtm.ac.uk).
CONCLUSIONS: This prognostic model can be used to obtain valid predictions of mortality in patients with traumatic bleeding, assisting in triage and potentially shortening the time to diagnostic and lifesaving procedures (such as imaging, surgery, and tranexamic acid). Age is an important prognostic factor, and this is of particular relevance in high income countries with an aging trauma population (123). [Prognosis; Development]
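The C statistic reported in this abstract summarizes discrimination. For a binary outcome it is the proportion of all (death, survivor) pairs in which the patient who died received the higher predicted risk, with ties counted as one half. A minimal sketch in Python, using made-up risks and outcomes rather than CRASH-2 or TARN data:

```python
from itertools import product

def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    risks: predicted probabilities; outcomes: 1 = event (e.g. death), 0 = no event.
    Returns the share of event/non-event pairs ranked correctly (ties count 0.5).
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0      # concordant pair
        elif e == n:
            score += 0.5      # tied predictions
    return score / (len(events) * len(non_events))

# Hypothetical predictions for six patients (not study data).
risks = [0.90, 0.70, 0.40, 0.30, 0.20, 0.10]
outcomes = [1, 0, 1, 0, 0, 0]
print(round(c_statistic(risks, outcomes), 3))  # 0.875
```

Enumerating all pairs is quadratic in sample size; for cohorts the size of CRASH-2, a rank-based formula (equivalent to the Mann-Whitney statistic) would be the practical choice.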
OBJECTIVE: To validate and refine previously derived clinical decision rules that aid the efficient use of radiography in acute ankle injuries.
DESIGN: Survey prospectively administered in two stages: validation and refinement of the original rules (first stage) and validation of the refined rules (second stage).
SETTING: Emergency departments of two university hospitals.
PATIENTS: Convenience sample of adults with acute ankle injuries: 1032 of 1130 eligible patients in the first stage and 453 of 530 eligible patients in the second stage.
MAIN OUTCOME MEASURES: Attending emergency physicians assessed each patient for standardized clinical variables and classified the need for radiography according to the original (first stage) and the refined (second stage) decision rules. The decision rules were assessed for their ability to correctly identify the criterion standard of fractures on ankle and foot radiographic series. The original decision rules were refined by univariate and recursive partitioning analyses.
MAIN RESULTS: In the first stage, the original decision rules were found to have sensitivities of 1.0 (95% confidence interval [CI], 0.97 to 1.0) for detecting 121 malleolar zone fractures, and 0.98 (95% CI, 0.88 to 1.0) for detecting 49 midfoot zone fractures. For interpretation of the rules in 116 patients, kappa values were 0.56 for the ankle series rule and 0.69 for the foot series rule. Recursive partitioning of 20 predictor variables yielded refined decision rules for ankle and foot radiographic series. In the second stage, the refined rules proved to have sensitivities of 1.0 (95% CI, 0.93 to 1.0) for 50 malleolar zone fractures, and 1.0 (95% CI, 0.83 to 1.0) for 19 midfoot zone fractures. The potential reduction in radiography is estimated to be 34% for the ankle series and 30% for the foot series. The probability of fracture, if the corresponding decision rule were “negative,” is estimated to be 0% (95% CI, 0% to 0.8%) in the ankle series, and 0% (95% CI, 0% to 0.4%) in the foot series.
CONCLUSION: Refinement and validation have shown the Ottawa ankle rules to be 100% sensitive for fractures, to be reliable, and to have the potential to allow physicians to safely reduce the number of radiographs ordered in patients with ankle injuries by one third. Field trials will assess the feasibility of implementing these rules into clinical practice (124). [Diagnosis; Validation; Updating]
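Sensitivity here is the proportion of fractures that the rule classified as positive. The abstract does not say how its 95% confidence intervals were computed; a common choice for small counts is the exact (Clopper-Pearson) interval, sketched below with only the Python standard library by bisecting on binomial tail probabilities. For 121 of 121 malleolar fractures detected it gives a lower limit of about 0.970, in line with the reported 0.97, and 50 of 50 gives about 0.93.

```python
from math import comb

def binom_tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def binom_tail_le(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, x + 1))

def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    # Lower bound: p at which P(X >= x) equals alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_tail_ge(x, n, p) - alpha / 2)
    # Upper bound: p at which P(X <= x) equals alpha/2 (1 when x == n).
    upper = 1.0 if x == n else _bisect(lambda p: alpha / 2 - binom_tail_le(x, n, p))
    return lower, upper

lo, hi = clopper_pearson(121, 121)
print(f"{lo:.3f} to {hi:.3f}")  # 0.970 to 1.000
```

Other interval methods (for example Wilson) give slightly different limits, so this sketch is not guaranteed to reproduce every interval quoted above.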
Introduction
Background and Objectives
Item 3a. Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models. [D;V]
Confronted with acute infectious conjunctivitis most general practitioners feel unable to discriminate between a bacterial and a viral cause. In practice more than 80% of such patients receive antibiotics. Hence in cases of acute infectious conjunctivitis many unnecessary ocular antibiotics are prescribed. … To select those patients who might benefit most from antibiotic treatment the general practitioner needs an informative diagnostic tool to determine a bacterial cause. With such a tool antibiotic prescriptions may be reduced and better targeted. Most general practitioners make the distinction between a bacterial cause and another cause on the basis of signs and symptoms. Additional diagnostic investigations such as a culture of the conjunctiva are seldom done mostly because of the resulting diagnostic delay. Can general practitioners actually differentiate between bacterial and viral conjunctivitis on the basis of signs and symptoms alone? … A recently published systematic literature search summed up the signs and symptoms and found no evidence for these assertions. This paper presents what seems to be the first empirical study on the diagnostic informativeness of signs and symptoms in acute infectious conjunctivitis (130). [Diagnosis; Development]
In the search for a practical prognostic system for patients with parotid carcinoma, we previously constructed a prognostic index based on a Cox proportional hazards analysis in a source population of 151 patients with parotid carcinoma from the Netherlands Cancer Institute. [The] Table … shows the pretreatment prognostic index PS1, which combines information available before surgery, and the post treatment prognostic index PS2, which incorporates information from the surgical specimen. For each patient, the index sums the properly weighted contributions of the important clinicopathologic characteristics into a number corresponding to an estimated possibility of tumor recurrence. These indices showed good discrimination in the source population and in an independent nationwide database of Dutch patients with parotid carcinoma. According to Justice et al, the next level of validation is to go on an international level. … For this purpose, an international database was constructed from patients who were treated in Leuven and Brussels (Belgium) and in Cologne (Germany), where the prognostic variables needed to calculate the indices were recorded, and predictions were compared with outcomes. In this way, we tried to achieve further clinical and statistical validation (131). [Prognosis; Validation]
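In a Cox model, a prognostic index of this kind is typically the linear predictor: the sum of each characteristic multiplied by its regression coefficient. An absolute risk by time t then follows as 1 − S0(t)^exp(PI), where S0(t) is the baseline survival for a patient with PI = 0. The passage does not give the PS1/PS2 weights or a baseline survival, so every predictor name, coefficient, and value below is hypothetical and only illustrates the arithmetic.

```python
from math import exp

# Hypothetical coefficients (log hazard ratios); NOT the published PS1/PS2 weights.
COEFFICIENTS = {"t3_or_t4": 0.8, "facial_nerve_dysfunction": 0.6, "age_over_60": 0.4}
# Hypothetical 5-year recurrence-free survival for a patient with PI = 0.
BASELINE_SURVIVAL_5Y = 0.85

def prognostic_index(patient):
    """Linear predictor: sum of coefficient x value over the model's characteristics."""
    return sum(beta * patient[name] for name, beta in COEFFICIENTS.items())

def recurrence_risk(patient, baseline_survival=BASELINE_SURVIVAL_5Y):
    """Predicted probability of recurrence by the horizon: 1 - S0(t) ** exp(PI)."""
    return 1 - baseline_survival ** exp(prognostic_index(patient))

patient = {"t3_or_t4": 1, "facial_nerve_dysfunction": 0, "age_over_60": 1}
print(round(prognostic_index(patient), 2), round(recurrence_risk(patient), 2))  # 1.2 0.42
```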
Any revisions and updates to a risk prediction model should be subject to continual evaluation (validation) to show that its usefulness for routine clinical practice has not deteriorated, or indeed to show that its performance has improved owing to refinements to the model. We describe the results from an independent evaluation assessing the performance of QRISK2 2011 on a large dataset of general practice records in the United Kingdom, comparing its performance with earlier versions of QRISK and the NICE adjusted version of the Framingham risk prediction model (117). [Prognosis; Validation]
Item 3b. Specify the objectives, including whether the study describes the development or validation of the model or both. [D;V]
The aim of this study was to develop and validate a clinical prediction rule in women presenting with breast symptoms, so that a more evidence based approach to referral—which would include urgent referral under the 2 week rule—could be implemented as part of clinical practice guidance (142). [Diagnosis; Development; Validation]
In this paper, we report on the estimation and external validation of a new UK based parametric prognostic model for predicting long term recurrence free survival for early breast cancer patients. The model's performance is compared with that of Nottingham Prognostic Index and Adjuvant Online, and a scoring algorithm and downloadable program to facilitate its use are presented (143). [Prognosis; Development; Validation]
Even though it is widely accepted that no prediction model should be applied in practice before being formally validated on its predictive accuracy in new patients, no study has previously performed a formal quantitative (external) validation of these prediction models in an independent patient population. Therefore, we first conducted a systematic review to identify all existing prediction models for prolonged ICU length of stay (PICULOS) after cardiac surgery. Subsequently, we validated the performance of the identified models in a large independent cohort of cardiac surgery patients (46). [Prognosis; Validation]
Methods
Source of Data
Item 4a. Describe the study design or source of data (for example, randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable. [D;V]
The population based sample used for this report included 2489 men and 2856 women 30 to 74 years old at the time of their Framingham Heart Study examination in 1971 to 1974. Participants attended either the 11th examination of the original Framingham cohort or the initial examination of the Framingham Offspring Study. Similar research protocols were used in each study, and persons with overt coronary heart disease at the baseline examination were excluded (144). [Prognosis; Development]
Data from the multicentre, worldwide, clinical trial (Action in Diabetes and Vascular disease: preterax and diamicron MR controlled evaluation) (ADVANCE) permit the derivation of new equations for cardiovascular risk prediction in people with diabetes. … ADVANCE was a factorial randomized controlled trial of blood pressure (perindopril indapamide versus placebo) and glucose control (gliclazide MR based intensive intervention versus standard care) on the incidence of microvascular and macrovascular events among 11,140 high risk individuals with type 2 diabetes … DIABHYCAR (The non insulin dependent diabetes, hypertension, microalbuminuria or proteinuria, cardiovascular events, and ramipril study) was a clinical trial of ramipril among individuals with type 2 diabetes conducted in 16 countries between 1995 and 2001. Of the 4912 randomized participants, 3711 … were suitable for use in validation. Definitions of cardiovascular disease in DIABHYCAR were similar to those in ADVANCE. … Predictors considered were age at diagnosis of diabetes, duration of diagnosed diabetes, sex, … and randomized treatments (blood pressure lowering and glucose control regimens) (145). [Prognosis; Development; Validation]
We did a multicentre prospective validation study in adults and an observational study in children who presented with acute elbow injury to five emergency departments in southwest England, UK. As the diagnostic accuracy of the test had not been assessed in children, we did not think that an interventional study was justified in this group (146). [Diagnosis; Validation]
We conducted such a large-scale international validation of the ADO index to determine how well it predicts mortality for individual subjects with chronic obstructive pulmonary disease from diverse settings, and updated the index as needed. Investigators from 10 chronic obstructive pulmonary disease and population-based cohort studies in Europe and the Americas agreed to collaborate in the International chronic obstructive pulmonary disease Cohorts Collaboration Working Group (147). [Prognosis; Validation; Updating]
Item 4b. Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up. [D;V]
This prospective temporal validation study included all patients who were consecutively treated from March 2007 to June 2007 in 19 phase I trials at the Drug Development Unit, Royal Marsden Hospital (RMH), Sutton, United Kingdom. … [A]ll patients were prospectively observed until May 31, 2008 (177). [Prognosis; Validation]
All consecutive patients presenting with anterior chest pain (as a main or minor medical complaint) over a three to nine week period (median length, five weeks) from March to May 2001 were included. … Between October 2005 and July 2006, all attending patients with anterior chest pain (aged 35 years and over; n = 1249) were consecutively recruited to this study by 74 participating GPs in the state of Hesse, Germany. The recruitment period lasted 12 weeks for each practice (178). [Diagnosis; Development; Validation]
The derivation cohort was 397 consecutive patients aged 18 years or over of both sexes who were admitted to any of four internal medicine wards at Donostia Hospital between 1 May and 30 June 2008, and we used no other exclusion criteria. The following year, between 1 May and 30 June 2009, we recruited the validation cohort on the same basis: 302 consecutive patients aged 18 or over of both sexes who were admitted to any of the same four internal medicine wards at the hospital (179). [Prognosis; Development]
Participants
Item 5a. Specify key elements of the study setting (e.g., primary care, secondary care, general population), including number and location of centers. [D;V]
We built on our previous risk prediction algorithm (QRISK1) to develop a revised algorithm … QRISK2. We conducted a prospective cohort study in a large UK primary care population using a similar method to our original analysis. We used version 19 of the QRESEARCH database (www.qresearch.org). This is a large validated primary care electronic database containing the health records of 11 million patients registered from 551 general practices (139). [Prognosis; Development; Validation]
Item 5b. Describe eligibility criteria for participants. [D;V]
One hundred and ninety-two patients with cutaneous lymphomas were evaluated at the Departments of Dermatology at the UMC Mannheim and the UMC Benjamin Franklin Berlin from 1987 to 2002. Eighty-six patients were diagnosed as having cutaneous T cell lymphoma (CTCL) as defined by the European Organisation for Research and Treatment of Cancer classification of cutaneous lymphomas, including mycosis fungoides, Sezary Syndrome and rare variants. … Patients with the rare variants of CTCL, parapsoriasis, cutaneous pseudolymphomas and cutaneous B cell lymphomas were excluded from the study. … Staging classification was done by the TNM scheme of the Mycosis Fungoides Cooperative Group. A diagnosis of Sezary Syndrome was made in patients with erythroderma and >1000 Sezary cells/mm³ in the peripheral blood according to the criteria of the International Society for Cutaneous Lymphomas (ISCL) (193). [Prognosis; Development]
Inclusion criteria were age 12 years and above, and injury sustained within 7 days or fewer. The authors selected 12 as the cutoff age because the emergency department receives, in the main, patients 12 years and above while younger patients were seen at a neighboring children's hospital about half a mile down the road from our hospital. In this, we differed from the original work by Stiell, who excluded patients less than 18 years of age. Exclusion criteria were: pregnancy, altered mental state at the time of consultation, patients who had been referred with an x-ray study, revisits, multiply traumatized patients, and patients with isolated skin injuries such as burns, abrasions, lacerations, and puncture wounds (194). [Diagnosis; Validation]
Item 5c. Give details of treatments received, if relevant. [D;V]
Data from the multi-centre, worldwide, clinical trial (Action in Diabetes and Vascular disease: preterax and diamicron-MR controlled evaluation) (ADVANCE) permit the derivation of new equations for cardiovascular risk prediction in people with diabetes. … ADVANCE was a factorial randomized controlled trial of blood pressure (perindopril indapamide versus placebo) and glucose control (gliclazide MR based intensive intervention versus standard care) on the incidence of microvascular and macrovascular events among 11,140 high risk individuals with type 2 diabetes, recruited from 215 centres across 20 countries in Asia, Australasia, Europe and Canada. … Predictors considered were age at diagnosis of diabetes, duration of diagnosed diabetes, sex, systolic blood pressure, diastolic blood pressure, mean arterial blood pressure, pulse pressure, total cholesterol, high-density lipoprotein and non-high-density lipoprotein and triglycerides, body mass index, waist circumference, waist to hip ratio, blood pressure lowering medication (i.e. treated hypertension), statin use, current smoking, retinopathy, atrial fibrillation (past or present), logarithmically transformed urinary albumin/creatinine ratio (ACR) and serum creatinine (Scr), haemoglobin A1c (HbA1c), fasting blood glucose and randomized treatments (blood pressure lowering and glucose control regimens) (145). [Prognosis; Development; Validation]
Outcome
Item 6a. Clearly define the outcome that is predicted by the prediction model, including how and when assessed. [D;V]
Outcomes of interest were any death, coronary heart disease related death, and coronary heart disease events. To identify these outcomes, cohort participants were followed over time using a variety of methods, including annual telephone interviews, triennial field center examinations, surveillance at ARIC community hospitals, review of death certificates, physician questionnaires, coroner/medical examiner reports, and informant interviews. Follow up began at enrollment (1987 to 1989) and continued through December 31, 2000. Fatal coronary heart disease included hospitalized and nonhospitalized deaths associated with coronary heart disease. A coronary heart disease event was defined as hospitalized definite or probable myocardial infarction, fatal coronary heart disease, cardiac procedure (coronary artery bypass graft, coronary angioplasty), or the presence of serial electrocardiographic changes across triennial cohort examinations. Event classification has been described in detail elsewhere [ref] (210). [Prognosis; Development]
Definite urinary tract infection was defined as ≥10⁸ colony forming units (cfu) per litre of a single type of organism in a voided sample, ≥10⁷ cfu/L of a single organism in a catheter sample, or any growth of a single organism in a suprapubic bladder tap sample. Probable urinary tract infection was defined as ≥10⁷ cfu/L of a single organism in a voided sample, ≥10⁶ cfu/L of a single organism in a catheter sample, ≥10⁸ cfu/L of two organisms in a voided sample, or ≥10⁷ cfu/L of two organisms from a catheter sample (211). [Diagnosis; Development; Validation]
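Because this outcome definition is a set of thresholds keyed on sample type and number of organisms, it can be written as a small lookup. A minimal sketch; the sample-type labels and the third category "neither" are hypothetical, and treating the colony count for two-organism growth as a single reported count is an assumption, since the definition does not say whether the threshold applies to each organism.

```python
# Thresholds in cfu per litre, keyed by (sample type, number of organisms).
DEFINITE = {("voided", 1): 1e8, ("catheter", 1): 1e7}
PROBABLE = {("voided", 1): 1e7, ("catheter", 1): 1e6,
            ("voided", 2): 1e8, ("catheter", 2): 1e7}

def classify_uti(sample_type: str, n_organisms: int, cfu_per_litre: float) -> str:
    """Return 'definite', 'probable' or 'neither' for one urine culture.

    sample_type: 'voided', 'catheter' or 'suprapubic' (hypothetical labels).
    cfu_per_litre: colony count, assumed to refer to the organism(s) grown.
    """
    key = (sample_type, n_organisms)
    # Any growth of a single organism in a suprapubic bladder tap is definite.
    if sample_type == "suprapubic" and n_organisms == 1 and cfu_per_litre > 0:
        return "definite"
    if key in DEFINITE and cfu_per_litre >= DEFINITE[key]:
        return "definite"
    if key in PROBABLE and cfu_per_litre >= PROBABLE[key]:
        return "probable"
    return "neither"

print(classify_uti("voided", 1, 5e7))  # probable
```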
Patient charts and physician records were reviewed to determine clinical outcome. Patients generally were seen postoperatively at least every 3–4 months for the first year, semiannually for the second and third years, and annually thereafter. Follow up examinations included radiological imaging with computed tomography in all patients. In addition to physical examination with laboratory testing, intravenous pyelography, cystoscopy, urine cytology, urethral washings and bone scintigraphy were carried out if indicated. Local recurrence was defined as recurrence in the surgical bed, distant as recurrence at distant organs. Clinical outcomes were measured from the date of cystectomy to the date of first documented recurrence at computed tomography, the date of death, or the date of last follow up when the patient had not experienced disease recurrence (212). [Prognosis; Development]
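The outcome described above is a time-to-event variable: follow-up runs from cystectomy to first documented recurrence, death, or last follow-up, and patients without recurrence at last contact are censored. A minimal sketch of deriving the (time, event) pair for one patient; the function name is hypothetical, and counting death without recurrence as an event (recurrence-free survival) is an assumption, because the excerpt does not say whether deaths were counted as events or censored.

```python
from datetime import date
from typing import Optional

def recurrence_free_outcome(cystectomy: date, recurrence: Optional[date],
                            death: Optional[date], last_follow_up: date,
                            death_is_event: bool = True):
    """Return (follow-up time in days, event indicator 1/0) for one patient.

    The event is the first documented recurrence or, if death_is_event is True
    (an assumption), death. Otherwise the patient is censored at death or at
    last follow-up, whichever applies.
    """
    event_dates = [d for d in (recurrence, death if death_is_event else None) if d is not None]
    if event_dates:
        return (min(event_dates) - cystectomy).days, 1
    end = death if death is not None else last_follow_up
    return (end - cystectomy).days, 0

print(recurrence_free_outcome(date(2000, 1, 10), date(2001, 3, 1), None, date(2005, 6, 30)))  # (416, 1)
```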
Breast Cancer Ascertainment: Incident diagnoses of breast cancer were ascertained by self-report on biennial follow up questionnaires from 1997 to 2005. We learned of deaths from family members, the US Postal Service, and the National Death Index. We identified 1084 incident breast cancers, and 1007 (93%) were confirmed by medical record or by cancer registry data from 24 states in which 96% of participants resided at baseline (213). [Prognosis; Validation]
Item 6b. Report any actions to blind assessment of the outcome to be predicted. [D;V]
All probable cases of serious bacterial infection were reviewed by a final diagnosis committee composed of two specialist paediatricians (with experience in paediatric infectious disease and respiratory medicine) and, in cases of pneumonia, a radiologist. The presence or absence of bacterial infection [outcome] was decided blinded to clinical information [predictors under study] and based on consensus (211). [Diagnosis; Development; Validation]
Liver biopsies were obtained with an 18 gauge or larger needle with a minimum of 5 portal tracts and were routinely stained with hematoxylin-eosin and trichrome stains. Biopsies were interpreted according to the scoring schema developed by the METAVIR group by 2 expert liver pathologists … who were blinded to patient clinical characteristics and serum measurements. Thirty biopsies were scored by both pathologists, and interobserver agreement was calculated by use of κ statistics (223). [Diagnosis; Development; Validation]
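Interobserver agreement between two raters on a categorical scale such as METAVIR fibrosis stage is commonly summarized with Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The excerpt does not say whether an unweighted or weighted kappa was used; the sketch below computes the unweighted version on made-up scores.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical METAVIR stages from two pathologists (not study data).
pathologist_1 = ["F0", "F1", "F2", "F2", "F3", "F4"]
pathologist_2 = ["F0", "F1", "F2", "F3", "F3", "F4"]
print(round(cohens_kappa(pathologist_1, pathologist_2), 3))  # 0.793
```

For ordered stages, a weighted kappa, which gives partial credit to near-misses, is often preferred; that choice is not stated in the example.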
The primary outcome [acute myocardial infarction, coronary revascularization, or death of cardiac or unknown cause within 30 days] was ascertained by investigators blinded to the predictor variables. If a diagnosis could not be assigned, a cardiologist … reviewed all the clinical data and assigned an adjudicated outcome diagnosis. All positive and 10% of randomly selected negative outcomes were confirmed by a second coinvestigator blinded to the standardized data collection forms. Disagreements were resolved by consensus (224). [Prognosis; Development]
Predictors
Item 7a. Clearly define all predictors used in developing the multivariable prediction model, including how and when they were measured. [D;V]
The following data were extracted for each patient: gender, aspartate aminotransferase in IU/L, alanine aminotransferase in IU/L, aspartate aminotransferase/alanine aminotransferase ratio, total bilirubin (mg/dl), albumin (g/dl), transferrin saturation (%), mean corpuscular volume (μm³), platelet count (×10³/mm³), and prothrombin time (s). … All laboratory tests were performed within 90 days before liver biopsy. In the case of repeated tests, the results closest to the time of the biopsy were used. No data obtained after the biopsy were used (228). [Diagnosis; Development]
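The rule for repeated laboratory tests above (use the result closest to the biopsy, taken only from the 90 days before it and never after) is easy to get subtly wrong during data extraction. A minimal sketch, assuming each test is stored as a (date, value) pair; the function name and data layout are hypothetical, and counting a test on the biopsy day itself as eligible is an assumption the excerpt does not settle.

```python
from datetime import date, timedelta
from typing import Optional

def closest_prebiopsy_value(tests, biopsy_date: date, window_days: int = 90) -> Optional[float]:
    """Pick the result closest to, and not after, the biopsy date.

    tests: iterable of (test_date, value) pairs for one patient and one analyte.
    Only results dated within `window_days` before the biopsy are eligible;
    results on the biopsy date itself are treated as eligible (an assumption).
    Returns None when no eligible result exists, i.e. the value is missing.
    """
    earliest = biopsy_date - timedelta(days=window_days)
    eligible = [(d, v) for d, v in tests if earliest <= d <= biopsy_date]
    if not eligible:
        return None
    # The latest eligible date is the one closest to the biopsy.
    return max(eligible, key=lambda dv: dv[0])[1]

biopsy = date(2010, 6, 1)
alt_results = [(date(2010, 1, 1), 40.0), (date(2010, 5, 20), 55.0), (date(2010, 6, 3), 70.0)]
print(closest_prebiopsy_value(alt_results, biopsy))  # 55.0
```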
Forty-three potential candidate variables in addition to age and gender were considered for inclusion in the AMI [acute myocardial infarction] mortality prediction rules. … These candidate variables were taken from a list of risk factors used to develop previous report cards in the California Hospital Outcomes Project and Pennsylvania Health Care Cost Containment Council AMI “report card” projects. Each of these comorbidities was created using appropriate ICD-9 codes from the 15 secondary diagnosis fields in OMID. The Ontario discharge data are based on ICD-9 codes rather than ICD-9-CM codes used in the U.S., so the U.S. codes were truncated. Some risk factors used in these two projects do not have an ICD-9 coding analog (e.g., infarct subtype, race) and therefore were not included in our analysis. The frequency of each of these 43 comorbidities was calculated, and any comorbidity with a prevalence of <1% was excluded from further analysis. Comorbidities that the authors felt were not clinically plausible predictors of AMI mortality were also excluded (185). [Prognosis; Development; Validation]
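The prevalence screen above (exclude any candidate comorbidity present in fewer than 1% of patients) amounts to a filter over binary indicator columns. A sketch with hypothetical comorbidity names and counts; the second exclusion step, based on the authors' judgment of clinical plausibility, is not something code can reproduce.

```python
def filter_by_prevalence(indicators, threshold=0.01):
    """Keep candidate predictors whose prevalence is at least `threshold`.

    indicators: dict mapping predictor name -> list of 0/1 values (one per patient).
    Returns a dict of retained predictor names -> prevalence.
    """
    retained = {}
    for name, values in indicators.items():
        prevalence = sum(values) / len(values)
        if prevalence >= threshold:  # "<1% was excluded", so exactly 1% is kept
            retained[name] = prevalence
    return retained

# Hypothetical indicators for 1000 patients (not OMID data).
n = 1000
indicators = {
    "shock": [1] * 45 + [0] * (n - 45),
    "hypothetical_rare_condition": [1] * 7 + [0] * (n - 7),
    "diabetes": [1] * 180 + [0] * (n - 180),
}
print(sorted(filter_by_prevalence(indicators)))  # ['diabetes', 'shock']
```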
Each screening round consisted of two visits to an outpatient department separated by approximately 3 weeks. Participants filled out a questionnaire on demographics, cardiovascular and renal disease history, smoking status, and the use of oral antidiabetic, antihypertensive, and lipid-lowering drugs. Information on drug use was completed with data from community pharmacies, including information on class of antihypertensive medication. … On the first and second visits, blood pressure was measured in the right arm every minute for 10 and 8 minutes, respectively, by an automatic Dinamap XL Model 9300 series device (Johnson & Johnson Medical Inc., Tampa, FL). For systolic and diastolic BP, the mean of the last two recordings from each of the 2 visit days of a screening round was used. Anthropometrical measurements were performed, and fasting blood samples were taken. Concentrations of total cholesterol and plasma glucose were measured using standard methods. Serum creatinine was measured by dry chemistry (Eastman Kodak, Rochester, NY), with intra-assay coefficient of variation of 0.9% and interassay coefficient of variation of 2.9%. eGFR [estimated glomerular filtration rate] was estimated using the Modification of Diet in Renal Disease (MDRD) study equation, taking into account gender, age, race, and serum creatinine. In addition, participants collected urine for two consecutive periods of 24 hours. Urinary albumin concentration was determined by nephelometry (Dade Behring Diagnostic, Marburg, Germany), and UAE [urinary albumin excretion] was given as the mean of the two 24-hour urinary excretions. As a proxy for dietary sodium and protein intake, we used the 24-hour urinary excretion of sodium and urea, respectively (229). [Prognosis; Development]
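Two of the derived predictors in this passage can be written out explicitly. Blood pressure was the mean of the last two readings from each of the two visit days, so four readings in total. For eGFR, the passage names the MDRD study equation but not its version; the sketch assumes the commonly cited four-variable form, eGFR = 186 × Scr^−1.154 × age^−0.203 × 0.742 (if female) × 1.212 (if black), with serum creatinine in mg/dL and eGFR in mL/min/1.73 m². The re-expressed equation for IDMS-standardized creatinine uses 175 instead of 186. Function names and readings are hypothetical.

```python
from statistics import mean

def screening_bp(visit1_readings, visit2_readings):
    """Mean of the last two readings from each of the two visit days (4 values)."""
    if len(visit1_readings) < 2 or len(visit2_readings) < 2:
        raise ValueError("each visit needs at least two readings")
    return mean(visit1_readings[-2:] + visit2_readings[-2:])

def mdrd_egfr(serum_creatinine_mg_dl, age_years, female, black, constant=186.0):
    """Four-variable MDRD eGFR in mL/min/1.73 m^2 (use constant=175 for IDMS-traceable creatinine)."""
    egfr = constant * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

# Hypothetical systolic readings (mmHg): one per minute, 10 on visit 1 and 8 on visit 2.
visit1 = [142, 140, 138, 137, 136, 135, 134, 134, 132, 130]
visit2 = [139, 137, 136, 135, 134, 133, 131, 129]
print(screening_bp(visit1, visit2))  # 130.5
print(round(mdrd_egfr(1.0, 50, female=False, black=False), 1))
```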
Item 7b. Report any actions to blind assessment of predictors for the outcome and other predictors. [D;V]
A single investigator blinded to clinical data and echocardiographic measurements performed the quantitative magnetic resonance image analyses. [The aim was to specifically quantify the incremental diagnostic value of magnetic resonance beyond clinical data to include or exclude heart failure] (236). [Diagnosis; Development; Incremental value]
Blinded to [other] predictor variables and patient outcome [a combination of nonfatal and fatal cardiovascular disease and overall mortality within 30 days of chest pain onset], 2 board certified emergency physicians … classified all electrocardiograms [one of the specific predictors under study] with a structured standardized format … (224). [Prognosis; Development]
Investigators, blinded to both predictor variables and patient outcome, reviewed and classified all electrocardiograms in a structured format according to current standardized reporting guidelines. Two investigators blinded to the standardized data collection forms ascertained outcomes. The investigators were provided the results of all laboratory values, radiographic imaging, cardiac stress testing, and cardiac catheterization findings, as well as information obtained during the 30 day follow up phone call (237). [Diagnosis; Validation]
Sample Size
Item 8. Explain how the study size was arrived at. [D;V]
We estimated the sample size according to the precision of the sensitivity of the derived decision rule. As with previous decision rule studies, we prespecified 120 outcome events to derive a rule that is 100% sensitive with a lower 95% confidence limit of 97.0%, and, to have the greatest utility for practicing emergency physicians, we aimed to include at least 120 outcome events occurring outside the emergency department (in hospital or after emergency department discharge). Review of quality data from the Ottawa hospital indicated that 10% of patients who presented to the emergency department with chest pain would meet outcome criteria within 30 days. We estimated that half of these events would occur after hospital admission or emergency department discharge. The a priori sample size was estimated to be 2400 patients (224). [Diagnosis; Development]
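The arithmetic behind this calculation can be checked in a few lines. If all 120 outcome events are detected, the exact (Clopper-Pearson) lower 95% limit for sensitivity solves p^120 = 0.025, giving p = 0.025^(1/120) ≈ 0.970, which matches the stated 97.0%; the excerpt does not name the interval method, so this is a consistency check rather than the authors' own calculation. Working back, 2400 patients × 10% meeting the outcome × one half occurring outside the emergency department gives the 120 target events. Function names are hypothetical.

```python
import math

def lower_limit_all_detected(n_events, alpha=0.05):
    """Exact lower bound of a two-sided (1 - alpha) interval when all n events are detected."""
    return (alpha / 2) ** (1 / n_events)

def required_patients(target_events, outcome_rate, fraction_counted):
    """Patients needed so that the expected number of qualifying events meets the target."""
    # round() strips floating-point noise (e.g. 2400.0000000000005) before rounding up.
    return math.ceil(round(target_events / (outcome_rate * fraction_counted), 6))

print(round(lower_limit_all_detected(120), 3))  # 0.97
print(required_patients(120, 0.10, 0.5))        # 2400
```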
Our sample size calculation is based on our primary objective (i.e., to determine if preoperative coronary computed tomography angiography has additional predictive value beyond clinical variables). Of our two objectives, this objective requires the largest number of patients to ensure the stability of the prediction model. … On the basis of the VISION Pilot Study and a previous non-invasive cardiac testing study that we undertook in a similar surgical population, we expect a 6% event rate for major perioperative cardiac events in this study. Table 2 presents the various sample sizes needed to test four variables in a multivariable analysis based upon various event rates and the required number of events per variable. As the table indicates, if our event rate is 6% we will need 1000 patients to achieve stable estimates. If our event rate is 4%, we may need up to 1500 patients. We are targeting a sample size of 1500 patients but this may change depending on our event rate at 1000 patients (242). [Prognosis; Development]
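The figures in this passage are consistent with a fixed events-per-variable (EPV) target: 1000 patients at a 6% event rate yield 60 events, and 1500 patients at 4% also yield 60, that is, 15 events for each of the four variables. Table 2 and its EPV threshold are not reproduced here, so the value 15 below is inferred from that arithmetic, not quoted; the function name is hypothetical.

```python
import math

def patients_for_epv(n_variables, events_per_variable, event_rate):
    """Smallest sample size whose expected event count reaches the EPV target."""
    events_needed = n_variables * events_per_variable
    # round() strips floating-point noise before rounding up.
    return math.ceil(round(events_needed / event_rate, 6))

for rate in (0.06, 0.04):
    print(rate, patients_for_epv(4, 15, rate))  # 0.06 1000, then 0.04 1500
```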
All available data on the database were used to maximise the power and generalisability of the results (243). [Diagnosis; Development]
We did not perform formal sample size calculations because all the cohort studies are ongoing studies. Also, there are no generally accepted approaches to estimate the sample size requirements for derivation and validation studies of risk prediction models. Some have suggested having at least 10 events per candidate variable for the derivation of a model and at least 100 events for validation studies. Since many studies to develop and validate prediction models are small, a potential solution is to have large-scale collaborations such as ours to derive stable estimates from regression models that are likely to generalize to other populations. Our sample and the number of events far exceed all approaches for determining sample sizes and therefore are expected to provide estimates that are very robust (147). [Prognosis; Validation]
We calculated the study sample size needed to validate the clinical prediction rule according to a requirement of 100 patients with the outcome of interest (any intra-abdominal injury present), which is supported by statistical estimates described previously for external validation of clinical prediction rules. In accordance with our previous work, we estimated the enrolled sample would have a prevalence rate of intra-abdominal injury of 10%, and thus the total needed sample size was calculated at 1,000 patients (244). [Diagnosis; Validation]
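The arithmetic behind these targets is the number of events required divided by the expected event rate. The following is a minimal Python sketch; the function name is hypothetical. The 15 events per variable (EPV) used in the development example is our assumption, chosen only because 15 × 4 = 60 events reproduces the 1000- and 1500-patient figures quoted in the first example. That source does not state the EPV value behind its Table 2.

```python
import math
from fractions import Fraction


def required_sample_size(events_needed: int, event_rate: Fraction) -> int:
    """Smallest number of patients whose expected event count reaches events_needed.

    Exact rational arithmetic avoids floating-point surprises in math.ceil.
    """
    return math.ceil(Fraction(events_needed) / event_rate)


# Development: events-per-variable rule (EPV = 15 is an ASSUMPTION, see text).
epv, n_predictors = 15, 4
n_dev_6pct = required_sample_size(epv * n_predictors, Fraction(6, 100))
n_dev_4pct = required_sample_size(epv * n_predictors, Fraction(4, 100))

# Validation: at least 100 outcome events at an expected 10% prevalence.
n_val = required_sample_size(100, Fraction(10, 100))

print(n_dev_6pct, n_dev_4pct, n_val)  # 1000 1500 1000
```

Under the more commonly quoted rule of 10 events per candidate variable, the same four predictors at a 6% event rate would need `required_sample_size(40, Fraction(6, 100))`, that is, 667 patients.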
Missing Data
Item 9. Describe how missing data were handled (for example, complete-case analysis, single imputation, multiple imputation), with details of any imputation method. [D;V]
We assumed missing data occurred at random depending on the clinical variables and the results of computed tomography based coronary angiography and performed multiple imputations using chained equations. Missing values were predicted on the basis of all other predictors considered, the results of computed tomography based coronary angiography, and the outcome. We created 20 datasets with identical known information but with differences in imputed values reflecting the uncertainty associated with imputations. In total 667 (2%) clinical data items were imputed. In our study only a minority of patients underwent catheter based coronary angiography. An analysis restricted to patients who underwent catheter based coronary angiography could have been influenced by verification bias. Therefore we imputed data for catheter based coronary angiography by using the computed tomography based procedure as an auxiliary variable in addition to all other predictors. Results for the two procedures correlate well, especially for negative results of computed tomography based coronary angiography. This strong correlation was confirmed in the 1609 patients who underwent both procedures (Pearson r = 0.72). Since its data were used for imputation the computed tomography based procedure was not included as a predictor in the prediction models. Our approach was similar to using the results of computed tomography based coronary angiography as the outcome variable when the catheter based procedure was not performed (which was explored in a sensitivity analysis). However this approach is more sophisticated because it also takes into account other predictors and the uncertainty surrounding the imputed values. We imputed 3615 (64%) outcome values for catheter based coronary angiography. Multiple imputations were performed using Stata/SE 11 (StataCorp) (256). [Diagnosis; Development]
If an outcome was missing, the patient data were excluded from the analysis. Multiple imputation was used to address missingness in our nonoutcome data and was performed with SAS callable IVEware (Survey Methodology Program, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI). Multiple imputation has been shown to be a valid and effective way of handling missing data and minimizes bias that may often result from excluding such patients. Additionally, multiple imputation remains valid even if the proportion of missing data is large. The variables included in the multiple imputation model were the 4 outcomes, age, sex, ICD-9 E codes, emergency department Glasgow coma score, out of hospital Glasgow coma score, Injury Severity Score, mechanism of trauma, and trauma team notification. Ten imputed data sets were created as part of the multiple imputation, and all areas under the receiver operating characteristic curve were combined across the 10 imputed data sets with a standard approach. Although there is no reported conventional approach to combining receiver operating characteristic curves from imputed data sets, we averaged the individual sensitivity and specificity data across the 10 imputed data sets and then plotted these points to generate the curves in our results (257). [Prognosis; Validation]
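The example says the areas under the ROC curve were combined across imputed data sets "with a standard approach" without naming it. A common choice is Rubin's rules, and we assume that choice in the minimal Python sketch below. It pools a point estimate and its variance across m imputed data sets. The input values are hypothetical, and the function name is ours.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one estimate across m imputed data sets with Rubin's rules.

    estimates: point estimate from each imputed data set (e.g., an AUC)
    variances: squared standard error of that estimate in each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, total


# Hypothetical AUCs and squared standard errors from 3 imputed data sets.
auc, var_total = pool_rubin([0.80, 0.82, 0.78], [0.0004, 0.0004, 0.0004])
print(round(auc, 3), round(var_total ** 0.5, 4))
```

The between-imputation term inflates the total variance to reflect uncertainty about the imputed values. Averaging sensitivity and specificity to draw pooled ROC curves, as the authors did, is a separate, descriptive step.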
We split the data into development (training) and validation (test) data sets. The development data included all operations within the first 5 years; the validation data included the rest. To ensure reliability of data, we excluded patients who had missing information on key predictors: age, gender, operation sequence, and number and position of implanted heart valves. In addition, patients were excluded from the development data if they were missing information on >3 of the remaining predictors. Any predictor recorded for <50% of patients in the development data was not included in the modeling process, resulting in the exclusion of left ventricular end diastolic pressure, pulmonary artery wedge pressure, aortic valve gradient, and active endocarditis. Patients were excluded from the validation data if they had missing information on any of the predictors in the risk model. To investigate whether exclusions of patients as a result of missing data had introduced any bias, we compared the key preoperative characteristics of patients excluded from the study with those included. Any remaining missing predictor values in the development data were imputed by use of multiple imputation techniques. Five different imputed data sets were created (258). [Prognosis; Development; Validation]
Statistical Analysis Methods
Item 10a. Describe how predictors were handled in the analyses. [D]
For the continuous predictors age, glucose, and Hb [hemoglobin], a linear relationship with outcome was found to be a good approximation after assessment of nonlinearity using restricted cubic splines (262). [Prognosis]
Fractional polynomials were used to explore presence of nonlinear relationships of the continuous predictors of age, BMI [body mass index], and year to outcome (258). [Prognosis]
The nonlinear relationships between these predictor variables and lung cancer risk were estimated using restricted cubic splines. Splines for age, pack-years smoked, quit-time and smoking duration were prepared with knot placement based on the percentile distributions of these variables in smokers only. Knots for age were at 55, 60, 64, and 72 years. Knots for pack-years were at 3.25, 23.25 and 63 pack-years. Knots for quit-time were at 0, 15, and 35 years. Knots for duration were at 8, 28, and 45 years (263). [Prognosis]
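The example does not say which spline parameterization was used. The minimal Python sketch below assumes the restricted cubic spline basis in Harrell's form, without the usual scaling constant. With k knots it yields one linear term plus k − 2 nonlinear terms, and the fitted curve is forced to be linear beyond the outer knots. The age knots are taken from the example; the function name is ours.

```python
def rcs_basis(x: float, knots: list[float]) -> list[float]:
    """Restricted cubic spline basis (Harrell's form, unscaled).

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. A linear combination
    of these terms is cubic between knots and linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u: float) -> float:
        # truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    span = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        terms.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / span
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / span
        )
    return terms


age_knots = [55, 60, 64, 72]  # knots for age quoted in the example above
for age in (50, 60, 70, 80):
    print(age, [round(v, 2) for v in rcs_basis(age, age_knots)])
```

Each basis term would enter the regression model as a separate column. The restriction to linearity in the tails keeps the curve stable where data are sparse.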
Item 10b. Specify type of model, all model-building procedures (including any predictor selection), and methods for internal validation. [D]
We used the Cox proportional hazards model in the derivation dataset to estimate the coefficients associated with each potential risk factor [predictor] for the first ever recorded diagnosis of cardiovascular disease for men and women separately (278). [Prognosis]
All clinical and laboratory predictors were included in a multivariable logistic regression model (outcome: bacterial pneumonia) (279). [Diagnosis]
We chose risk factors based on prior meta-analyses and review; their ease of use in primary care settings; and whether a given risk factor was deemed modifiable or reversible by changing habits (i.e., smoking) or through therapeutic intervention; however, we were limited to factors that already had been used in the two baseline cohorts that constituted EPISEM (282). [Prognosis]
Candidate variables included all demographic, disease-related factors and patterns of care from each data source that have been shown to be a risk factor for mortality following an intensive care episode previously. Variables were initially selected following a review of the literature and consensus opinion by an expert group comprising an intensivist, general physician, intensive care trained nurse, epidemiologists, and a statistician. The identified set was reviewed and endorsed by 5 intensivists and a biostatistician who are familiar with the ANZICS APD (283). [Prognosis]
We selected 12 predictor variables for inclusion in our prediction rule from the larger set according to clinical relevance and the results of baseline descriptive statistics in our cohort of emergency department patients with symptomatic atrial fibrillation. Specifically, we reviewed the baseline characteristics of the patients who did and did not experience a 30-day adverse event and selected the 12 predictors for inclusion in the model from these 50 candidate predictors according to apparent differences in predictor representation between the 2 groups, clinical relevance, and sensibility. … [T]o limit collinearity and ensure a parsimonious model, Spearman's correlations were calculated between the clinically sensible associations within our 12 predictor variables. Specifically, Spearman's correlations were calculated between the following clinically sensible associations: (1) history of hypertension status and β-blocker and diuretic use, and (2) history of heart failure and β-blocker home use, diuretic home use, peripheral edema on physical examination, and dyspnea in the emergency department (284). [Prognosis]
We used multivariable logistic regression with backward stepwise selection with a P value greater than 0.05 for removal of variables, but we forced variables [predictors] that we considered to have great clinical relevance back into the model. We assessed additional risk factors [predictors] from clinical guidelines for possible additional effects (286). [Diagnosis]
Clinically meaningful interactions were included in the model. Their significance was tested as a group to avoid inflating type I error. All interaction terms were removed as a group, and the model was refit if results were nonsignificant. Specifically, interactions between home use of β-blockers and diuretics and between edema on physical examination and a history of heart failure were tested (284). [Prognosis]
We assessed internal validity with a bootstrapping procedure for a realistic estimate of the performance of both prediction models in similar future patients. We repeated the entire modeling process including variable selection … in 200 samples drawn with replacement from the original sample. We determined the performances of the selected prediction model and the simple rule that were developed from each bootstrap sample in the original sample. Performance measures included the average area under the ROC curve, sensitivity and specificity for both outcome measures, and computed tomography reduction at 100% sensitivity for neurosurgical interventions within each bootstrap sample (286). [Diagnosis]
Item 10c. For validation, describe how the predictions were calculated. [V]
To evaluate the performance of each prostate cancer risk calculation, we obtained the predicted probability for any prostate cancer and for aggressive prostate cancer for each patient from the PRC [Prostate Cancer Prevention Trial risk calculator] (http://deb.uthscsa.edu/URORiskCalc/Pages/uroriskcalc.jsp) and from the SRC [Sunnybrook nomogram–based prostate cancer risk calculator] (www.prostaterisk.ca) to evaluate each prediction model performance (306). [Diagnosis]
To calculate the HSI [Hepatic Steatosis Index], we used the formula given by Lee et al [ref] to calculate the probability of having hepatic steatosis as follows:
with presence of diabetes mellitus (DM) = 1; and absence of DM = 0. ALT and AST indicate alanine aminotransferase and aspartate aminotransferase, respectively (307). [Diagnosis]
Open source code to calculate the QCancer (Colorectal) scores is available from www.qcancer.org/colorectal/ released under the GNU Lesser General Public Licence, version 3 (308). [Prognosis]
Item 10d. Specify all measures used to assess model performance and, if relevant, to compare multiple models. [D;V]
We assessed the predictive performance of the QRISK2-2011 risk score on the THIN cohort by examining measures of calibration and discrimination. Calibration refers to how closely the predicted 10 year cardiovascular risk agrees with the observed 10 year cardiovascular risk. This was assessed for each 10th of predicted risk, ensuring 10 equally sized groups and each five year age band, by calculating the ratio of predicted to observed cardiovascular risk separately for men and for women. Calibration of the risk score predictions was assessed by plotting observed proportions versus predicted probabilities and by calculating the calibration slope.
Discrimination is the ability of the risk score to differentiate between patients who do and do not experience an event during the study period. This measure is quantified by calculating the area under the receiver operating characteristic curve statistic; a value of 0.5 represents chance and 1 represents perfect discrimination. We also calculated the D statistic and R² statistic, which are measures of discrimination and explained variation, respectively, and are tailored towards censored survival data. Higher values for the D statistic indicate greater discrimination, where an increase of 0.1 over other risk scores is a good indicator of improved prognostic separation (117). [Prognosis; Validation]
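The calibration-by-tenths check in this example can be sketched as follows. This minimal Python version assumes a binary outcome with complete follow-up. With censored 10 year data, as in QRISK2, the observed risk in each group would instead come from a survival estimate such as Kaplan–Meier. The function name and the synthetic data are ours.

```python
import numpy as np


def calibration_by_tenths(pred, observed):
    """Split patients into 10 equally sized groups of predicted risk and return
    (mean predicted risk, observed proportion, predicted/observed ratio) per group.

    Assumes a binary outcome with complete follow-up (no censoring).
    """
    pred = np.asarray(pred, dtype=float)
    observed = np.asarray(observed, dtype=float)
    order = np.argsort(pred, kind="stable")  # lowest to highest predicted risk
    rows = []
    for idx in np.array_split(order, 10):
        mean_pred = pred[idx].mean()
        obs_rate = observed[idx].mean()
        ratio = mean_pred / obs_rate if obs_rate > 0 else float("nan")
        rows.append((mean_pred, obs_rate, ratio))
    return rows


# Synthetic, perfectly calibrated data: 10 risk levels with 20 patients each.
levels = np.linspace(0.05, 0.50, 10)
counts = np.rint(levels * 20).astype(int)  # 1, 2, ..., 10 events per level
pred = np.repeat(levels, 20)
observed = np.concatenate([np.r_[np.ones(c), np.zeros(20 - c)] for c in counts])
rows = calibration_by_tenths(pred, observed)
for mean_pred, obs_rate, ratio in rows:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  ratio {ratio:.2f}")
```

Ratios above 1 indicate overprediction in that group, and ratios below 1 indicate underprediction. The same grouping supplies the points for an observed-versus-predicted calibration plot.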
First, we compared the abilities of the clinical decision rule and the general practitioner judgement in discriminating patients with the disease from patients without the disease, using receiver operating characteristic (ROC) curve analysis. An area under the ROC curve (AUC) of 0.5 indicates no discrimination, whereas an AUC of 1.0 indicates perfect discrimination. Then, we constructed a calibration plot to separately examine the agreement between the predicted probabilities of the decision rule with the observed outcome acute coronary syndrome and we constructed a similar calibration plot for the predicted probabilities of the general practitioner. Perfect predictions should lie on the 45-degree line for agreement with the outcome in the calibration plot (318). [Diagnosis; Development]
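The interpretation of the AUC used here (0.5 for no discrimination, 1.0 for perfect discrimination) follows from its equivalence with the concordance probability. This is the chance that a randomly chosen patient with the outcome has a higher predicted probability than a randomly chosen patient without it. The minimal Python sketch below computes it directly from all pairs. The function name is ours, and the pairwise loop is written for clarity rather than speed.

```python
def auc_concordance(pred, outcome):
    """Area under the ROC curve as the concordance probability: the share of
    (event, non-event) pairs in which the event has the higher predicted
    probability, counting ties as 1/2."""
    events = [p for p, y in zip(pred, outcome) if y == 1]
    non_events = [p for p, y in zip(pred, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(non_events))


print(auc_concordance([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A model that assigns the same probability to everyone scores exactly 0.5 because every pair is tied. A model that ranks every patient with the outcome above every patient without it scores 1.0.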
The accuracy of [the] internally validated and adjusted model was tested on the data of the validation set. The regression formula from the developed model was applied to all bakery workers of the validation set. The agreement between the predicted probabilities and the observed frequencies for sensitization (calibration) was evaluated graphically by plotting the predicted probabilities (x-axis) by the observed frequencies (y-axis) of the outcome. The association between predicted probabilities and observed frequencies can be described by a line with an intercept and a slope. An intercept of zero and a slope of one indicate perfect calibration. … The discrimination was assessed with the ROC area (319). [Diagnosis; Development]
We assessed the incremental prognostic value of biomarkers when added to the GRACE score by the likelihood ratio test. We used 3 complementary measures of discrimination improvement to assess the magnitude of the increase in model performance when individual biomarkers were added to GRACE: change in AUC (ΔAUC), integrated discrimination improvement (IDI), and continuous and categorical net reclassification improvement (NRI). To get a sense of clinical usefulness, we calculated the NRI (>0.02), which considers 2% as the minimum threshold for a meaningful change in predicted risk. Moreover, 2 categorical NRIs were applied with prespecified risk thresholds of 6% and 14%, chosen in accord with a previous study, or 5% and 12%, chosen in accord with the observed event rate in the present study. Categorical NRIs define upward and downward reclassification only if predicted risks move from one category to another. Since the number of biomarkers added to GRACE remained small (maximum of 2), the degree of overoptimism was likely to be small. Still, we reran the ΔAUC and IDI analyses using bootstrap internal validation and confirmed our results (338). [Prognosis; Incremental Value]
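Two of the measures named in this example, the categorical NRI and the IDI, can be sketched in a few lines of Python. This is a minimal version that assumes a binary outcome. The thresholds 6% and 14% come from the example, but the predictions are hypothetical, and the continuous NRI and confidence intervals are not shown. Function names are ours.

```python
from bisect import bisect_right


def categorical_nri(p_old, p_new, outcome, thresholds=(0.06, 0.14)):
    """Categorical net reclassification improvement.

    A patient moves 'up' or 'down' only when the predicted risk crosses a threshold.
    NRI = (up - down)/n among events + (down - up)/n among non-events.
    """
    def category(p):
        return bisect_right(thresholds, p)

    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for old, new, y in zip(p_old, p_new, outcome):
        n[y] += 1
        c_old, c_new = category(old), category(new)
        if c_new > c_old:
            up[y] += 1
        elif c_new < c_old:
            down[y] += 1
    return (up[1] - down[1]) / n[1] + (down[0] - up[0]) / n[0]


def idi(p_old, p_new, outcome):
    """Integrated discrimination improvement: change in the difference in mean
    predicted risk between patients with and without the event."""
    def separation(p):
        ev = [pi for pi, y in zip(p, outcome) if y == 1]
        ne = [pi for pi, y in zip(p, outcome) if y == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)
    return separation(p_new) - separation(p_old)


# Hypothetical predictions from GRACE alone (old) and GRACE plus a biomarker (new).
p_old = [0.05, 0.10, 0.20, 0.10, 0.04]
p_new = [0.08, 0.10, 0.12, 0.03, 0.04]
y = [1, 1, 1, 0, 0]
print(categorical_nri(p_old, p_new, y), idi(p_old, p_new, y))
```

In this toy data the NRI is driven entirely by the non-event that moves down a category. One event moves up and one moves down, so their contributions cancel.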
We used decision curve analysis (accounting for censored observations) to describe and compare the clinical effects of QRISK2-2011 and the NICE Framingham equation. A model is considered to have clinical value if it has the highest net benefit across the range of thresholds for which an individual would be designated at high risk. Briefly, the net benefit of a model is the difference between the proportion of true positives and the proportion of false positives weighted by the odds of the selected threshold for high risk designation. At any given threshold, the model with the higher net benefit is the preferred model (117). [Prognosis; Validation]
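The net benefit formula described here can be written out directly. The minimal Python sketch below assumes a binary outcome without censoring, unlike the censoring-adjusted version used in the example. It also computes the "treat all" reference strategy that decision curves are usually compared against. Function names and data are ours.

```python
def net_benefit(pred, outcome, threshold):
    """Net benefit of designating patients with predicted risk >= threshold as
    high risk: TP/n - FP/n * threshold / (1 - threshold).
    Binary outcome, no censoring."""
    n = len(outcome)
    tp = sum(1 for p, y in zip(pred, outcome) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(pred, outcome) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: designate everyone as high risk."""
    return net_benefit([1.0] * len(outcome), outcome, threshold)


pred = [0.9, 0.3, 0.6, 0.1]   # hypothetical predicted risks
outcome = [1, 1, 0, 0]
for pt in (0.1, 0.2, 0.5):
    print(pt, net_benefit(pred, outcome, pt), net_benefit_treat_all(outcome, pt))
```

A decision curve plots these values across a range of thresholds. The "treat none" strategy has a net benefit of zero at every threshold.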
Item 10e. Describe any model updating (for example, recalibration) arising from the validation, if done. [V]
The coefficients of the [original diagnostic] expert model are likely subject to overfitting, as there were 25 diagnostic indicators originally under examination, but only 36 vignettes. To quantify the amount of overfitting, we determine [in our validation dataset] the shrinkage factor by studying the calibration slope b when fitting the logistic regression model … :
logit(P(Y = 1)) = a + b * logit(p)
where [Y = 1 indicates pneumonia (outcome) presence in our validation set and] p is the vector of predicted probabilities. The slope b of the linear predictor defines the shrinkage factor. Well calibrated models have b ≈ 1. Thus, we recalibrate the coefficients of the genuine expert model by multiplying them with the shrinkage factor (shrinkage after estimation) (368). [Diagnosis; Model Updating; Logistic]
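The recalibration model above can be fitted with any logistic regression routine. The minimal Python sketch below uses a small Newton–Raphson fitter written with NumPy so that it is self-contained; in practice, a statistics package would normally be used. The simulated model is deliberately overfitted, with its linear predictor twice as extreme as the truth, so the expected slope b is about 0.5. The "expert" coefficients are hypothetical. The same fit gives the calibration intercept and slope discussed in the bakery-worker example.

```python
import numpy as np


def logit(p):
    return np.log(p / (1 - p))


def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression by Newton-Raphson (no safeguards)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def calibration_intercept_slope(p, y):
    """Fit logit(P(Y = 1)) = a + b * logit(p) on validation data; returns (a, b)."""
    X = np.column_stack([np.ones_like(p), logit(p)])
    a, b = fit_logistic(X, y)
    return a, b


# Simulated validation set for an overfitted model (expected slope about 0.5).
rng = np.random.default_rng(0)
z = rng.normal(size=20_000)
p_model = 1 / (1 + np.exp(-2 * z))           # model's predicted probabilities
y = rng.binomial(1, 1 / (1 + np.exp(-z)))     # outcomes from the true risk
a, b = calibration_intercept_slope(p_model, y)

# Shrinkage after estimation: multiply the original coefficients by b.
expert_coefs = np.array([1.2, -0.8, 0.5])     # hypothetical coefficients
shrunk_coefs = b * expert_coefs
print(round(a, 2), round(b, 2), shrunk_coefs.round(2))
```

A slope well below 1 signals predictions that are too extreme, which is the typical signature of overfitting. The shrinkage estimate of b = 0.11 reported later for this expert model indicates very heavy overfitting.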
In this study, we adopted the [model updating] approach of “validation by calibration” proposed by Van Houwelingen. For each risk category, a Weibull proportional hazards model was fitted using the overall survival values predicted by the [original] UISS prediction model. These expected curves were plotted against the observed Kaplan-Meier curves, and possible differences were assessed by a “calibration model,” which evaluated how much the original prognostic score was valid on the new data by testing 3 different parameters (α, β, and γ). If the joint null hypothesis on α = 0, β = −1, and γ = 1 was rejected (i.e., if discrepancies were found between observed and expected curves), estimates of the calibration model were used to recalibrate predicted probabilities. Note that recalibration does not affect the model's discrimination accuracy. Specific details of this approach are reported in the articles by Van Houwelingen and Miceli et al (369). [Prognosis; Model Updating; Survival]
Results of the external validation prompted us to update the models. We adjusted the intercept and regression coefficients of the prediction models to the Irish setting. The most important difference with the Dutch setting is the lower Hb cutoff level for donation, which affects the outcome and the breakpoint in the piecewise linear function for the predictor "previous Hb level." Two methods were applied for updating: recalibration of the model and model revision. Recalibration included adjustment of the intercept and adjustment of the individual regression coefficients with the same factor, that is, the calibration slope. For the revised models, individual regression coefficients were separately adjusted. This was done by adding the predictors to the recalibrated model in a step forward manner and testing with a likelihood ratio test (p < 0.05) whether they had added value. If so, the regression coefficient for that predictor was adjusted further (370). [Diagnosis; Model Updating; Logistic]
Risk Groups
Item 11. Provide details on how risk groups were created, if done. [D;V]
Once a final model was defined, patients were divided into risk groups in 2 ways: 3 groups according to low, medium, and high risk (placing cut points at the 25th and 75th percentiles of the model's risk score distribution); and 10 groups, using Cox's cut points. The latter minimize the loss of information for a given number of groups. Because the use of 3 risk groups is familiar in the clinical setting, the 3-group paradigm is used hereafter to characterize the model (374). [Prognosis; Development; Validation]
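Percentile-based grouping of this kind takes only a few lines. The minimal Python sketch below places cut points at the 25th and 75th percentiles of the score distribution and assigns a score equal to a cut point to the lower group; that tie rule is our assumption. Cox's optimal cut points for 10 groups are not implemented. Passing `percentiles=(100 / 3, 200 / 3)` gives tertile groups such as those in the prognostic index example below. The scores are hypothetical.

```python
import numpy as np


def percentile_risk_groups(scores, percentiles=(25, 75)):
    """Assign 0 = low, 1 = medium, 2 = high using percentile cut points of the
    score distribution; a score equal to a cut point goes to the lower group."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.percentile(scores, percentiles)
    return np.digitize(scores, cuts, right=True), cuts


scores = np.arange(1, 101)  # hypothetical risk scores for 100 patients
groups, cuts = percentile_risk_groups(scores)
print(cuts, np.bincount(groups))
```

Because the cut points are estimated from the development data, they would be reported as fixed score values so that the same groups can be formed in validation data.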
One of the goals of this model was to develop an easily accessible method for the clinician to stratify risk of patients preparing to undergo head and neck cancer surgery. To this end, we defined 3 categories of transfusion risk: low (≤15%), intermediate (15%-24%) and high (≥25%) (375). [Prognosis; Validation]
Patients were identified as high risk if their 10 year predicted cardiovascular disease risk was ≥20%, as per the guidelines set out by NICE (117). [Prognosis; Validation]
Three risk groups were identified on the basis of PI [prognostic index] distribution tertiles. The low-risk subgroup (first tertile, PI ≤8.97) had event-free survival (EFS) rates at 5 and 10 years of 100 and 89% (95% CI, 60–97%), respectively. The intermediate-risk subgroup (second tertile, 8.97 < PI ≤ 10.06) had EFS rates at 5 and 10 years of 95% (95% CI, 85–98%) and 83% (95% CI, 64–93%), respectively. The high-risk group (third tertile, PI > 10.06) had EFS rates at 5 and 10 years of 85% (95% CI, 72–92%) and 44% (95% CI, 24–63%), respectively (376). [Prognosis; Development]
Finally, a diagnostic rule was derived from the shrunken, rounded, multivariable coefficients to estimate the probability of heart failure presence, ranging from 0% to 100%. Score thresholds for ruling in and ruling out heart failure were introduced based on clinically acceptable probabilities of false-positive (20% and 30%) and false-negative (10% and 20%) diagnoses (377). [Diagnosis; Development; Validation]
Development Versus Validation
Item 12. For validation, identify any differences from the development study in setting, eligibility criteria, outcome, and predictors. [V]
… the summed GRACE risk score corresponds to an estimated probability of all-cause mortality from hospital discharge to 6 months. … [I]ts validity beyond 6 months has not been established. In this study, we examined whether this GRACE risk score calculated at hospital discharge would predict longer term (up to 4 years) mortality in a separate registry cohort … (379). [Prognosis; Different outcome]
The Wells rule was based on data obtained from referred patients suspected of having deep vein thrombosis who attended secondary care outpatient clinics. Although it is often argued that secondary care outpatients are similar to primary care patients, differences may exist because of the referral mechanism of primary care physicians. The true diagnostic or discriminative accuracy of the Wells rule has never been formally validated in primary care patients in whom DVT is suspected. A validation study is needed because the performance of any diagnostic or prognostic prediction rule tends to be lower than expected from data in the original study when it is applied to new patients, particularly when these patients are selected from other settings. We sought to quantify the diagnostic performance of the Wells rule in primary care patients and compare it with the results reported in the original studies by Wells and colleagues (188). [Diagnosis; Different setting]
When definitions of variables were not identical across the different studies (for example physical activity), we tried to use the best available variables to achieve reasonable consistency across databases. For example, in NHANES, we classified participants as “physically active” if they answered “more active” to the question, “Compare your activity with others of the same age.” Otherwise, we classified participants as “not physically active.” In ARIC, physical activity was assessed in a question with a response of “yes” or “no”, whereas in CHS, we dichotomized the physical activity question into “no” or “low” versus “moderate” or “high” (380). [Prognosis; Different predictors]
As the NWAHS did not collect data on use of antihypertensive medications, we assumed no participants were taking antihypertensive medications. Similarly, as the BMES did not collect data on a history of high blood glucose level, we assumed that no participants had such a history (381). [Prognosis; Different predictors]
Results
Participants
Item 13a. Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. [D;V]
We calculated the 10 year estimated risk of cardiovascular disease for every patient in the THIN cohort using the QRISK2-2011 risk score … and 292 928 patients (14.1%) were followed up for 10 years or more (117). [Prognosis; Validation]
At time of analysis, 204 patients (66%) had died. The median follow-up for the surviving patients was 12 (range 1-84) months (391). [Prognosis; Development]
Median follow-up was computed according to the "reverse Kaplan–Meier" method, which calculates potential follow-up in the same way as the Kaplan–Meier estimate of the survival function, but with the meaning of the status indicator reversed. Thus, death censors the true but unknown observation time of an individual, and censoring is an end-point (Schemper & Smith, 1996) (392). [Prognosis; Development]
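The reverse Kaplan–Meier calculation can be sketched by swapping the roles of death and censoring in an ordinary Kaplan–Meier estimator. In this minimal Python version, survival probabilities are kept as exact fractions so that a value of exactly one half is detected reliably. The median is taken as the first time the curve reaches 0.5 or below, which is a common convention. The function name and follow-up data are hypothetical.

```python
from fractions import Fraction


def reverse_km_median_followup(times, died):
    """Median potential follow-up by the reverse Kaplan-Meier method.

    The status indicator is reversed: being censored (alive at last contact)
    counts as the 'event', and death censors the follow-up time.
    Returns None if the median is not reached.
    """
    data = list(zip(times, died))
    surv = Fraction(1)
    for t in sorted(set(times)):
        at_risk = sum(1 for ti, _ in data if ti >= t)
        censored_now = sum(1 for ti, d in data if ti == t and not d)
        if censored_now:
            surv *= 1 - Fraction(censored_now, at_risk)
        if surv <= Fraction(1, 2):
            return t
    return None


# Hypothetical follow-up (months); died = 1 for death, 0 for alive at last contact.
print(reverse_km_median_followup([1, 2, 3, 10, 12], [1, 1, 1, 0, 0]))  # 10
```

The naive median of all five observed times is 3 months, but that value is pulled down by the early deaths. The reverse Kaplan–Meier estimate of 10 months describes how long the survivors were actually followed.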
Item 13b. Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome. [D;V]
Item 13c. For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors, and outcome). [V]
Model Development
Item 14a. Specify the number of participants and outcome events in each analysis. [D]
Item 14b. If done, report the unadjusted association between each candidate predictor and outcome. [D]
Model Specification
Item 15a. Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point). [D]
Item 15b. Explain how to use the prediction model. [D]
Model Performance
Item 16. Report performance measures (with confidence intervals) for the prediction model. [D;V]
Model Updating
Item 17. If done, report the results from any model updating (i.e., model specification, model performance). [V]
For the recalibrated models, all regression coefficients were multiplied by the slope of the calibration model (0.65 for men and 0.63 for women). The intercept was adjusted by multiplying the original value by the calibration slope and adding the accompanying intercept of the calibration model (−0.66 for men and −0.36 for women). To derive the revised models, regression coefficients of predictors that had added value in the recalibrated model were further adjusted. For men, regression coefficients were further adjusted for the predictors deferral at the previous visit, time since the previous visit, delta Hb, and seasonality. For women, regression coefficients were further adjusted for deferral at the previous visit and delta Hb … available as supporting information in the online version of this paper, for the exact formulas of the recalibrated and revised models to calculate the risk of Hb deferral) (370). [Diagnosis; Model Updating; Logistic]
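The recalibration arithmetic described here can be checked in a few lines. In the minimal Python sketch below, only the calibration slope 0.65 and intercept −0.66 for men come from the text. The original intercept, the two predictor names, and their coefficients are hypothetical placeholders, because the actual formulas are in the cited paper's supporting information. The sketch also shows that the recalibrated linear predictor equals the calibration intercept plus the slope times the original linear predictor. The further adjustments made for the revised models are not shown.

```python
def recalibrate(intercept, coefs, cal_slope, cal_intercept):
    """Logistic recalibration: every coefficient is multiplied by the calibration
    slope; new intercept = original intercept * slope + calibration intercept."""
    new_intercept = intercept * cal_slope + cal_intercept
    new_coefs = {name: beta * cal_slope for name, beta in coefs.items()}
    return new_intercept, new_coefs


def linear_predictor(intercept, coefs, x):
    return intercept + sum(coefs[name] * x[name] for name in coefs)


# Hypothetical original model (names and values are illustrative only).
orig_intercept = -2.0
orig_coefs = {"deferral_previous_visit": 1.1, "delta_hb": -0.4}
# Calibration slope and intercept reported above for men.
new_intercept, new_coefs = recalibrate(orig_intercept, orig_coefs, 0.65, -0.66)

patient = {"deferral_previous_visit": 1, "delta_hb": 0.5}  # hypothetical donor
lp_old = linear_predictor(orig_intercept, orig_coefs, patient)
lp_new = linear_predictor(new_intercept, new_coefs, patient)
print(round(new_intercept, 3), lp_old, lp_new)
```

Because recalibration is a monotone transformation of the original linear predictor, it changes calibration but leaves the ranking of patients, and therefore discrimination, unchanged.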
The miscalibration of Approach 1 indicated the need for recalibration and we obtained a uniform shrinkage factor when we fitted logit(P(Y = 1)) = a + b*logit(p) in Approach 2. We obtained the estimates a = −1.20 and b = 0.11, indicating heavy shrinkage (368). [Diagnosis; Model Updating; Logistic]
Results of the performance of the original clinical prediction model compared with that of different models extended with genetic variables selected by the lasso method are presented in Table 3. Likelihood ratio tests were performed to compare the goodness of fit of the two models. The AUC of the original clinical model was 0.856. Addition of TLR4 SNPs [single-nucleotide polymorphisms] to the clinical model resulted in a slightly decreased AUC. Addition of TLR9-1237 to the clinical model slightly increased the AUC to 0.861, though this was not significant (p = 0.570). NOD2 SNPs did not improve the clinical model (423). [Prognosis; Model Updating; Logistic]
Discussion
Limitations
Item 18. Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data). [D;V]
The most important limitation of the model for predicting a prolonged ICU stay is its complexity. We believe this complexity reflects the large number of factors that determine a prolonged ICU stay. This complexity essentially mandates the use of automated data collection and calculation. Currently, the infrequent availability of advanced health information technology in most hospitals represents a major barrier to the model's widespread use. As more institutions incorporate electronic medical records into their process flow, models such as the one described here can be of great value.
Our results have several additional limitations. First, the model's usefulness is probably limited to the U.S. because of international differences that impact ICU stay. These differences in ICU stay are also likely to adversely impact the use of ICU day 5 as a threshold for concern about a prolonged stay. Second, while capturing physiologic information on day 1 is too soon to account for the impact of complications and response to therapy, day 5 may still be too early to account for their effects. Previous studies indicate that more than half of the complications of ICU care occur after ICU day 5. Third, despite its complexity, the model fails to account for additional factors known to influence ICU stay. These include nosocomial infection, do not resuscitate orders, ICU physician staffing, ICU acquired paralysis, and ICU sedation practices. Fourth, the model's greatest inaccuracy is the under-prediction of remaining ICU stays of 2 days or less. We speculate that these findings might be explained by discharge delays aimed at avoiding night or weekend transfers or the frequency of complications on ICU days 6 to 8 (424). [Prognosis; Development; Validation]
This paper has several limitations. First, it represents assessments of resident performance at 1 program in a single specialty. In addition, our program only looks at a small range of the entire population of US medical students. The reproducibility of our findings in other settings and programs is unknown. Second, we used subjective, global assessments in conjunction with summative evaluations to assess resident performance. Although our interrater reliability was high, there is no gold standard for clinical assessment, and the best method of assessing clinical performance remains controversial. Lastly, r² = 0.22 for our regression analysis shows that much of the variance in mean performance ratings is unexplained. This may be due to limited information in residency applications in such critical areas as leadership skills, teamwork, and professionalism (425). [Prognosis; Development]
Interpretation
Item 19a. For validation, discuss the results with reference to performance in the development data, and any other validation data. [V]
The ABCD2 score was a combined effort by teams led by Johnston and Rothwell, who merged two separate datasets to derive high-risk clinical findings for subsequent stroke. Rothwell's dataset was small, was derived from patients who had been referred by primary care physicians and used predictor variables scored by a neurologist one to three days later. Johnston's dataset was derived from a retrospective study involving patients in California who had a transient ischemic attack.
Subsequent studies evaluating the ABCD2 score have been either retrospective studies or studies using information from databases. Ong and colleagues found a sensitivity of 96.6% for stroke within seven days when using a score of more than two to determine high risk, yet 83.6% of patients were classified as high-risk. Fothergill and coworkers retrospectively analyzed a registry of 284 patients and found that a cutoff score of less than 4 missed 4 out of 36 strokes within 7 days. Asimos and colleagues retrospectively calculated the ABCD2 score from an existing database, but they were unable to calculate the score for 37% of patients, including 154 of the 373 patients who had subsequent strokes within 7 days. Sheehan and colleagues found that the ABCD2 score discriminated well between patients who had a transient ischemic attack or minor stroke versus patients with transient neurologic symptoms resulting from other conditions, but they did not assess the score's predictive accuracy for subsequent stroke. Tsivgoulis and coworkers supported using an ABCD2 score of more than 2 as the cutoff for high risk based on the results of a small prospective study of patients who had a transient ischemic attack and were admitted to hospital. The systematic review by Giles and Rothwell found a pooled AUC of 0.72 (95% CI 0.63–0.82) for all studies meeting their search criteria, and an AUC of 0.69 (95% CI 0.64–0.74) after excluding the original derivation studies. The AUC in our study is at the low end of the confidence band of these results, approaching 0.5 (434). [Prognosis]
Item 19b. Give an overall interpretation of the results considering objectives, limitations, results from similar studies, and other relevant evidence. [D;V]
Our models rely on demographic data and laboratory markers of CKD [chronic kidney disease] severity to predict the risk of future kidney failure. Similar to previous investigators from Kaiser Permanente and the RENAAL study group, we find that a lower estimated GFR [glomerular filtration rate], higher albuminuria, younger age, and male sex predict faster progression to kidney failure. In addition, a lower serum albumin, calcium, and bicarbonate, and a higher serum phosphate also predict a higher risk of kidney failure and add to the predictive ability of estimated GFR and albuminuria. These markers may enable a better estimate of measured GFR or they may reflect disorders of tubular function or underlying processes of inflammation or malnutrition.
Although these laboratory markers have also previously been associated with progression of CKD, our work integrates them all into a single risk equation (risk calculator and Table 5, and smartphone app, available at www.qxmd.com/Kidney-Failure-Risk-Equation). In addition, we demonstrate no improvement in model performance with the addition of variables obtained from the history (diabetes and hypertension status) and the physical examination (systolic blood pressure, diastolic blood pressure, and body weight). Although these other variables are clearly important for diagnosis and management of CKD, the lack of improvement in model performance may reflect the high prevalence of these conditions in CKD and imprecision with respect to disease severity after having already accounted for estimated GFR and albuminuria (261). [Prognosis; Development; Validation]
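The excerpt describes integrating several laboratory markers into a single risk equation but does not give its functional form or coefficients. The minimal sketch below assumes a Cox proportional hazards model with centred predictors, a common form for such equations, and shows how predictor values become a risk at a fixed time horizon. All coefficients, centring values and the baseline survival are hypothetical and are not the published Kidney Failure Risk Equation. Only their signs follow the directions reported in the excerpt: lower estimated GFR, higher albuminuria, younger age and male sex each increase risk.

```python
import math

# HYPOTHETICAL values for illustration only; signs follow the excerpt, magnitudes do not.
COEFS = {"egfr": -0.05, "log_acr": 0.4, "age": -0.02, "male": 0.2}
MEANS = {"egfr": 35.0, "log_acr": 5.0, "age": 70.0, "male": 0.5}
BASELINE_SURVIVAL = 0.95  # hypothetical S0(t) at the chosen horizon t


def cox_risk(predictors, coefs, means, baseline_survival):
    """Risk of the event by time t under a Cox-type equation:
    risk = 1 - S0(t) ** exp(sum_k beta_k * (x_k - mean_k))."""
    linear_predictor = sum(coefs[k] * (predictors[k] - means[k]) for k in coefs)
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# A hypothetical patient with lower eGFR and higher albuminuria than average.
patient = {"egfr": 20.0, "log_acr": 6.0, "age": 60.0, "male": 1}
print(round(cox_risk(patient, COEFS, MEANS, BASELINE_SURVIVAL), 3))
```

When all predictors sit at their centring values the linear predictor is 0, so the predicted risk is simply 1 - S0(t). Adding further variables changes this equation, but as the excerpt notes, it does not necessarily improve discrimination.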
Implications
Item 20. Discuss the potential clinical use of the model and implications for future research. [D;V]
The likelihood of influenza depends on the baseline probability of influenza in the community, the results of the clinical examination, and, optionally, the results of point of care tests for influenza. We determined the probability of influenza during each season based on data from the Centers for Disease Control and Prevention. A recent systematic review found that point of care tests are approximately 72% sensitive and 96% accurate for seasonal influenza. Using these data for seasonal probability and test accuracy, the likelihood ratios for flu score 1, a no-test/test threshold of 10% and test/treat threshold of 50%, we have summarized a suggested approach to the evaluation of patients with suspected influenza in Table 5. Physicians wishing to limit use of anti-influenza drugs should consider rapid testing even in patients who are at high risk during peak flu season. Empiric therapy might be considered for patients at high risk of complications (181). [Diagnosis; Development; Validation; Implications for Clinical Use]
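The threshold approach described above can be sketched as follows. The seasonal (pretest) probability is updated with a likelihood ratio for the patient's flu score using Bayes' theorem in odds form. The resulting probability is then compared with the no-test/test (10%) and test/treat (50%) thresholds given in the excerpt. The likelihood ratios for flu score 1 are not reproduced in this excerpt, so the pretest probability and likelihood ratios used below are hypothetical. Treating a probability exactly at a threshold as belonging to the higher category is also an assumption.

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Bayes' theorem in odds form: posttest odds = pretest odds * LR."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def recommended_action(prob, test_threshold=0.10, treat_threshold=0.50):
    """Below the no-test/test threshold: neither test nor treat.
    At or above the test/treat threshold: treat empirically.
    In between: perform a point-of-care test."""
    if prob < test_threshold:
        return "no test, no treatment"
    if prob >= treat_threshold:
        return "treat empirically"
    return "point-of-care test"


# Hypothetical seasonal probability of 20% and hypothetical score likelihood ratios.
for lr in (0.3, 2.0, 5.0):
    p = posttest_probability(0.20, lr)
    print(lr, round(p, 3), recommended_action(p))
```

Working in odds rather than probabilities makes the update a single multiplication. This is why likelihood ratios, rather than sensitivity and specificity, are convenient for combining a clinical score with a community baseline probability.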
To further appreciate these results, a few issues need to be addressed. First, although outpatients were included in the trial from which the data originated, for these analyses we deliberately restricted the study population to inpatients, because the PONV [postoperative nausea and vomiting] incidence in outpatients was substantially less frequent (34%) and because different types of surgery were performed (e.g. no abdominal surgery). Accordingly, our results should primarily be generalized to inpatients. It should be noted that, currently, no rules are available that were derived on both inpatients and outpatients. This is still a subject for future research, particularly given the increase of ambulatory surgery (437). [Prognosis; Incremental Value; Implications for Clinical Use]
Our study had several limitations that should be acknowledged. We combined data from 2 different populations with somewhat different inclusion criteria, although the resulting dataset has the advantage of greater generalizability because it includes patients from 2 countries during 2 different flu seasons and has an overall pretest probability typical of that for influenza season. Also, data collection was limited to adults, so it is not clear whether these findings would apply to younger patients. Although simple, the point scoring may be too complex to remember and would be aided by programming as an application for smart phones and/or the Internet (181). [Diagnosis; Development; Validation; Limitations; Implications for Research]
Other Information
Supplementary Information
Item 21. Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets. [D;V]
The design and methods of the RISK-PCI trial have been previously published [ref]. Briefly, the RISK-PCI is an observational, longitudinal, cohort, single-center trial specifically designed to generate and validate an accurate risk model to predict major adverse cardiac events after contemporary pPCI [primary percutaneous coronary intervention] in patients pretreated with 600 mg clopidogrel. Patients were recruited between February 2006 and December 2009. Informed consent was obtained from each patient. The study protocol conforms to the ethical guidelines of the Declaration of Helsinki. It was approved by a local research ethics committee and registered in the Current Controlled Trials Register—ISRCTN83474650—(www.controlled-trials.com/ISRCTN83474650) (443). [Prognosis; Development]
User-friendly calculators for the Reynolds Risk Scores for men and women can be freely accessed at www.reynoldsriskscore.org (444). [Prognosis; Incremental Value]
Open source codes to calculate the QFractureScores are available from www.qfracture.org released under the GNU lesser general public licence-version 3 (315). [Prognosis; Validation]
Funding
Item 22. Give the source of funding and the role of the funders for the present study. [D;V]
The Reynolds Risk Score Project was supported by investigator-initiated research grants from the Donald W. Reynolds Foundation (Las Vegas, Nev) with additional support from the Doris Duke Charitable Foundation (New York, NY), and the Leducq Foundation (Paris, France). The Women's Health Study cohort is supported by grants from the National Heart, Lung, and Blood Institute and the National Cancer Institute (Bethesda, Md) (208). [Prognosis; Development]
The Clinical and Translational Service Center at Weill Cornell Medical College provided partial support for data analyses. The funding source had no role in the design of our analyses, its interpretation, or the decision to submit the manuscript for publication (380). [Diagnosis; Development; Validation]
Concluding Remarks
Appendix: Members of the TRIPOD Group
References
Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Ann Intern Med. 2015;162:W1-W73. [Epub 6 January 2015]. doi:10.7326/M14-0698