Supplement, 2 June 2020

Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data

    Electronic health records (EHRs) are an increasingly important source of real-world health care data for observational research. Analyses of data collected for purposes other than research require careful consideration of data quality as well as the general research and reporting principles relevant to observational studies. The core principles for observational research in general also apply to observational research using EHR data, and these are well addressed in prior literature and guidelines. This article provides additional recommendations for EHR-based research. Considerations unique to EHR-based studies include assessment of the accuracy of computer-executable cohort definitions that can incorporate unstructured data from clinical notes and management of data challenges, such as irregular sampling, missingness, and variation across time and place. Principled application of existing research and reporting guidelines alongside these additional considerations will improve the quality of EHR-based observational studies.

    Observational research helps to advance clinical knowledge and inform the practice of medicine. Electronic health records (EHRs) contain large quantities of health care data that are captured during care and are an increasingly important resource for conducting observational health research (1). The potential value of these data relates to the large volume of data drawn from real-world practice that may include more diverse patients and conditions than are feasible to include in studies that rely on primary data collection (2, 3). Although EHRs typically provide larger quantities of clinical data than are available from surveys, registries, and clinical trials, the quality of these data—which were not collected for research purposes—raises important research and reporting considerations.

    The core considerations for observational research are the same whether the research uses data collected primarily for research purposes or EHR data collected during the course of care. These core considerations are well described in reporting guidelines, such as STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) (4). The RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) (5) guidelines extend the STROBE guideline with related recommendations for studies using routinely collected health data, which are directly relevant to EHR-based studies.

    This article is intended to complement existing guidelines by describing additional research and reporting issues that should be considered when conducting, reporting, and interpreting EHR-based studies. Issues encountered in our own prior research (6–8) and discussed by collaborative groups, such as the Observational Health Data Sciences and Informatics (OHDSI) (6) initiative, inform our recommendations. Issues that we address include assessment of the accuracy of algorithmic cohort definitions and electronic phenotyping that can incorporate unstructured data, such as that from clinical notes (9, 10), and managing common irregularities of EHR data that can bias study results, such as irregular sampling, missingness, and nonstationarity across time and place (11–13). We use 2 examples to illustrate some of these issues.

    Example 1: Identifying Primary Care Patients With High-Risk Opioid Use

    We conducted a study to quantify the prevalence of chronic opioid use and determine whether primary care prescribing guidelines could decrease it (7). Because primary data collection or manual chart abstraction would be prohibitively expensive, we used EHR data collected in the context of routine primary care. We initially sought to identify a cohort of patients with “prescription opioid misuse”; behaviors of interest included breach of opioid pain contracts (14), medication diversion, premature refills, and chronic use of high dosages. Unfortunately, we quickly found it difficult to develop an accurate computer-executable definition of “prescription opioid misuse” because formal diagnostic codes were sparse and inconsistent. This is commonly the case for clinical data not closely linked to billing or compliance incentives (15, 16). For many medical conditions, fewer than 10% of affected individuals' EHRs contain the corresponding International Classification of Diseases (ICD) diagnosis codes (17). Diagnostic coding accuracy also varies across settings and provider types and by whether a billing code specialist assigned the code (16, 18–20).

    Documentation and workflow variability introduced additional challenges. Notably, clinicians did not use a standardized electronic note template for screening questionnaires that could facilitate simple text recognition of such terms as “opioid contract” or “prescription drug monitoring program.” Thus, we had to define our cohort on the basis of alternate structured elements, such as total quantity of opioids prescribed within a given time window, while excluding patients with any history of a cancer-related diagnosis. Subsequent studies by other researchers illustrate that using algorithmic natural-language processing to refine and validate cohort definitions can identify one-third more patients with opioid misuse than identified by diagnosis codes alone (21, 22).

    Example 2: Predicting Diagnostic Test Results

    In another study, we sought to identify low-yield diagnostic tests by using EHR data available at the time of test ordering to predict whether common inpatient laboratory tests, such as magnesium, sodium, creatinine, and blood cultures, would yield abnormal results (23, 24). The values of common vital signs and laboratory tests were identified as important predictors of subsequent test results, but so were the existence and number of such measurements, so our model included these counts.
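
    As an illustration of using the existence and number of measurements as predictors, the following minimal sketch derives, for each measurement name, a count and the most recent value from observations recorded before a test order. The tuple layout, the function name `measurement_features`, and the sample values are assumptions made for this example and do not describe the model used in the cited study.

```python
from datetime import datetime


def measurement_features(observations, order_time):
    """Summarize measurements recorded strictly before a test order.

    observations: list of (name, timestamp, value) tuples (hypothetical layout).
    Returns a dict holding, per measurement name, a count and the latest value.
    """
    features = {}
    # Keep only measurements that precede the order, oldest first.
    prior = sorted(
        (obs for obs in observations if obs[1] < order_time),
        key=lambda obs: obs[1],
    )
    for name, _timestamp, value in prior:
        features[f"{name}_count"] = features.get(f"{name}_count", 0) + 1
        # Sorted oldest first, so the final assignment is the most recent value.
        features[f"{name}_last"] = value
    return features


# Made-up observations for one patient.
observations = [
    ("sodium", datetime(2020, 1, 1, 6, 0), 138.0),
    ("sodium", datetime(2020, 1, 2, 6, 0), 131.0),
    ("heart_rate", datetime(2020, 1, 2, 7, 0), 104.0),
    ("sodium", datetime(2020, 1, 3, 6, 0), 129.0),  # after the order: ignored
]
features = measurement_features(observations, datetime(2020, 1, 2, 12, 0))
print(features)
```

    A measurement name that never appears before the order simply has no entry, so a downstream model can treat absence itself as information.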

    Issues to Consider When Conducting Observational Studies of EHR Data

    Developing “Executable” Cohorts in EHRs

    Algorithmic approaches to using EHR data to identify patient cohorts expand the feasibility of large-scale observational research but require validation. These algorithms are referred to using such terms as “cohort definitions,” “health outcomes of interest,” “inclusion/exclusion criteria,” or “phenotypes.” A key step in “electronic phenotyping” is translating human-understandable descriptions into computer-executable definitions (25, 26). This step may involve simple logic that combines structured elements. Example 1 used this approach when we identified patients receiving chronic pain care as those who received opioid prescriptions from primary care providers while excluding patients with opioid prescriptions from oncology providers because these prescriptions may be for palliative care. Other approaches use probabilistic algorithms to estimate the likelihood that a patient belongs to a cohort of interest on the basis of patterns of data observed in other similar patients.
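
    The simple-logic approach can be sketched as follows. This is a hypothetical rule rather than the definition used in example 1: the record layout, the function name `chronic_opioid_cohort`, and the thresholds (at least 3 primary care opioid orders within 90 days) are assumptions chosen only to show how inclusion and exclusion criteria over structured elements become executable.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Prescription:
    patient_id: str
    order_date: date
    drug_class: str       # e.g. "opioid" (hypothetical coding)
    prescriber_dept: str  # e.g. "primary_care" or "oncology"


def chronic_opioid_cohort(prescriptions, window_days=90, min_orders=3):
    """Return patient IDs meeting the hypothetical cohort definition."""
    opioid = [p for p in prescriptions if p.drug_class == "opioid"]
    # Exclusion criterion: any opioid prescription from an oncology provider.
    excluded = {p.patient_id for p in opioid if p.prescriber_dept == "oncology"}

    dates_by_patient = {}
    for p in opioid:
        if p.prescriber_dept == "primary_care":
            dates_by_patient.setdefault(p.patient_id, []).append(p.order_date)

    cohort = set()
    window = timedelta(days=window_days)
    for patient_id, dates in dates_by_patient.items():
        if patient_id in excluded:
            continue
        dates.sort()
        # Inclusion criterion: some run of min_orders orders fits in the window.
        for i in range(len(dates) - min_orders + 1):
            if dates[i + min_orders - 1] - dates[i] <= window:
                cohort.add(patient_id)
                break
    return cohort


# Made-up prescriptions for four patients.
rx = [
    Prescription("A", date(2019, 1, 5), "opioid", "primary_care"),
    Prescription("A", date(2019, 2, 4), "opioid", "primary_care"),
    Prescription("A", date(2019, 3, 1), "opioid", "primary_care"),
    Prescription("B", date(2019, 1, 5), "opioid", "primary_care"),
    Prescription("B", date(2019, 2, 4), "opioid", "primary_care"),
    Prescription("B", date(2019, 3, 1), "opioid", "primary_care"),
    Prescription("B", date(2019, 3, 9), "opioid", "oncology"),
    Prescription("C", date(2019, 1, 5), "opioid", "primary_care"),
    Prescription("C", date(2019, 2, 4), "opioid", "primary_care"),
    Prescription("D", date(2019, 1, 1), "opioid", "primary_care"),
    Prescription("D", date(2019, 4, 15), "opioid", "primary_care"),
    Prescription("D", date(2019, 8, 1), "opioid", "primary_care"),
]
cohort = chronic_opioid_cohort(rx)
print(cohort)
```

    Only patient A qualifies here: B is excluded by the oncology prescription, C has too few orders, and D's orders are spread over more than 90 days. Writing the rule as code makes each of these choices explicit and reviewable.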

    Augmenting electronic phenotyping algorithms by including additional content from clinical notes is a popular approach, but it is not a cure-all because there can be gross documentation inconsistencies from copy-and-paste templates (27, 28) and notes may ultimately only provide incremental information beyond deliberate use of more consistently available structured data elements (29). These additional layers of complexity require their own evaluation, consistent with recommendations 6.1 and 6.2 from RECORD (5). For a sample of the cases considered, a reference standard must be established for whether they meet the cohort definition. This often requires manual chart review by multiple domain experts, with assessment of interrater reliability (for example, kappa score) (30). The algorithmic approach can then be evaluated relative to the reference standard in terms of diagnostic and information retrieval metrics (31) of precision (positive predictive value) and recall (sensitivity). This allows researchers and reviewers to assess whether the algorithmic cohort definition can be extrapolated to larger samples with satisfactory results. Such projects as Phenotype KnowledgeBase (32) and OHDSI support these efforts by collecting a growing number of publicly available, human-understandable, and computer-executable definitions.
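
    The evaluation metrics named above can be computed directly from binary labels. The sketch below assumes labels coded as 1 (meets the cohort definition) or 0 (does not); the function names and sample labels are illustrative assumptions, and the kappa shown is the standard 2-rater Cohen kappa.

```python
def precision_recall(predicted, reference):
    """Precision (positive predictive value) and recall (sensitivity)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa for two raters assigning binary labels to the same charts."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(rater_a) / n  # share of charts rater A labeled positive
    p_b = sum(rater_b) / n
    # Agreement expected by chance if the raters labeled independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")  # undefined when both raters use a single label
    return (observed - expected) / (1 - expected)


# Made-up chart review labels for 10 sampled patients.
rater_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
reference = rater_a  # assume rater A's labels serve as the reference standard
algorithm = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]

kappa = cohen_kappa(rater_a, rater_b)
precision, recall = precision_recall(algorithm, reference)
print(round(kappa, 2), round(precision, 2), round(recall, 2))
```

    In practice, the reference standard would usually come from adjudicated review rather than a single rater, and confidence intervals would accompany each estimate.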

    EHR Data Irregularities

    Confounding, a well-recognized challenge in all observational research, is magnified when studies use broadly available EHR data collected by individuals providing care rather than by those curating data for research or billing purposes. For example, because sicker patients tend to receive more testing and treatment, confounding by indication (33) can bias the predictive value of laboratory results (13). Strategies for addressing such confounding are well covered in existing literature (34–42).

    Missing data are another challenge in observational studies that can be magnified when EHR data are used. Data in an EHR are often missing not at random (43, 44). Gaps in a patient's record may be a result of loss to follow-up or transition to another care provider or insurer. Alternatively, data may be missing because of errors in populating a database record or incomplete linkage of different records belonging to one patient. When data are missing for reasons related to patient- or provider-specific factors, such as the patient being too sick to seek health care, the missing-at-random assumption is violated. Statistical methods generally used to handle missing data include multiple imputation and inverse probability weighting (43, 45, 46), and these methods have been applied to studies using EHR data (11, 44). Another challenge is “nondata” generated by copying and pasting of note information or inappropriate carry-forward of discontinued medications or resolved diagnoses or symptoms. In some situations, it is possible to discern the presence of the workflow that is producing the nondata (such as audit logs for copied text), but defining true data can be challenging.
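
    To make inverse probability weighting concrete, the following minimal sketch estimates a mean when the chance of a value being recorded differs by an observed stratum. It assumes that missingness depends only on that stratum, so it does not by itself repair data that are missing not at random; the function name `ipw_mean`, the strata, and the values are all hypothetical.

```python
from collections import defaultdict


def ipw_mean(records):
    """Inverse-probability-weighted mean of the observed values.

    records: list of (stratum, value) pairs, with value None when missing.
    Each observed value is weighted by 1 / P(observed | stratum), where the
    probability is estimated as the observed fraction within the stratum.
    Strata with no observed values contribute nothing to the estimate.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for stratum, value in records:
        totals[stratum] += 1
        if value is not None:
            observed[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, value in records:
        if value is None:
            continue
        weight = totals[stratum] / observed[stratum]  # 1 / P(observed)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Made-up data: inpatients are always measured, outpatients only half the time.
records = (
    [("inpatient", 1.0)] * 4
    + [("outpatient", 3.0)] * 2
    + [("outpatient", None)] * 2
)
observed_values = [v for _, v in records if v is not None]
complete_case_mean = sum(observed_values) / len(observed_values)
weighted_mean = ipw_mean(records)
print(round(complete_case_mean, 3), round(weighted_mean, 3))
```

    The complete-case mean is pulled toward the fully measured inpatients, whereas the weighted mean restores each stratum to its share of the population. Real analyses typically estimate the observation probability with a fitted model and many covariates rather than a single stratum.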

    Temporal Data Complexity

    Electronic health records can provide high-resolution, time-stamped longitudinal data. Yet, misinterpretation of such time stamps can inadvertently “leak” future data into predictive models. For example, observational analysis may indicate that length of hospital stay is associated with growth of resistant bacteria in blood cultures, but length of stay would not be useful for point-of-care predictions because it is future information. More insidious are misleading EHR time stamps, such as clinical progress notes whose contents may have a time stamp corresponding to note initiation rather than to the timing of clinical events. Clinical care decisions, note initiation, and note completion may be separated by many hours or even days, and thus the content of the note may reflect knowledge obtained in the future relative to note initiation. Similarly, using a hospital diagnosis-related group (DRG) for sepsis is unlikely to be valid for intrahospital bacteremia predictions because DRG codes are routinely assigned after hospitalization by coders who have reviewed the completed documentation (16, 47). These irregularities warrant clear specification of the source and timing of available data elements in EHR-based studies and of whether those elements would be available in the live clinical settings where the results are intended to apply.
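
    One way to guard against this kind of leakage is to record, for every data element, when its content actually became available, and to filter on that time rather than on the nominal time stamp. The sketch below is a minimal illustration under that assumption; the `DataElement` fields and the sample records are hypothetical, and many EHR extracts do not expose an availability time directly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataElement:
    name: str
    event_time: datetime      # nominal time stamp (e.g. note initiation)
    available_time: datetime  # when content was final (e.g. note signed)


def usable_at(elements, prediction_time):
    """Keep only elements whose content existed at the time of prediction."""
    return [e for e in elements if e.available_time <= prediction_time]


# Made-up elements from a single hospitalization.
elements = [
    DataElement("lactate_result",
                datetime(2020, 3, 1, 9, 0), datetime(2020, 3, 1, 9, 30)),
    DataElement("progress_note",
                datetime(2020, 3, 1, 8, 0), datetime(2020, 3, 1, 20, 0)),
    DataElement("sepsis_drg",
                datetime(2020, 3, 1, 0, 0), datetime(2020, 3, 12, 0, 0)),
]
prediction_time = datetime(2020, 3, 1, 12, 0)
usable = [e.name for e in usable_at(elements, prediction_time)]
print(usable)
```

    Filtering on `event_time` alone would have admitted the progress note and the DRG code, both of which reflect information finalized after the prediction time.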

    Data Nonstationarity

    Care captured in EHRs changes over time, often rapidly, as a result of the introduction of new tests and therapies, new clinical evidence, changing incentives, and EHR infrastructure alterations (such as changing vendors, modules, or naming standards). In one study predicting future hospital practices, the relevance of EHR data decayed with a half-life of about 4 months for overall practice trends (48). For individual patient charts, static clinical information can be outdated within a matter of hours (49). In another example, time variation had a strong effect on the performance of wound healing prediction models (50). Such change represents nonstationarity, in which the data-generating process changes over time (51). However, observational studies often report findings from a single snapshot of a data set in time.

    Changes in coding and documentation practices or introduction of new EHR software versions also drive data nonstationarity. As a result, study variable definitions developed by using historical data or data from a different source (such as a different health system) may find fewer subjects or the wrong subjects, while associations between treatment and effect may not hold when replicating analyses with different data (52). Nonstationarity can similarly affect calibration and clinical utility of predictive models (53). Diagnostics that summarize how longitudinal EHR data sets change over time can support observational study reporting. Examples include year-over-year descriptive statistics on the prevalence of categories of data (for example, laboratory records, procedure records, and mortality data) and on specific data values (for example, the frequency of specific diagnosis codes).
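
    A year-over-year diagnostic of this kind can be as simple as the relative frequency of each code per calendar year. The sketch below assumes records reduced to (year, code) pairs; the function name and the counts are made up, with the sample codes chosen to mimic a transition between ICD-9 and ICD-10 coding.

```python
from collections import Counter, defaultdict


def yearly_code_frequency(records):
    """Return {year: {code: fraction of that year's records}}.

    records: iterable of (year, code) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for year, code in records:
        counts[year][code] += 1
    return {
        year: {code: n / sum(by_code.values()) for code, n in by_code.items()}
        for year, by_code in sorted(counts.items())
    }


# Made-up diagnosis records for the same clinical concept.
records = (
    [(2014, "250.00")] * 3
    + [(2015, "250.00")] * 2
    + [(2015, "E11.9")] * 2
    + [(2016, "E11.9")] * 4
)
frequency = yearly_code_frequency(records)
for year, by_code in frequency.items():
    print(year, by_code)
```

    A cohort definition that lists only the older code would silently lose patients in later years, which a table like this makes visible before any modeling begins.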

    In example 2 (laboratory diagnostic prediction), validation on “future” data may better reflect whether the models will be generalizable to future data streams than would random cross-validation or hold-out test sets. In other words, researchers should develop models on early years of data while evaluating on later years of data. Furthermore, nonstationarity indicates that models and cohort definitions based on EHR data probably will need to be regularly updated to match current data structures and processes.
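
    Developing on early data and evaluating on later data amounts to splitting by time rather than at random. A minimal sketch, assuming each row carries an `order_date` field and that a single cutoff date is chosen in advance (both are assumptions for illustration):

```python
from datetime import date


def temporal_split(rows, cutoff):
    """Split rows into development (before cutoff) and evaluation (on/after)."""
    development = [r for r in rows if r["order_date"] < cutoff]
    evaluation = [r for r in rows if r["order_date"] >= cutoff]
    return development, evaluation


# Made-up rows; "abnormal" stands in for the outcome being predicted.
rows = [
    {"order_date": date(2016, 5, 1), "abnormal": 0},
    {"order_date": date(2017, 2, 9), "abnormal": 1},
    {"order_date": date(2018, 1, 3), "abnormal": 0},
    {"order_date": date(2018, 11, 20), "abnormal": 1},
]
development, evaluation = temporal_split(rows, cutoff=date(2018, 1, 1))
print(len(development), len(evaluation))
```

    Unlike random cross-validation, every evaluation row here is later than every development row, so the estimate reflects how a model built on past data performs on data it could not have seen.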

    Multisite Data Variability and Common Data Models

    Reproducibility and replication are well-accepted principles for high-quality observational research but can raise particular challenges for EHR-based studies when different clinical sites use different EHR vendors. Even with a common EHR vendor or otherwise interoperable data structures (for example, Fast Healthcare Interoperability Resources [FHIR] [54]), the idiosyncrasies of local implementation will probably require a laborious, manual, and potentially ambiguous mapping of semantic meaning of data elements. In our laboratory diagnostics example, we wanted to assess reproducibility across multiple sites (Stanford University; University of California, San Francisco; and University of Michigan), requiring manual reconciliation between each site's slightly different data representations. For example, one site may use the term “WBC,” another “white blood cells,” and yet another “white cells.” Other data have less clear reconciliation options, such as one site consolidating aerobic and anaerobic blood culture tests into a single result while another separates the 2 types of tests, preventing directly comparable results across sites.
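
    The manual reconciliation described above is often captured as an explicit mapping table that is versioned alongside the analysis. The sketch below is a hypothetical example: the common term `white_blood_cell_count`, the site labels, and the function name `harmonize` are assumptions, and unmapped terms are returned for manual review rather than guessed.

```python
# Hypothetical mapping from normalized local terms to one common term.
LOCAL_TO_COMMON = {
    "wbc": "white_blood_cell_count",
    "white blood cells": "white_blood_cell_count",
    "white cells": "white_blood_cell_count",
}


def harmonize(site_records, mapping=LOCAL_TO_COMMON):
    """Map site-specific terms to common terms.

    site_records: list of (site, local_term, value) tuples.
    Returns (mapped, unmapped); unmapped records need manual review.
    """
    mapped, unmapped = [], []
    for site, local_term, value in site_records:
        key = local_term.strip().lower()
        if key in mapping:
            mapped.append((site, mapping[key], value))
        else:
            unmapped.append((site, local_term, value))
    return mapped, unmapped


# Made-up laboratory records from three sites.
site_records = [
    ("site_1", "WBC", 7.2),
    ("site_2", "White Blood Cells", 11.4),
    ("site_3", "white cells", 5.9),
    ("site_3", "Blood Culture (Aerobic)", "no growth"),
]
mapped, unmapped = harmonize(site_records)
print(mapped)
print(unmapped)
```

    Terms with no clean equivalent, such as a site-specific blood culture result, stay in the unmapped list so that the decision about how to reconcile them is made and documented by a person.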

    Consolidating standard terminologies and common data models (CDMs), such as that used in the Observational Medical Outcomes Partnership (OMOP), can facilitate multisite observational studies (55). Distributing executable analysis code can in turn provide the most explicit documentation of subtle study design choices and embedded assumptions that may be unclear in the methods sections of study reports. Provision of code enables review by external experts and can promote replication. The use of CDMs can in turn enable researchers to use turnkey tools for EHR data diagnostics and analysis developed within the respective research communities.

    Even if CDMs are used, the processes that convert raw EHR data to research variables require careful consideration and documentation because they may introduce unexpected and unquantified variation in data sets, affecting downstream analyses. For example, following OHDSI conventions to convert EHR data to the OMOP CDM involves mapping source diagnosis codes (such as ICD codes) to Systematized Nomenclature of Medicine (SNOMED) codes, but individual sites may define custom mappings such that different ICD codes may be mapped to the same SNOMED code. Such tools as OHDSI Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) (56) provide a mechanism to generate reports on data quality by flagging potential errors, such as implausible dates or missing data fields. Other research collaboratives, such as the National Patient-Centered Clinical Research Network (PCORnet) or Sentinel Initiative, have developed related approaches and frameworks for data quality assessment (57–59). In cases where site-to-site variability is directly measurable, such as that introduced by different mappings between terminologies, researchers should consider analyses to quantify and report the effect of site variability on measured associations.

    In conclusion, EHRs contain large quantities of real-world health care data and are an increasingly important data resource for observational research. Yet, analysis of data collected for nonresearch purposes requires consideration of data quality and observational research and reporting principles. Most of the important considerations for observational research using EHRs are the same as for observational research using other data and are well addressed by existing recommendations. In the Table, we summarize the considerations for EHR-based observational research that we discussed in this article and provide suggestions for reporting on these issues. Our hope is that the principled application of existing research and reporting guidelines alongside these additional considerations will improve the quality of EHR-based observational studies that drive continuously learning health care systems (60).

    Table. Recommendations for Additional Research and Reporting Considerations for Observational Research Conducted by Using EHR Data


