
John Seeger, PhD, chief scientific officer, Optum® Epidemiology, shares his perspectives on how to use electronic health records (EHR) most effectively.

Optum data sources have historically been used by our Epidemiology group and by many of our clients to answer a wide range of questions, from background epidemiology to comparative drug safety or effectiveness.

Traditionally, these studies have been sourced from claims data, and the limitations this data source imposes have increasingly been recognized.

More recently, the emergence of EHR data assembled into large databases has brought the clinical perspective of care, which involves a single patient at a time, together with the epidemiological perspective of health effects within and across populations.

EHR can provide both exquisite clinical detail and large patient population numbers, but there are important issues to consider to ensure its most effective use.

The real-world aspect of EHR data is very different from protocol-driven data

Effective use of EHR data requires the creation of a new analytic paradigm. I see it as a mistake to attempt to fit EHR data into a research and analytic paradigm that has previously been developed for claims, registry or trial data.

When EHR data is used appropriately with a model that has been developed specifically to address its unique features, it is an invaluable data resource that can provide an abundance of insight. 

Always keep in mind that to use EHR effectively, you have to look through the right lens.

Be aware that with EHR, data is not missing

It is a common misapplication of the EHR paradigm to expect data to be available as if it had been collected according to a protocol in the same way it is in a clinical trial. In routine care, some data may not have been collected as part of the patient’s care on a particular day. 

What is captured in routine care may differ from what would be captured according to a protocol — and may differ from standards of care. Because of this, a researcher needs to approach the study design and analysis of EHR data differently.

Remember that the mechanism underlying missing data is different with EHR. Data in the EHR is not missing because someone was supposed to collect it and didn’t. It could simply be missing because it was not relevant for patient care. Maybe blood pressure wasn’t taken that day, maybe the patient couldn’t get their blood drawn, etc.

Understanding the data-generating mechanism is key

When observed patterns in the data don’t match expectation, it’s essential to consider the process by which data from the patient encounter became part of the research file. 

How well can you answer: How did the data point become a part of the patient’s EHR? How did the patient’s EHR become part of the EHR database? What transformations have been applied to the data point in the process of forming the EHR database?

Obtaining answers to questions like these will help address why the expectation differs from observation.

There may be errors in the data, which is why you need to understand key aspects of the mechanism and plan accordingly.

It is important to set rules

Transforming data from varied electronic medical record (EMR) systems into a single uniform dataset necessarily involves changing some of the source data to fit the common structure. These transformation steps may result in artifacts that don’t appear to make sense clinically.

It may not be possible to track down and explain each and every anomaly observed. This is why it is necessary to determine which data points and values you are comfortable using in the research, and to develop rules for handling those that fall outside those limits.

There may be some things present in the data that don’t make sense. It’s a good idea to work with clinical colleagues who can set fences around what you are going to be dealing with.
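As a minimal sketch of what such rules might look like in practice, the Python below classifies a recorded value against plausibility fences. The measure names, the limits and the `apply_fence` helper are all hypothetical and chosen only for illustration; real limits should come from clinical colleagues and from the research question.

```python
from typing import Optional

# Hypothetical plausibility fences (low, high) per measure.
# These numbers are placeholders, not clinical guidance.
FENCES = {
    "systolic_bp_mmhg": (50.0, 300.0),
    "heart_rate_bpm": (20.0, 300.0),
}

def apply_fence(measure: str, value: Optional[float]) -> str:
    """Classify one value so that each case can be handled by an explicit rule."""
    if value is None:
        # Not recorded in routine care; kept distinct from an implausible value.
        return "not_recorded"
    if measure not in FENCES:
        # No rule has been agreed for this measure yet.
        return "no_rule"
    low, high = FENCES[measure]
    return "in_range" if low <= value <= high else "out_of_range"

# Example: decide what to do with each record before analysis.
records = [
    ("systolic_bp_mmhg", 120.0),
    ("systolic_bp_mmhg", 1200.0),  # possibly a transformation artifact
    ("heart_rate_bpm", None),
]
labels = [apply_fence(measure, value) for measure, value in records]
```

Keeping "not recorded" separate from "out of range" means the two cases can be handled by different rules, rather than being collapsed into a single notion of bad data.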

Study design needs to address the fact that EHR is not a closed system

To use EHR effectively, it’s important to recognize that patient services from providers who do not contribute to the EHR database will not be visible in the way that they are in a closed system, such as a health insurance claims database.

However, there are study designs that can mitigate the consequences of this. One example is a contemporaneous cohort design that yields relative incidence measures: if follow-up is incomplete to the same degree in the exposed and comparison cohorts, the incompleteness cancels out of the ratio.
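The arithmetic behind that cancellation can be illustrated with a short Python sketch. The event counts, person-years and capture fractions below are invented for illustration, and `observed_rate` is a hypothetical helper, not part of any Optum tooling. It assumes that a fixed fraction of true events is visible in the database.

```python
def observed_rate(true_events: float, person_years: float,
                  capture_fraction: float) -> float:
    """Incidence rate seen in the database when only a fraction of events is captured."""
    return true_events * capture_fraction / person_years

# Illustrative numbers only, not real data.
true_ratio = observed_rate(200, 10_000, 1.0) / observed_rate(100, 10_000, 1.0)

# Same capture fraction in both cohorts: the fraction cancels in the ratio.
equal_ratio = observed_rate(200, 10_000, 0.6) / observed_rate(100, 10_000, 0.6)

# Different capture fractions: the ratio is distorted.
unequal_ratio = observed_rate(200, 10_000, 0.6) / observed_rate(100, 10_000, 0.8)

print(f"true ratio:            {true_ratio:.2f}")
print(f"equal capture ratio:   {equal_ratio:.2f}")
print(f"unequal capture ratio: {unequal_ratio:.2f}")
```

Under these assumptions the absolute rates are understated in both cohorts, but the rate ratio is preserved when capture is equal. When capture differs between cohorts the ratio shifts, which is why the comparability of follow-up between the exposed and comparison groups matters.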

It’s important to know what outcome you are looking to study and to design the study so that the EHR data adequately captures that outcome.

In summary, EHR databases can be used to provide the basis for sound inference, provided they are used appropriately. 

I see great potential in EHR databases. The greater clinical insight they provide enables studies to address research questions that might previously have been resolved only through trials.