Background: Non-experimental studies using large healthcare databases may be well suited to addressing clinically relevant questions about the safety and effectiveness of medications in oncology. They complement randomized trials by including the frail and complex patients seen in routine care, and they reflect real-world practice patterns and treatment adherence. Historically, pharmacoepidemiology research in oncology has been limited, mainly because real-world data sources poorly capture important confounders (e.g., tumor grade, histology, and location; laboratory values; biomarkers; and performance status). More recently, however, the quality and availability of secondary oncology data have improved with the emergence of specialized electronic health record (EHR) systems. These longitudinal databases draw on several major sources of clinical information: 1) physician medication ordering systems, 2) physician notes from outpatient oncology encounters, 3) molecular diagnostics, and 4) structured fields within the health record. Collectively, these sources permit ascertainment of patients’ demographics, cancer types, treatment history, and the array of confounders and health outcomes needed for comparative effectiveness studies of oncology drugs. Despite these advances, oncology EHR databases still pose many challenges that stem from a lack of linkage to other data sources, such as claims or high-quality tumor registries. As a result, out-of-network encounters, medical procedures, and inpatient encounters are poorly captured, and some data are missing. It is therefore unknown whether these challenges can be overcome with currently available epidemiological and statistical methods and, ultimately, whether these data are suitable for clinical investigation.
The objectives of this body of work are to 1) explore the utility of specialty oncology EHR databases in comparative effectiveness research; 2) build a framework that supports drawing causal conclusions from EHR-based studies in oncology given the limitations of EHRs; and 3) identify and implement markers of data quality and study validity that can be used to assess confidence in findings. To achieve these objectives, two comparative effectiveness studies of first-line treatments for advanced breast cancer were conducted and calibrated against two randomized clinical trials, PALOMA-2 and PARSIFAL. Additionally, an algorithm was constructed to predict data completeness in an EHR-based oncology cohort, and it was then applied as a sensitivity analysis in both comparative effectiveness studies. Specifically, effect estimates in the non-randomized studies were calculated among subjects with increasingly higher levels of predicted data completeness to assess whether they converged toward the randomized trial estimates. In this way, predicted completeness was evaluated as a potential tool for improving study validity.
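The restriction-based sensitivity analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the record fields (`predicted_cr`, `treated`, `event`), the thresholds, the toy cohort, and the `crude_risk_ratio` estimator are all hypothetical stand-ins. In the actual studies the effect measure was a hazard ratio from a proportional hazards model, which would be passed in as `estimate`.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical patient record: predicted_cr (0-1), treated (0/1), event (0/1).
Patient = Dict[str, float]


def restrict_by_predicted_completeness(
    cohort: Sequence[Patient],
    thresholds: Sequence[float],
    estimate: Callable[[List[Patient]], float],
) -> List[Tuple[float, int, float]]:
    """Re-estimate the effect among patients whose predicted continuity ratio
    is at or above each threshold, from the lowest threshold to the highest.

    Returns (threshold, subcohort size, estimate); empty subcohorts are skipped.
    """
    results = []
    for t in sorted(thresholds):
        sub = [p for p in cohort if p["predicted_cr"] >= t]
        if sub:
            results.append((t, len(sub), estimate(sub)))
    return results


def crude_risk_ratio(sub: List[Patient]) -> float:
    """Stand-in estimator (not the study's Cox model): risk in treated / untreated."""
    treated = [p["event"] for p in sub if p["treated"]]
    control = [p["event"] for p in sub if not p["treated"]]
    if not treated or not control or sum(control) == 0:
        return math.nan  # ratio undefined in this subcohort
    return (sum(treated) / len(treated)) / (sum(control) / len(control))


# Invented toy cohort, for illustration only.
cohort = [
    {"predicted_cr": 0.2, "treated": 1, "event": 1},
    {"predicted_cr": 0.4, "treated": 0, "event": 1},
    {"predicted_cr": 0.6, "treated": 1, "event": 0},
    {"predicted_cr": 0.7, "treated": 0, "event": 1},
    {"predicted_cr": 0.8, "treated": 1, "event": 0},
    {"predicted_cr": 0.9, "treated": 0, "event": 1},
]

for t, n, rr in restrict_by_predicted_completeness(cohort, [0.0, 0.5, 0.75], crude_risk_ratio):
    print(f"predicted CR >= {t:.2f}: n = {n}, crude RR = {rr:.2f}")
```

Because each higher threshold keeps a subset of the patients retained at the previous one, the sample shrinks as the threshold rises, so confidence intervals can be expected to widen under stricter restriction.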
Methods: To construct the prediction algorithm for data completeness, we used a Medicare-linked EHR database derived from two academic medical centers in Massachusetts. This linked database combined several sources of clinical information: healthcare claims (inpatient, outpatient, and pharmacy), physician drug orders, unstructured notes, and billing codes from medical procedures and from inpatient or outpatient provider encounters. These sources permitted ascertainment of patient demographics, vital signs, height and weight, medical procedures, medications, timing of provider encounters, and diagnoses, which were used to create candidate predictors of data completeness. The study population consisted of subjects who had a year of continuous enrollment in Medicare, were at least 65 years old, and had one or more outpatient oncology encounters in the EHR system. Data completeness was quantified by the “continuity ratio,” defined as the yearly proportion of outpatient encounters reported to Medicare that were captured in the EHR data. Least absolute shrinkage and selection operator (LASSO) regression of the continuity ratio on the candidate predictors was used to select the predictors retained in the final model. Model performance was assessed using the coefficient of determination (R²) and Spearman’s correlation between predicted and observed continuity ratios. Within deciles of continuity ratio, we quantified misclassification of several comorbidities and medications by comparing the proportion of subjects classified as having each covariate using outpatient EHR data alone vs. outpatient EHR data plus claims, expressed as both a ratio and a standardized difference.
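The two data-completeness quantities in this paragraph can be sketched in a few lines. The following assumptions are not taken from the source: visits are matched on calendar date, the inputs are already restricted to one patient-year of outpatient encounters, and the standardized difference uses the common pooled-binomial form for proportions. The function names are hypothetical.

```python
import math
from datetime import date
from typing import Iterable, Tuple


def continuity_ratio(claims_visits: Iterable[date], ehr_visits: Iterable[date]) -> float:
    """Proportion of one patient-year's outpatient visits billed to Medicare
    that also appear in the EHR. Matching on calendar date is an assumption."""
    claims = set(claims_visits)
    if not claims:
        return math.nan  # undefined when no claims-recorded visits exist
    return len(claims & set(ehr_visits)) / len(claims)


def misclassification(p_ehr: float, p_linked: float) -> Tuple[float, float]:
    """Compare a covariate's prevalence ascertained from outpatient EHR data
    alone (p_ehr) with outpatient EHR data plus claims (p_linked).

    Returns (ratio, standardized difference); the standardized difference
    divides by the pooled binomial standard deviation of the two proportions.
    """
    ratio = p_ehr / p_linked if p_linked > 0 else math.nan
    pooled_sd = math.sqrt((p_ehr * (1 - p_ehr) + p_linked * (1 - p_linked)) / 2)
    if pooled_sd > 0:
        std_diff = (p_ehr - p_linked) / pooled_sd
    elif p_ehr == p_linked:
        std_diff = 0.0
    else:
        std_diff = math.nan  # both proportions at 0 or 1 but unequal
    return ratio, std_diff


claims = [date(2015, 1, 5), date(2015, 3, 2), date(2015, 6, 10), date(2015, 9, 1)]
ehr = [date(2015, 1, 5), date(2015, 6, 10), date(2015, 7, 7)]
print(continuity_ratio(claims, ehr))  # 2 of 4 claims visits captured -> 0.5

print(misclassification(0.15, 0.30))  # EHR alone finds half the prevalence
```

A ratio below 1 and a negative standardized difference both indicate under-ascertainment in EHR data alone; computing them separately within each continuity-ratio decile shows how that under-ascertainment changes with data completeness.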
For the first comparative effectiveness study, an oncology EHR database derived from outpatient oncology practices within the US Oncology Network was used to compare time to next treatment (TTNT) between palbociclib-letrozole users and letrozole-only users. TTNT was chosen as the endpoint because it was well observed in the EHR database and appeared to serve as a meaningful surrogate for treatment effectiveness in the PALOMA-2 trial. All eligibility criteria, treatments, and outcome variables were defined to mimic the trial as closely as possible. Patients with evidence of a breast cancer subtype inconsistent with the PALOMA-2 study population (i.e., hormone receptor-negative or HER2-positive disease) were excluded. To address missing data, 50 imputed datasets were constructed using multiple imputation by chained equations. In each imputed dataset, a Cox proportional hazards model was fit to estimate the hazard ratio for TTNT in an intention-to-treat analysis analogous to the trial's, and the 50 estimates were then pooled.
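The abstract does not name the rule used to pool the 50 imputation-specific estimates; Rubin's rules are the conventional choice, and the sketch below assumes them. It pools log hazard ratios and their standard errors, as any Cox model fit would report them, and uses a normal approximation for the confidence interval instead of Rubin's degrees-of-freedom correction. The function name `pool_log_hr` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance
from typing import Sequence, Tuple


def pool_log_hr(
    log_hrs: Sequence[float], ses: Sequence[float], level: float = 0.95
) -> Tuple[float, float, float, float]:
    """Pool per-imputation Cox estimates with Rubin's rules.

    log_hrs: log hazard ratio from each imputed dataset.
    ses:     standard error of that log hazard ratio from the same fit.
    Returns (pooled HR, CI lower, CI upper, pooled SE of the log HR).
    """
    m = len(log_hrs)
    if m < 2 or len(ses) != m:
        raise ValueError("need >= 2 imputations and one SE per estimate")
    q_bar = fmean(log_hrs)                  # pooled point estimate (log scale)
    within = fmean(se ** 2 for se in ses)   # mean within-imputation variance
    between = variance(log_hrs)             # between-imputation variance (n - 1)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    se = math.sqrt(total)
    # Normal approximation; Rubin's t-based interval is slightly wider for small m.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se), se
```

Pooling happens on the log scale because Rubin's rules assume an approximately normal estimator, which holds far better for the log hazard ratio than for the hazard ratio itself. The between-imputation term inflates the standard error to reflect uncertainty from the missing data.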
The second comparative effectiveness study took a similar approach. We used the same longitudinal EHR data from outpatient oncology practices across the US to emulate the treatments and selection criteria of the PARSIFAL trial as closely as possible. Multiple imputation was used to account for missing data in patient characteristics. Baseline characteristics were compared, and hazard ratios with 95% confidence intervals for overall survival were estimated by fitting a multivariable proportional hazards model. The findings of both comparative effectiveness studies were compared with the results of their respective RCTs, both qualitatively and using standardized differences.
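The abstract does not give the formula behind its standardized differences. One definition used in trial-emulation benchmarking divides the difference in log hazard ratios by the standard error of that difference and recovers each standard error from its reported 95% CI. The sketch below assumes that definition and treats the two studies as independent. The dissertation may have computed its reported values differently, so this sketch is not a reproduction of them. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Back-calculate the SE of a log hazard ratio from its CI bounds,
    assuming the CI was built symmetrically on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def standardized_difference(
    hr_obs: float, ci_obs: Tuple[float, float],
    hr_rct: float, ci_rct: Tuple[float, float],
) -> float:
    """Difference in log HRs divided by its standard error, treating the
    observational study and the trial as independent estimates."""
    se_obs = se_from_ci(*ci_obs)
    se_rct = se_from_ci(*ci_rct)
    diff = math.log(hr_obs) - math.log(hr_rct)
    return diff / math.sqrt(se_obs ** 2 + se_rct ** 2)
```

Under this definition the denominator includes the trial's own uncertainty, so a wide trial CI yields a small standardized difference even when the point estimates differ somewhat.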
Results: The PALOMA-2 emulation included 3,836 eligible subjects with advanced breast cancer. The hazard ratio for TTNT in the observational study (HR: 0.62; 95% CI: 0.56-0.68) closely aligned with that of the randomized trial (HR: 0.64; 95% CI: 0.52-0.78; standardized difference = -0.05). In the PARSIFAL emulation, 1,886 subjects met all eligibility criteria. Although 3-year survival was meaningfully lower in clinical practice (59%) than in the RCT (78%), the relative effect (HR: 1.07; 95% CI: 0.86-1.35) was similar to that of the RCT (HR: 1.00; 95% CI: 0.68-1.48; standardized difference = 0.04). Restricting the study cohort to increasing levels of predicted continuity ratio did not appreciably change the effect estimate in the PALOMA-2 emulation. In the PARSIFAL emulation, however, it shifted the estimate away from the RCT estimate and widened the confidence intervals.
Conclusion: This body of work calls for more trial emulations that follow a principled approach and use methods that address the various threats to validity arising from oncology EHR databases. Likewise, agreed-upon reporting standards would make it easier to synthesize global efforts to advance the use of real-world data (RWD) in clinical oncology. In comparative effectiveness studies of oncology drugs, confounding may not be the most critical issue given the current data density of oncology EHR systems. Rather, more complete data may be needed for specific outcomes and possibly for biomarkers. Overall, real-world evidence in oncology is developing in a positive direction as causal inference methods are applied and as data sources continue to evolve and become richer in granularity and continuity.