The Challenge
Memorial Sloan Kettering (MSK) is a world-renowned cancer center conducting dozens of investigator-initiated trials. Unlike industry-sponsored studies, these trials had limited budgets and relied heavily on clinical staff to handle both patient care and research data entry.
The Data Quality Problem
- Dual Data Entry: Clinical staff entered data into the EMR (Epic) for patient care, then re-entered the same data into the EDC (Medidata Rave) for research, creating discrepancies
- High Query Rates: The average query rate was 12.3% across trials, with most queries related to missing or inconsistent lab values, vital signs, and adverse events
- Delayed Data Entry: Research coordinators entered data 2-4 weeks after patient visits due to workload, making real-time monitoring impossible
- No Standardization: Each trial had its own data flow, making it impossible to scale quality improvements across the portfolio
- Limited IT Resources: A small clinical research IT team couldn't support custom integrations for every trial
The Director of Clinical Research Operations needed a scalable solution to improve data quality without increasing coordinator workload or requiring massive IT investment.
The Solution
I was hired as a System Analyst (promoted from Clinical Research Specialist) to design and implement a standardized data quality framework. The approach focused on reducing dual data entry and automating data quality checks.
Phase 1: EMR-to-EDC Mapping (Months 1-4)
- Data Flow Analysis: Mapped data flow for 12 trials, identifying 85 common data points collected in both EMR and EDC (labs, vitals, AEs, concomitant meds)
- Epic-to-Rave Integration: Worked with IT to build HL7 interface extracting data from Epic and pre-populating Rave forms
- Data Transformation Rules: Created mapping rules to transform Epic data formats (LOINC codes, SNOMED terms) to Rave-compatible formats
- Validation Logic: Implemented validation rules to flag discrepancies between EMR and EDC data for coordinator review
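The transformation and validation steps above can be sketched as follows. This is a simplified illustration, not the actual Epic or Rave schema: the record shapes, field names (`loinc`, `rave_field`), and the two mapping entries are assumptions for the example (LOINC 777-3 is platelets, 718-7 is hemoglobin).

```python
# Illustrative mapping table: LOINC-coded Epic results -> Rave form fields.
# Field names and structure are assumptions, not the real Rave schema.
LOINC_TO_RAVE = {
    "777-3": {"rave_field": "PLT", "rave_unit": "10^3/uL"},   # platelets
    "718-7": {"rave_field": "HGB", "rave_unit": "g/dL"},      # hemoglobin
}

def transform_lab(epic_result: dict) -> dict:
    """Map one Epic lab result to a Rave-style form field."""
    rule = LOINC_TO_RAVE.get(epic_result["loinc"])
    if rule is None:
        raise KeyError(f"No mapping rule for LOINC {epic_result['loinc']}")
    record = {
        "field": rule["rave_field"],
        "value": epic_result["value"],
        "unit": rule["rave_unit"],
    }
    # Validation rule: flag for coordinator review when the source unit
    # does not match the unit the Rave form expects.
    record["needs_review"] = epic_result["unit"] != rule["rave_unit"]
    return record

print(transform_lab({"loinc": "777-3", "value": 45, "unit": "10^3/uL"}))
```

In practice rules like these lived in the staging layer, so adding a data point meant adding a mapping entry rather than new interface code.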
Phase 2: Automated Reconciliation (Months 5-8)
- SQL-Based Reconciliation Scripts: Built SQL scripts to compare EMR data against EDC data, generating discrepancy reports
- Weekly Reconciliation Reports: Automated weekly reports sent to coordinators highlighting missing or inconsistent data
- Priority Scoring: Ranked discrepancies by impact (critical safety data vs. non-critical demographics) to focus coordinator attention
- Closed-Loop Workflow: Coordinators reviewed reports, resolved discrepancies, and marked items as complete in tracking database
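The core of the reconciliation scripts is a join-and-compare query with a priority tier. A minimal sketch of that query, using SQLite in place of SQL Server; the table names, column names, sample rows, and the choice of which fields count as "critical" are illustrative assumptions:

```python
import sqlite3

# Stand-in staging tables: EMR data vs. EDC data (names are assumptions).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emr (patient_id TEXT, field TEXT, value REAL);
CREATE TABLE edc (patient_id TEXT, field TEXT, value REAL);
INSERT INTO emr VALUES ('042', 'PLT', 45), ('042', 'HGB', 12.1);
INSERT INTO edc VALUES ('042', 'PLT', 145), ('042', 'HGB', 12.1);
""")

# Join EMR against EDC, keep only mismatched values, and rank
# safety-critical fields above routine ones for coordinator attention.
rows = con.execute("""
SELECT e.patient_id, e.field, e.value AS emr_value, d.value AS edc_value,
       CASE WHEN e.field IN ('PLT', 'ANC') THEN 'critical'
            ELSE 'routine' END AS priority
FROM emr e
JOIN edc d ON d.patient_id = e.patient_id AND d.field = e.field
WHERE e.value <> d.value
ORDER BY priority
""").fetchall()

for row in rows:
    print(row)   # ('042', 'PLT', 45.0, 145.0, 'critical')
```

The matching HGB values drop out of the report entirely, so coordinators only ever see rows that need action.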
Phase 3: Standardization & Training (Months 9-12)
- Standard Data Flow Template: Created reusable template for future trials, reducing setup time from 3 months to 2 weeks
- Coordinator Training: Trained 25 research coordinators on new workflows, emphasizing how integration reduced their workload
- Documentation: Created SOPs, data flow diagrams, and troubleshooting guides for IT and coordinators
- Continuous Improvement: Established quarterly review process to refine mapping rules and add new data points
The Results
Quantitative Outcomes
Qualitative Outcomes
- Coordinator Satisfaction: Post-implementation survey showed 92% of coordinators felt the integration reduced their workload
- Scalability: Template approach allowed 8 new trials to adopt the integration in Year 2 with minimal IT effort
- Real-Time Monitoring: The PI and study teams could now monitor trial progress in real time instead of waiting for quarterly reports
- Audit Readiness: Sponsor audits found "well-documented data flow with appropriate quality controls"
Real-World Example: Catching Data Discrepancies Early
In Month 10, the weekly reconciliation report flagged a discrepancy for Patient 042: EMR showed platelet count of 45,000 (Grade 3 thrombocytopenia), but EDC showed 145,000 (normal).
The coordinator investigated and discovered a transcription error during manual data entry. The correct value was 45,000, which should have triggered a dose modification per protocol.
Impact: The error was caught within 48 hours of the lab result, allowing the PI to adjust treatment before the next dose. Without automated reconciliation, this would have been discovered during quarterly monitoring (8 weeks later), potentially compromising patient safety.
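A hedged sketch of the kind of check that surfaced this discrepancy. The thresholds follow the standard CTCAE thrombocytopenia boundaries (Grade 3: below 50,000 down to 25,000/µL); the protocol's actual dose-modification rule and the lower limit of normal (LLN) used here are assumptions for the example.

```python
def thrombocytopenia_grade(platelets_per_ul: int, lln: int = 150_000) -> int:
    """Return the approximate CTCAE grade for a platelet count (0 = no toxicity)."""
    if platelets_per_ul < 25_000:
        return 4
    if platelets_per_ul < 50_000:
        return 3
    if platelets_per_ul < 75_000:
        return 2
    if platelets_per_ul < lln:
        return 1
    return 0

emr_value, edc_value = 45_000, 145_000
# The two recorded values imply different clinical actions, which is
# exactly what a graded discrepancy score can surface for review.
print(thrombocytopenia_grade(emr_value))  # 3 -> dose modification per protocol
print(thrombocytopenia_grade(edc_value))  # 1 -> no dose action triggered
```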
Technical Implementation Details
For those interested in the technical approach:
- Integration Architecture: HL7 interface extracted data from Epic (ADT, ORU, DFT messages), transformed via Mirth Connect, loaded into Rave via API
- Database Design: Created staging database (SQL Server) to store Epic data, apply transformation rules, and track reconciliation status
- Reconciliation Logic: SQL stored procedures compared Epic vs. Rave data, calculated discrepancy scores, generated coordinator reports
- Reporting: SSRS reports emailed to coordinators weekly, with drill-down capability to patient-level discrepancies
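To make the extraction step concrete, here is a sketch of pulling a lab value out of an HL7 v2 ORU message, the kind of parsing the Mirth Connect channels performed. The field positions follow standard HL7 v2 conventions (OBX-3 identifier, OBX-5 value, OBX-6 units), but the sample message itself is illustrative, not real interface traffic:

```python
# Minimal HL7 v2 ORU^R01 fragment (pipe-delimited, CR-separated segments).
oru = (
    "MSH|^~\\&|EPIC|MSK|RAVE|MSK|20240115||ORU^R01|123|P|2.3\r"
    "PID|1||042\r"
    "OBX|1|NM|777-3^Platelets^LN||45|10*3/uL|150-400|L|||F\r"
)

def parse_obx(message: str) -> list[dict]:
    """Pull (LOINC code, value, unit) triples from each OBX segment."""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX":
            continue
        loinc = fields[3].split("^")[0]   # OBX-3: observation identifier
        results.append({
            "loinc": loinc,
            "value": fields[5],           # OBX-5: observation value
            "unit": fields[6],            # OBX-6: units
        })
    return results

print(parse_obx(oru))  # [{'loinc': '777-3', 'value': '45', 'unit': '10*3/uL'}]
```

A production channel would of course handle repeats, escape sequences, and malformed segments; Mirth provides that machinery out of the box.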
Lessons Learned
1. Start with Common Data Points: Focusing on the 85 common data points (vs. trying to automate everything) delivered 80% of the value with 20% of the effort
2. Coordinator Buy-In is Critical: Coordinators were initially skeptical that integration would work. Pilot success on 2 trials convinced them to adopt.
3. Template Approach Scales: Creating a reusable template allowed the solution to scale from 12 trials to 20+ trials without proportional IT effort
4. Automated Reconciliation > Manual Checks: Weekly automated reports were far more effective than quarterly manual reconciliation, catching issues before they became problems
