The Challenge
Memorial Sloan Kettering (MSK) is a world-renowned cancer center conducting dozens of investigator-initiated trials. Unlike industry-sponsored studies, these trials had limited budgets and relied heavily on clinical staff to handle both patient care and research data entry.
The Data Quality Problem
- Dual Data Entry: Clinical staff entered data into the EMR (Epic) for patient care, then re-entered the same data into the EDC (Medidata Rave) for research, creating discrepancies
- High Query Rates: The average query rate was 12.3% across trials, with most queries related to missing or inconsistent lab values, vital signs, and adverse events
- Delayed Data Entry: Research coordinators entered data 2-4 weeks after patient visits due to workload, making real-time monitoring impossible
- No Standardization: Each trial had its own data flow, making it impossible to scale quality improvements across the portfolio
- Limited IT Resources: A small clinical research IT team couldn't support custom integrations for every trial
The Director of Clinical Research Operations needed a scalable solution to improve data quality without increasing coordinator workload or requiring massive IT investment.
The Solution
I was hired as a System Analyst (promoted from Clinical Research Specialist) to design and implement a standardized data quality framework. The approach focused on reducing dual data entry and automating data quality checks.
Phase 1: EMR-to-EDC Mapping (Months 1-4)
- Data Flow Analysis: Mapped data flow for 12 trials, identifying 85 common data points collected in both EMR and EDC (labs, vitals, AEs, concomitant meds)
- Epic-to-Rave Integration: Worked with IT to build HL7 interface extracting data from Epic and pre-populating Rave forms
- Data Transformation Rules: Created mapping rules to transform Epic data formats (LOINC codes, SNOMED terms) to Rave-compatible formats
- Validation Logic: Implemented validation rules to flag discrepancies between EMR and EDC data for coordinator review
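The transformation and validation steps above can be sketched as follows. This is a simplified illustration, not the actual Epic or Rave schema: the record shapes, field names (`loinc`, `rave_field`), and the two mapping entries are assumptions for the example (LOINC 777-3 is platelets, 718-7 is hemoglobin).

```python
# Illustrative mapping table: LOINC-coded Epic results -> Rave form fields.
# Field names and structure are assumptions, not the real Rave schema.
LOINC_TO_RAVE = {
    "777-3": {"rave_field": "PLT", "rave_unit": "10^3/uL"},   # platelets
    "718-7": {"rave_field": "HGB", "rave_unit": "g/dL"},      # hemoglobin
}

def transform_lab(epic_result: dict) -> dict:
    """Map one Epic lab result to a Rave-style form field."""
    rule = LOINC_TO_RAVE.get(epic_result["loinc"])
    if rule is None:
        raise KeyError(f"No mapping rule for LOINC {epic_result['loinc']}")
    record = {
        "field": rule["rave_field"],
        "value": epic_result["value"],
        "unit": rule["rave_unit"],
    }
    # Validation rule: flag for coordinator review when the source unit
    # does not match the unit the Rave form expects.
    record["needs_review"] = epic_result["unit"] != rule["rave_unit"]
    return record

print(transform_lab({"loinc": "777-3", "value": 45, "unit": "10^3/uL"}))
```

In practice rules like these lived in the staging layer, so adding a data point meant adding a mapping entry rather than new interface code.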
Phase 2: Automated Reconciliation (Months 5-8)
- SQL-Based Reconciliation Scripts: Built SQL scripts to compare EMR data against EDC data, generating discrepancy reports
- Weekly Reconciliation Reports: Automated weekly reports sent to coordinators highlighting missing or inconsistent data
- Priority Scoring: Ranked discrepancies by impact (critical safety data vs. non-critical demographics) to focus coordinator attention
- Closed-Loop Workflow: Coordinators reviewed reports, resolved discrepancies, and marked items as complete in tracking database
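The core of the reconciliation scripts is a join-and-compare query with a priority tier. A minimal sketch of that query, using SQLite in place of SQL Server; the table names, column names, sample rows, and the choice of which fields count as "critical" are illustrative assumptions:

```python
import sqlite3

# Stand-in staging tables: EMR data vs. EDC data (names are assumptions).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emr (patient_id TEXT, field TEXT, value REAL);
CREATE TABLE edc (patient_id TEXT, field TEXT, value REAL);
INSERT INTO emr VALUES ('042', 'PLT', 45), ('042', 'HGB', 12.1);
INSERT INTO edc VALUES ('042', 'PLT', 145), ('042', 'HGB', 12.1);
""")

# Join EMR against EDC, keep only mismatched values, and rank
# safety-critical fields above routine ones for coordinator attention.
rows = con.execute("""
SELECT e.patient_id, e.field, e.value AS emr_value, d.value AS edc_value,
       CASE WHEN e.field IN ('PLT', 'ANC') THEN 'critical'
            ELSE 'routine' END AS priority
FROM emr e
JOIN edc d ON d.patient_id = e.patient_id AND d.field = e.field
WHERE e.value <> d.value
ORDER BY priority
""").fetchall()

for row in rows:
    print(row)   # ('042', 'PLT', 45.0, 145.0, 'critical')
```

The matching HGB values drop out of the report entirely, so coordinators only ever see rows that need action.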
Phase 3: Standardization & Training (Months 9-12)
- Standard Data Flow Template: Created reusable template for future trials, reducing setup time from 3 months to 2 weeks
- Coordinator Training: Trained 25 research coordinators on new workflows, emphasizing how integration reduced their workload
- Documentation: Created SOPs, data flow diagrams, and troubleshooting guides for IT and coordinators
- Continuous Improvement: Established quarterly review process to refine mapping rules and add new data points
The Results
Quantitative Outcomes
Qualitative Outcomes
- Coordinator Satisfaction: Post-implementation survey showed 92% of coordinators felt the integration reduced their workload
- Scalability: Template approach allowed 8 new trials to adopt the integration in Year 2 with minimal IT effort
- Real-Time Monitoring: The PI and study teams could now monitor trial progress in real time instead of waiting for quarterly reports
- Audit Readiness: Sponsor audits found "well-documented data flow with appropriate quality controls"
Real-World Example: Catching Data Discrepancies Early
In Month 10, the weekly reconciliation report flagged a discrepancy for Patient 042: EMR showed platelet count of 45,000 (Grade 3 thrombocytopenia), but EDC showed 145,000 (normal).
The coordinator investigated and discovered a transcription error during manual data entry. The correct value was 45,000, which should have triggered a dose modification per protocol.
Impact: The error was caught within 48 hours of the lab result, allowing the PI to adjust treatment before the next dose. Without automated reconciliation, this would have been discovered during quarterly monitoring (8 weeks later), potentially compromising patient safety.
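A hedged sketch of the kind of check that surfaced this discrepancy. The thresholds follow the standard CTCAE thrombocytopenia boundaries (Grade 3: below 50,000 down to 25,000/µL); the protocol's actual dose-modification rule and the lower limit of normal (LLN) used here are assumptions for the example.

```python
def thrombocytopenia_grade(platelets_per_ul: int, lln: int = 150_000) -> int:
    """Return the approximate CTCAE grade for a platelet count (0 = no toxicity)."""
    if platelets_per_ul < 25_000:
        return 4
    if platelets_per_ul < 50_000:
        return 3
    if platelets_per_ul < 75_000:
        return 2
    if platelets_per_ul < lln:
        return 1
    return 0

emr_value, edc_value = 45_000, 145_000
# The two recorded values imply different clinical actions, which is
# exactly what a graded discrepancy score can surface for review.
print(thrombocytopenia_grade(emr_value))  # 3 -> dose modification per protocol
print(thrombocytopenia_grade(edc_value))  # 1 -> no dose action triggered
```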
Technical Implementation Details
For those interested in the technical approach:
- Integration Architecture: HL7 interface extracted data from Epic (ADT, ORU, DFT messages), transformed via Mirth Connect, loaded into Rave via API
- Database Design: Created staging database (SQL Server) to store Epic data, apply transformation rules, and track reconciliation status
- Reconciliation Logic: SQL stored procedures compared Epic vs. Rave data, calculated discrepancy scores, generated coordinator reports
- Reporting: SSRS reports emailed to coordinators weekly, with drill-down capability to patient-level discrepancies
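To make the extraction step concrete, here is a sketch of pulling a lab value out of an HL7 v2 ORU message, the kind of parsing the Mirth Connect channels performed. The field positions follow standard HL7 v2 conventions (OBX-3 identifier, OBX-5 value, OBX-6 units), but the sample message itself is illustrative, not real interface traffic:

```python
# Minimal HL7 v2 ORU^R01 fragment (pipe-delimited, CR-separated segments).
oru = (
    "MSH|^~\\&|EPIC|MSK|RAVE|MSK|20240115||ORU^R01|123|P|2.3\r"
    "PID|1||042\r"
    "OBX|1|NM|777-3^Platelets^LN||45|10*3/uL|150-400|L|||F\r"
)

def parse_obx(message: str) -> list[dict]:
    """Pull (LOINC code, value, unit) triples from each OBX segment."""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX":
            continue
        loinc = fields[3].split("^")[0]   # OBX-3: observation identifier
        results.append({
            "loinc": loinc,
            "value": fields[5],           # OBX-5: observation value
            "unit": fields[6],            # OBX-6: units
        })
    return results

print(parse_obx(oru))  # [{'loinc': '777-3', 'value': '45', 'unit': '10*3/uL'}]
```

A production channel would of course handle repeats, escape sequences, and malformed segments; Mirth provides that machinery out of the box.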
Lessons Learned
1. Start with Common Data Points: Focusing on the 85 common data points (vs. trying to automate everything) delivered 80% of the value with 20% of the effort
2. Coordinator Buy-In is Critical: Coordinators were initially skeptical that integration would work. Pilot success on 2 trials convinced them to adopt.
3. Template Approach Scales: Creating a reusable template allowed the solution to scale from 12 trials to 20+ trials without proportional IT effort
4. Automated Reconciliation > Manual Checks: Weekly automated reports were far more effective than quarterly manual reconciliation, catching issues before they became problems
