March 18, 2026
Slide overview
From Insurance Records to Epidemiology: Provenance and Processing of Japanese Claims Data
January 24, 2026
https://doi.org/10.37737/ace.27004
Lecturer, Department of Pharmacoepidemiology, School of Public Health, Graduate School of Medicine, Kyoto University
From Insurance Records to Epidemiology: Provenance and Processing of Japanese Claims Data
Yoko M. Nakao, Toshiki Fukasawa, and Koji Kawakami
Annals of Clinical Epidemiology
Administrative claims data in Japan – Benefits
- Central resources for epidemiology and health services research
- Large sample sizes, longitudinal coverage, and standardized coding
- “Ready-made” sources for descriptive, causal, and predictive research
Administrative claims data in Japan – Shortcomings
- By-products of the insurance and reimbursement system, not purpose-built research registries
- Captured and missing information is determined by institutional arrangements, fee schedules, and clinical workflows
- Risk of misinterpreting variables, underestimating bias, and over-trusting sophisticated analyses
Study question and key insights
Study question: How must investigators understand the generation and processing of Japanese claims data in order to carry out rigorous research?
Key insight: By explicitly understanding data provenance and data processing.
Data journey from real-world patients to analysis-ready datasets
Managing the data's chain of custody: understanding how data travel from the study world to analysis, and how they change along the way.
[Diagram labels: Target population; Reality; Study population; Analysis-ready database]
01 Clinical encounters
02 Documentation
03 Billing/claims reviews
04 Database construction
05 Extraction
06 Processing
Elements of data provenance
- Purpose of capture
- Coding systems and versions
- Extraction cadence and latency
- Native unit of record
- ID handling
- Custodian edits
- Coverage frame
- Separation of medical vs dispensing
Elements of data processing
- Null/unknown policy
- Parsing and normalization
- Unit of analysis
- Date anchors and time granularity
- Eligibility windows and continuous enrolment
- Re-billing and void handling
- Death ascertainment hierarchy
- Cross-record linkage keys
- Code/master version locking
- Medication timing interpretation
- Refresh model and backfills
- Outcome ascertainment logic
- Facility/provider recoding
- Line consolidation within claims
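Two of the processing elements listed here, eligibility windows with continuous enrolment and re-billing/void handling, can be illustrated with a minimal Python sketch. It assumes hypothetical inputs: enrolment spans as inclusive (start, end) date pairs, and claim rows with hypothetical `claim_id`, `version`, and `voided` fields. None of these names come from the paper or from any database specification.

```python
from datetime import date


def merge_spans(spans, max_gap_days=0):
    """Merge inclusive (start, end) enrolment spans, bridging gaps of up to max_gap_days."""
    merged = []
    for start, end in sorted(spans):
        # Uncovered days between the previous span and this one (0 if adjacent).
        if merged and (start - merged[-1][1]).days - 1 <= max_gap_days:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def has_lookback(spans, index_date, lookback_days, max_gap_days=0):
    """True if index_date lies in a continuous span with enough prior enrolment."""
    for start, end in merge_spans(spans, max_gap_days):
        if start <= index_date <= end:
            return (index_date - start).days >= lookback_days
    return False


def resolve_rebilled(claims):
    """Keep only the latest version of each claim, then drop voided claims.

    Assumes a higher 'version' means a later resubmission (hypothetical rule).
    """
    latest = {}
    for claim in claims:
        key = claim["claim_id"]
        if key not in latest or claim["version"] > latest[key]["version"]:
            latest[key] = claim
    return [c for c in latest.values() if not c["voided"]]
```

The allowed gap and the rule for choosing between claim versions are analytic decisions that depend on the specific database; the sketch only makes those decisions explicit as parameters.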
Linking provenance and processing to the research question
[Diagram labels: Provenance; Processing; Research question]
Descriptive studies
Investigators should confirm:
- Who is included in the database
- Which clinical events are observable as claims
- Whether both are recorded consistently over time
Causal studies
Investigators should confirm:
- Whether treatment strategies can be reconstructed with sufficient timing detail to define a clear time zero
- Whether outcomes and key covariates are observable in the relevant care settings
- Whether follow-up and censoring events are captured consistently over time
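One common way to define time zero from claims is a new-user rule: the first observed dispensing counts as treatment initiation only if enough prior enrolment is observable to rule out earlier use. The following is a minimal Python sketch of that rule, not the authors' method. The function name, the 180-day default washout, and the day-level dates are all assumptions; real claims may only support coarser timing.

```python
from datetime import date


def find_time_zero(dispensing_dates, enrolment_start, washout_days=180):
    """Return the first observed dispensing date if it is preceded by at least
    washout_days of observable enrolment; otherwise return None.

    A None result means initiation cannot be distinguished from prevalent use.
    """
    if not dispensing_dates:
        return None
    first = min(dispensing_dates)
    if (first - enrolment_start).days >= washout_days:
        return first
    return None
```

A patient whose first dispensing falls 30 days after enrolment begins would be excluded under the default washout, because earlier treatment under a different insurer cannot be ruled out.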
Prediction modelling
Investigators should:
- First specify where and when the model will be used, because this determines what information is available at the moment of prediction
- Use provenance to clarify which variables are routinely recorded in that setting
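The idea that only information available at the moment of prediction may enter the model can be expressed as a simple filter. This is a minimal Python sketch under stated assumptions: the record fields `variable` and `recorded_on`, the `latency_days` parameter (standing in for the delay before a record reaches the database), and the function name are all hypothetical.

```python
from datetime import date, timedelta


def features_at_prediction(records, prediction_date, allowed_variables, latency_days=0):
    """Keep records that are (a) routinely recorded in the deployment setting and
    (b) already available when the prediction is made.

    A record is treated as available latency_days after it was recorded.
    """
    latency = timedelta(days=latency_days)
    return [
        r for r in records
        if r["variable"] in allowed_variables
        and r["recorded_on"] + latency <= prediction_date
    ]
```

Setting `latency_days` to a realistic value shows quickly whether a predictor that looks useful in a historical extract would actually exist at the time of use.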
From healthcare services and insurance payments to construction of claims databases
[Flow diagram. Actors: insured persons; medical institutions and pharmacies; examination and payment organizations; insurers. Flows: healthcare services, copayment, insurance card issuance, premium payment, claim, payment. Resulting databases: NDB, KDB, JMDC DB, DeSC DB.]
KDB, Kokuho Database; NDB, National Database of Health Insurance Claims and Specific Health Checkups.
Converting raw claims data into high-quality analytical datasets
- Parsing and interpreting multiple record types
  * Common claim
  * Insurer
  * Medical institution and pharmacy information
  * Disease and diagnosis
  * Procedure
  * Medication, prescription, and dispensation
  * Coding data
- Standardizing variables and constructing relational database structures
  * Beneficiary
  * Diagnosis
  * Procedure
  * Drug
- Constructing master tables for code interpretation
  * Diagnosis master
  * Procedure master
  * Drug master
  * Facility master
- Building cohorts tailored to specific research questions
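The first three steps on this slide (parsing record types, building relational tables, and interpreting codes through master tables) can be sketched in Python as follows. This is an illustration only: the two-letter tags, field layouts, codes, and master contents are placeholders, not the official claims file specification.

```python
import csv
import io
from collections import defaultdict

# Hypothetical record-type tags mapped to hypothetical field names.
LAYOUTS = {
    "RE": ["claim_id", "beneficiary_id", "service_month"],  # common claim record
    "SY": ["claim_id", "diagnosis_code", "suspected_flag"],  # diagnosis record
    "IY": ["claim_id", "drug_code", "quantity"],             # drug record
}


def parse_records(text):
    """Split a mixed-record file into one table (list of dicts) per record type.

    Rows with an unknown tag are kept under '_unparsed' rather than dropped,
    so that nothing disappears silently during processing.
    """
    tables = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        layout = LAYOUTS.get(row[0])
        if layout is None:
            tables["_unparsed"].append(row)
        else:
            tables[row[0]].append(dict(zip(layout, row[1:])))
    return tables


def attach_drug_names(drug_rows, drug_master, master_version):
    """Look up drug names in one explicitly chosen (locked) master version.

    Codes missing from that version get drug_name=None instead of a guess.
    """
    names = drug_master[master_version]
    return [dict(row, drug_name=names.get(row["drug_code"])) for row in drug_rows]
```

Passing the master version explicitly is one way to make code/master version locking visible in the analysis code, so that a refreshed master table cannot silently change how historical codes are interpreted.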
Comparative effectiveness study of sustained treatment strategies for antihypertensive drugs
[Diagram labels: Patients with hypertension; Antihypertensive drug treatment; Outcomes]
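Studying sustained treatment strategies requires reconstructing continuous treatment from dispensing records. One widely used approach, shown here as a minimal Python sketch and not as the authors' method, joins dispensings into episodes when the gap between the end of one supply and the next dispensing stays within a grace period. The input format of (dispensing date, days of supply) pairs and the 30-day default grace period are assumptions.

```python
from datetime import date, timedelta


def treatment_episodes(dispensings, grace_days=30):
    """Build continuous treatment episodes as inclusive (start, end) date pairs.

    dispensings: iterable of (dispensing_date, days_supply) pairs.
    A dispensing extends the current episode if it starts no more than
    grace_days after the previous supply ran out; otherwise it opens a new one.
    """
    episodes = []
    for start, days_supply in sorted(dispensings):
        covered_until = start + timedelta(days=days_supply - 1)
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days + 1):
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], covered_until))
        else:
            episodes.append((start, covered_until))
    return episodes
```

The grace period directly determines who counts as having discontinued treatment, so it is worth varying in sensitivity analyses.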
Discussion – Implications of clear provenance and processing
- Better recognition of limitations
- Design of appropriate epidemiological studies
- Interpretation of findings in light of data history
Conclusion
Rather than treating claims data as neutral ‘healthcare data’, this perspective:
- Highlights their origins as administrative records
- Emphasizes that understanding their generation and transformation is a prerequisite for valid epidemiology
Thank you
Yoko M. Nakao, Toshiki Fukasawa, and Koji Kawakami
Annals of Clinical Epidemiology