STATUS: Active Project
Privacy-preserving record linkage (PPRL) is a linkage methodology that mitigates privacy concerns when linking person-level data from disparate data sources. PPRL adds privacy protections by encrypting or masking the personally identifiable information (PII) used for person-level matching. Both open-source and commercial PPRL tools are available to researchers. However, different PPRL tools may produce different linkage results, which in turn may affect patient-centered outcomes research (PCOR) findings. To maximize the potential of PPRL for PCOR data resources, it is important to understand the attributes of the available tools and their linkage accuracy.
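To illustrate what "masking PII used for person-level matching" can look like, the following is a minimal sketch, not the method of any tool assessed in this project. It assumes a keyed-hash (HMAC-SHA-256) tokenization scheme in which each data owner standardizes a few PII fields, hashes them with a shared secret, and shares only the resulting tokens for matching. The secret key, the choice of fields, and the helper names `normalize` and `make_token` are hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret agreed on by the data owners; never sent to the linker.
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Standardize a PII field: strip accents, lowercase, drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 (HMAC) token over the normalized PII fields."""
    message = "|".join(normalize(field) for field in (first, last, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Each source tokenizes its own records; only tokens leave the source.
source_a = {make_token("José", "García", "1980-02-14"): "A-001"}
source_b = {make_token("JOSE", "garcia ", "19800214"): "B-917"}

# Records link when their tokens are identical.
matches = [(source_a[t], source_b[t]) for t in source_a.keys() & source_b.keys()]
print(matches)  # [('A-001', 'B-917')]
```

Standardization before hashing is what lets the two differently formatted records link. Exact-match tokens like these still fail on typos or missing fields, which is one reason results can differ between tools; some PPRL approaches instead use Bloom-filter encodings that support approximate comparison.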
Leveraging the extensive NCHS-CDC linked data repository, this project will assess linkage results from three PPRL tools currently in use or in development within the Department of Health and Human Services (HHS). NCHS-CDC will conduct analyses comparing the linkage results obtained through each tool with linked data resources developed using gold-standard linkage methods. The project will assess a variety of scenarios, including PII that is non-standardized, incomplete (e.g., missing unique identification numbers), or of varying quality.
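The description does not state which accuracy measures the comparison will use. One common approach, shown here as a minimal sketch under that assumption, treats the gold-standard links as ground truth and computes precision, recall, and F1 for the set of record pairs a PPRL tool reports. The function name `linkage_metrics` and the record identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (record ID in source A, record ID in source B)


def linkage_metrics(predicted: Set[Pair], gold: Set[Pair]) -> Dict[str, float]:
    """Compare a tool's linked pairs against gold-standard links."""
    true_pos = len(predicted & gold)   # links the tool got right
    false_pos = len(predicted - gold)  # links the tool made that are wrong
    false_neg = len(gold - predicted)  # true links the tool missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"true_pos": true_pos, "false_pos": false_pos, "false_neg": false_neg,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: four true links, tool reports three pairs (one wrong).
gold = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
predicted = {("A1", "B1"), ("A2", "B2"), ("A3", "B9")}

metrics = linkage_metrics(predicted, gold)
print({k: round(v, 3) for k, v in metrics.items()})
# {'true_pos': 2, 'false_pos': 1, 'false_neg': 2, 'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Running the same function once per tool and per data-quality scenario (for example, with and without unique identification numbers) would give directly comparable figures across tools.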
PROJECT PURPOSE & GOALS
To foster transparency and increase confidence in the validity of data resources created through PPRL, this project will assess open-source and commercial PPRL tools and gather lessons learned from working with PPRL.
The project has the following objectives:
Adapt open-source PPRL tools and obtain licenses for commercial PPRL tools.
Compare the PPRL tools’ performance against the benchmark NCHS-CDC linked data files.
Analyze the security and re-identification risks of using the three PPRL tools to join records across multiple data sources.
Engage HHS stakeholders and disseminate findings of the project.