Establishing the Governance, Legal and Analytical Framework for a Federated Linked Data System: Creating a New Data Research Environment

  • National Institutes of Health (NIH)/National Cancer Institute (NCI)
Start Date
  • 04/30/2024
OS-PCORTF Strategic Plan Goal Alignment
  • Primary: Goal 1: Data Capacity for National Health Priorities
  • Secondary: Goal 2: Data Standards and Linkages for Longitudinal Research


STATUS: Active Project


Leadership across HHS has selected the Cancer Moonshot as one of two use cases for the HHS Data Strategy. This proposal supports one of the four workstreams identified as part of the HHS Data Strategy Cancer Moonshot use case: “Develop a governance framework for a federated linked data system.” The objectives for this workstream focus on developing a framework for data access/use governance, security/privacy controls, a technical blueprint, and a sustainable operating model for a federated linked data system. Such a system would facilitate the linking of data from multiple sources and types for cancer patients, such as administrative claims and clinical data, while protecting personally identifiable information (PII) and protected health information (PHI).

Developing a framework for a federated linked data system will address several issues that HHS faces when attempting to link datasets from multiple sources and make them available for patient-centered outcomes research and other health-related research:

  • First, due to growing data security concerns and an increase in data breaches, there are significant risks associated with linking datasets that are stored in multiple systems. Agencies have strict privacy and security controls in place that limit the movement and linking of data. Therefore, it is essential to develop a framework that enables data to be linked, queried, and analyzed without transfer from the primary storage location.
  • Second, it can be difficult to share and link different types of data. For instance, cancer registry data, such as those from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program or the Centers for Disease Control and Prevention’s National Program of Cancer Registries, are collected as part of public health surveillance without informed consent from each individual, whereas for many genomic datasets, especially those generated as part of a clinical trial, individual informed consent is obtained. The ability to link and combine a range of data types, including those collected for purposes other than research, is extremely valuable for comparative effectiveness research and patient-centered outcomes research, as well as for health research in general. Creating a standard framework will address data linking and sharing challenges, thereby creating more robust and comprehensive data and analytic resources.
  • Finally, each agency spends considerable money and time to generate, store, protect, and use its data, typically applying its process use case by use case. The federated linked data system model would allow each agency to gain access to additional data, share ideas and methods across agencies, and save time, effort, and money.


The overarching goal of this project is to build and strengthen data infrastructure for cancer research, including PCOR studies, by developing a federated linked data system governance framework that will enable secure linkage of data across agencies and care settings and therefore improve researchers’ access to comprehensive and longitudinal patient-level data.

The resulting legal and governance framework, in conjunction with the technical infrastructure developed by the ARPA-H project and appropriate access/security/privacy controls, will create an operational, scalable federated data system that improves the capacity of researchers to address questions important to patients, caregivers, clinicians, and policymakers.


A governance framework for a secure federated linked data system, including a framework for data access/use governance, security/privacy controls, a technical blueprint, and a sustainable operating model.

This governance framework will improve (1) the efficiency with which PCOR studies can be designed and conducted, given the availability of accessible, standardized, and linked data, and (2) the robustness of the evidence that is generated, by leveraging a more comprehensive set of data and analytic resources.