Development and Experience with Cancer Risk Prediction Models Using Federated Databases and Electronic Health Records

Limor Appelbaum, MD; Irving D. Kaplan, MD; Matvey B. Palchuk, MD, MS; Steven Kundrot, BS, MBA; Jessamine P. Winer-Jones, PhD; Martin Rinard, PhD

doi:10.36255/exon-publications-digital-health-federated-databases

Published: Apr 29, 2022

Keywords:

cancer risk prediction models, electronic health records, federated network, machine learning, pancreatic duct adenocarcinoma

Limor Appelbaum, MD

Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA

Irving D. Kaplan, MD

Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA

Matvey B. Palchuk, MD, MS

TriNetX, LLC, Cambridge, MA, USA

Steven Kundrot, BS, MBA

TriNetX, LLC, Cambridge, MA, USA

Jessamine P. Winer-Jones, PhD

TriNetX, LLC, Cambridge, MA, USA

Martin Rinard, PhD

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

ABSTRACT

Early diagnosis is critical to improving survival rates of lethal cancers, such as pancreatic duct adenocarcinoma (PDAC). However, there are no reliable screening tests for these cancers. In this chapter, we present potential methods for predicting early, evolving cancers by leveraging readily available electronic health record (EHR) data and machine learning. We discuss the various aspects of our collaborative experience, involving clinical and computer scientists, in navigating the process of using EHRs to develop cancer risk prediction models. This chapter is intended to serve as a guide to others performing this type of research. Based on our initial experience of model development using single-institution data, we cover the steps involved, including data acquisition, querying and downloading data, protecting patient confidentiality, data curation, model development, and validation. Challenges encountered when using single-institution data are presented, along with lessons learned. Drawing from our experience working with a federated database of EHR data from multiple institutions to develop a risk prediction model for PDAC, we then discuss how many of these challenges can be addressed by using such a database. Finally, we discuss future clinical opportunities that may arise from leveraging data from a federated network, such as the deployment of risk models for clinical studies.

Book

Digital Health

Section

Chapter 2

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright of individual chapters belongs to the respective authors. The authors grant unrestricted publishing and distribution rights to the publisher. The electronic versions of the chapters are published under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Users are allowed to share and adapt the chapters for any non-commercial purposes as long as the authors and the publisher are explicitly identified and properly acknowledged as the original source. The books in their entirety are subject to copyright by the publisher. The reproduction, modification, republication and display of the books in their entirety, in any form, by anyone, for commercial purposes are strictly prohibited without the written consent of the publisher.
