DATA4MICCAI
NIH Cancer Imaging Data Repositories for Biomedical Data Science Research
Date: Sept 27, 2021
Time: 10am-2pm (EDT) / 14:00-18:00 (UTC)
The National Cancer Institute (NCI), of the National Institutes of Health (NIH), has made significant investments in the
creation and development of public data repositories to enable and promote sharing and secondary analysis of cancer
imaging data through an open science approach. Since its inception in 2011, The Cancer Imaging Archive (TCIA) has
provided the imaging research community with a stable and reliable resource for sharing de-identified clinical
radiology (DICOM) and, more recently, digital pathology images of a variety of cancers. A growing number of
collections in TCIA contain clinical metadata, annotations, and third-party analysis results.
Recently, NCI launched the Cancer Research Data Commons (CRDC), an enterprise of cloud-based data
repositories and resources dedicated to key information modalities in cancer research (including genomics,
proteomics, and imaging) to provide the research community with a virtual and expandable infrastructure to enable
cross-domain data analysis and archival. Adhering to FAIR principles in cancer informatics, CRDC aims to further
enable innovations in data science in oncology. Imaging Data Commons (IDC), the imaging node in CRDC, was publicly
released in October 2020. It connects researchers with image collections from TCIA and beyond through a robust
infrastructure that contains metadata and image-derived data (segmentations, annotations, and image-based analysis
results), supports image browsing, and connects to Cloud Resources for image computation, data analysis, and
archival of the results. Direct connectivity of IDC data to Google's analytics and ML tools is an important feature
of this platform.
This tutorial aims to familiarize attendees with TCIA and IDC and with their similarities and differences in features
and capabilities, including search, browsing, and downloading (through TCIA), viewing and cohort collection (through
IDC), and cloud computation through the NCI Cloud Resources. All levels are welcome.
Attendees can follow the demonstration sessions during the tutorial or at their own pace, using the learning
materials included in the IDC documentation. To follow all of the exercises and demonstration notebooks, attendees
will need a Google Cloud project with billing enabled. IDC provides free cloud credits to facilitate exploration of
the resource. Attendees are encouraged to request these credits in advance using the IDC cloud credit application
form.
Agenda
10:00 – 10:50 am Cancer Imaging Data Repositories
- 10:00 am - 10:10 am Introduction – K. Farahani
- 10:10 am - 10:30 am The Cancer Imaging Archive (TCIA) – J. Freymann
- 10:30 am - 10:50 am Imaging Data Commons (IDC) – A. Fedorov, R. Kikinis (slides)
- 10:50 am – 10:55 am Q & A
- 10:55 am Break
11 am – 11:55 am Demonstration session: TCIA Features
- 11:00 am - 11:15 am Browsing Radiology and Pathology Data – J. Kirby
- 11:15 am - 11:30 am Searching with the Radiology Data Portal – L. Tarbox
- 11:30 am - 11:45 am Additional tools and resources – F. Prior
- 11:45 am – 11:55 am Q & A
- 11:55 am Break
12 pm – 12:30 pm Project MONAI and public data from TCIA and IDC – S. Aylward
- 12:30 pm – 12:35 pm Q & A
- 12:35 pm Break
12:45 pm – 1:35 pm Demonstration session: IDC Features and Cloud Compute
- 12:45 pm - 1:00 pm Imaging Data Commons Features - A. Fedorov
- short demonstration videos available here
- overview paper about IDC published in "Cancer Research" is here
- 1:00 pm - 1:15 pm Introduction to Google Cloud Platform (GCP) – W. Longabaugh (slides)
- 1:15 pm - 1:30 pm Utilizing IDC and GCP to support AI imaging research – D. Bontempi
- IDC notebooks are available here
- 1:30 pm – 1:45 pm Q & A
1:30 pm – 2:00 pm Open Discussion
Organizers
- Keyvan Farahani, National Cancer Institute
- Andrey Fedorov, Brigham and Women's Hospital / Harvard Medical School
- John Freymann, Frederick National Laboratory for Cancer Research
- Justin Kirby, Frederick National Laboratory for Cancer Research
- Ulrike Wagner, Frederick National Laboratory for Cancer Research
- Fred Prior, University of Arkansas for Medical Sciences
- Lawrence Tarbox, University of Arkansas for Medical Sciences
- Bill Longabaugh, Institute for Systems Biology
- Suzanne Paquette, Institute for Systems Biology
- David Pot, General Dynamics IT
- Ron Kikinis, Brigham and Women's Hospital / Harvard Medical School
MICCAI 2020 tutorial page