Cancer databases

TCGA

The Cancer Genome Atlas
"The landmark multi-omics atlas of 20k+ tumors across 33 cancer types."
multi-omics, 20k-tumors, landmark

About the resource

The Cancer Genome Atlas, a joint NCI/NHGRI program that ran from 2006 to 2018, characterised more than 20,000 primary tumor and matched normal samples across 33 cancer types using whole-exome and whole-genome sequencing, RNA-seq, miRNA-seq, methylation profiling, copy-number analysis, reverse-phase protein arrays and clinical follow-up.

Its data sit in the Genomic Data Commons (GDC) as a harmonised, GRCh38-aligned reference and are mirrored in dozens of downstream portals (cBioPortal, the UCSC Xena browser, FireBrowse, the GDC Data Portal, ISB-CGC and Terra). TCGA remains among the most-analysed cancer datasets in the world: a generation of disease-subtype, biomarker and pan-cancer findings rests on it.

What you'd use it for

  1. Run a pan-cancer or single-cancer-type multi-omics analysis
  2. Pull harmonised TCGA data into a cloud workspace
  3. Use as the reference cohort for biomarker discovery
  4. Cross-reference candidate driver alterations against TCGA frequencies
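
The last use case, checking candidate alterations against cohort frequencies, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `alteration_frequencies` and `cross_reference` are hypothetical, and the input is a made-up toy table (sample identifier mapped to a set of altered genes), not real TCGA values. In practice the table would be derived from a mutation or copy-number file obtained from the GDC or one of its mirrors.

```python
from collections import Counter


def alteration_frequencies(altered_genes_by_sample):
    """Return the fraction of samples in which each gene is altered.

    `altered_genes_by_sample` maps a sample identifier to the set of genes
    altered in that sample. Samples with no alterations still count in the
    denominator.
    """
    n_samples = len(altered_genes_by_sample)
    if n_samples == 0:
        return {}
    # Count each gene at most once per sample.
    counts = Counter(
        gene
        for genes in altered_genes_by_sample.values()
        for gene in set(genes)
    )
    return {gene: count / n_samples for gene, count in counts.items()}


def cross_reference(candidates, frequencies):
    """Look up each candidate gene; genes never seen in the cohort get 0.0."""
    return {gene: frequencies.get(gene, 0.0) for gene in candidates}


# Toy placeholder cohort (NOT real TCGA data): four samples, one unaltered.
toy_cohort = {
    "sample_1": {"TP53", "PIK3CA"},
    "sample_2": {"TP53"},
    "sample_3": {"KRAS"},
    "sample_4": set(),
}

frequencies = alteration_frequencies(toy_cohort)
result = cross_reference(["TP53", "KRAS", "BRAF"], frequencies)
print(result)
```

Keeping unaltered samples in the denominator is a deliberate choice here: dropping them would inflate every frequency.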

How you access it

GDC Data Portal; GDC API; cloud workspaces (Terra/ISB-CGC); mirrored derived data in cBioPortal/Xena
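
For the GDC API route, a minimal Python sketch of a file search might look like the following. It assumes the public `https://api.gdc.cancer.gov/files` endpoint and the `filters`, `fields`, `format` and `size` query parameters as described in the GDC API documentation; the helper names `build_files_query` and `fetch_files` are hypothetical. The project ID `TCGA-BRCA` and the data type `Gene Expression Quantification` are example values only, and the field names should be checked against the current GDC documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Assumed public GDC endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_type, size=5):
    """Build query parameters for a GDC file search (no network access)."""
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "data_type", "value": [data_type]},
            },
        ],
    }
    return {
        "filters": json.dumps(filters),  # the filter is passed as a JSON string
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }


def fetch_files(params, timeout=30):
    """Send the query and return the list of matching file records.

    Requires network access; assumes the response nests results under
    data -> hits.
    """
    url = GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["hits"]


params = build_files_query("TCGA-BRCA", "Gene Expression Quantification")
```

Calling `fetch_files(params)` would then return up to five file records, each carrying the requested `file_id`, `file_name` and `data_type` fields. Query construction is kept separate from the network call so the filter can be inspected or reused without contacting the server. Controlled-access data additionally requires an authentication token, which this sketch does not cover.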

Closely related resources