Phasing analysis of lung cancer genomes using a long read sequencer

ARTICLE | OPEN | https://doi.org/10.1038/s41467-022-31133-6

Yoshitaka Sakamoto1,6, Shuhei Miyake1,6, Miho Oka1,2,6, Akinori Kanai1, Yosuke Kawai3, Satoi Nagasawa1, Yuichi Shiraishi4, Katsushi Tokunaga3, Takashi Kohno5, Masahide Seki1, Yutaka Suzuki1 ✉ & Ayako Suzuki1 ✉

The chromosomal backgrounds of cancerous mutations remain elusive. Here, we conduct the phasing analysis of non-small cell lung cancer specimens of 20 Japanese patients. By combining short and long read sequencing data, we obtain long phased blocks of 834 kb in N50 length with a >99% concordance rate. By analyzing the obtained phasing information, we reveal that several cancer genomes harbor regions in which mutations are unevenly distributed to either of the two haplotypes. Large-scale chromosomal rearrangement events, which resemble chromothripsis events but have smaller scales, occur on only one chromosome, and these events account for the observed biased distributions. Interestingly, the events are characteristic of EGFR mutation-positive lung adenocarcinomas. Further integration of long read epigenomic and transcriptomic data reveals that haploid chromosomes are not always at equivalent transcriptomic/epigenomic conditions. Distinct chromosomal backgrounds are responsible for later cancerous aberrations in a haplotype-specific manner.

1 Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
2 Ono Pharmaceutical Co., Ltd, Ibaraki, Japan.
3 Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan.
4 Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.
5 Division of Genome Biology, National Cancer Center Research Institute, Tokyo, Japan.
6 These authors contributed equally: Yoshitaka Sakamoto, Shuhei Miyake, Miho Oka.
NATURE COMMUNICATIONS | (2022)13:3464 | https://doi.org/10.1038/s41467-022-31133-6 | www.nature.com/naturecommunications

Large-scale cancer genome studies have revealed numerous cancer-related mutations and identified key driver genes1. Several relevant drug targets and biomarkers have been identified, such as EGFR and BRAF 2–5. So far, most studies have been conducted using short read sequencers. Therefore, our current knowledge has been limited mainly to mutations that occur in small-scale regions of genomes, namely single nucleotide variants (SNVs) and short insertions and deletions (indels). Recently, larger genomic structural variants (SVs) have been identified in the genomes of various cancer types. These SVs are expected to have no less biological and clinical relevance. For example, both chromosomal inversions and translocations generate oncogenic fusion genes, such as BCR-ABL6, EML4-ALK 7, and KIF5B-RET 8. In tumor-suppressor genes, such as TP53, RB1, and PTEN, large deletions frequently occur, thereby inactivating the expression and functions of these genes9. The Pan-Cancer Analysis of Whole Genomes Consortium has also focused on large-scale genomic aberrations in addition to SNVs. The consortium reported the SV signatures of 38 cancer subtypes10. Despite the potential relevance of SVs, conventional detection methods are based on short read sequencing data11 and have limited validity for the precise detection of SVs. In fact, the conventional analytical methodology may infer the presence of SVs but can only partially reveal their complete structures. To achieve a more direct and precise detection of SVs, long read sequencing should be employed for interrogating various aspects of cancer genomes. For this purpose, experimental and bioinformatics procedures for long read sequencing have recently made substantial progress.
Although the fidelity of existing long read sequencing technologies remains ~90% for a single-pass read, several efforts have been collectively made to improve sequence accuracy12. For example, circular consensus sequencing has been developed as a means to construct more accurate sequences with 99% identity in the PacBio platform13. Recently, Oxford Nanopore Technologies (ONT) has announced the release of a Q20 chemistry and basecalling system that enables single-pass sequencing with more than 99% accuracy. It is now realistic to use long read sequencers to systematically analyze a wider range of cancerous mutations, such as SNVs, relatively large-scale SVs and chromosomal-level rearrangements. In fact, several reports on long read analysis of cancer genomes have recently revealed that, occasionally, newly discovered SVs demonstrate complex patterns of genomic aberrations14–16.

Another unique advantage of employing long read sequencing for cancer genome analysis lies in its potential to reveal chromosomal contexts in which cancerous mutations are harbored16. Long read sequences should directly represent a mutual relationship between two mutations detected in the same read at a single-molecule level. This so-called “haplotype phasing analysis” would shed more light on a particular event occurring in a cancer type on either of the chromosomes of diploid genomes at a single molecule and haplotype resolution17. Each haplotype may reside in a distinct condition, which might be due to their differential DNA methylation or other epigenomic statuses possibly caused by the original lineage-specific regulations or other cancerous aberrant regulations at later steps18. Therefore, the consequentially occurring mutation patterns might serve as the footprints of the cancer genome evolution and could contain essential information for elucidating the causes and effects of mutations in the same cancer genomes.
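The idea that a single long read links two mutations at the single-molecule level can be illustrated with a minimal sketch. It is not the pipeline used in this study: the input format (one dictionary per read, mapping a site name to "alt" or "ref"), the function name `phase_two_mutations`, and the simple majority vote are all assumptions made for illustration, and both sites are assumed to be heterozygous.

```python
from collections import Counter


def phase_two_mutations(reads, site_a, site_b):
    """Classify two heterozygous mutations as 'cis', 'trans' or 'undetermined'.

    reads: iterable of dicts mapping a site name to the allele observed in
    that read, either "alt" (mutant) or "ref" (reference). This input format
    is hypothetical and chosen only for illustration.
    """
    counts = Counter()
    for read in reads:
        # Only reads that span both sites carry linkage information.
        if site_a in read and site_b in read:
            counts[(read[site_a], read[site_b])] += 1

    # cis: both mutations sit on the same molecule, so reads are alt/alt
    # (mutated haplotype) or ref/ref (the other haplotype).
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # trans: each mutation sits on a different molecule.
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]

    if cis == trans:
        return "undetermined"
    return "cis" if cis > trans else "trans"


example_reads = [
    {"mut1": "alt", "mut2": "alt"},
    {"mut1": "ref", "mut2": "ref"},
    {"mut1": "alt", "mut2": "alt"},
    {"mut1": "alt"},                 # spans only one site, so it is ignored
    {"mut1": "alt", "mut2": "ref"},  # minority pattern, e.g. a sequencing error
]
print(phase_two_mutations(example_reads, "mut1", "mut2"))  # cis
```

A majority vote is used here because single-pass long reads still contain errors, so a few discordant reads should not overturn the call; a real analysis would also weigh base qualities and coverage.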
It is possible that a better understanding of such chromosomal contexts of cancerous mutations will shed new light on cancerous events in patient cases whose molecular etiology remains unknown from previous short read sequencing, and will provide novel therapeutic insights.

In this study, we conduct a phasing analysis of cancer genomes combining short and long read sequencing technologies. We use whole-genome sequencing (WGS) data obtained from Japanese non-small cell lung cancer patients, where we identify a series of complex SVs14. We have further enriched sequencing depths for accurate phasing analysis and performed epigenome and transcriptome analyses. As such, we reveal the cancerous mutations from the perspective of their chromosomal backgrounds. Here, we demonstrate that the obtained phasing results provide essential information for understanding the history of mutations and their possible causes.

Results

Phasing analysis of a lung cancer genome. We performed our phasing analysis using the long and short read WGS data obtained from 20 non-small cell lung cancer specimens of Japanese patients. W (...truncated)