Reconstructing spatial organizations of chromosomes through manifold learning

https://academic.oup.com/nar/article-pdf/46/8/e50/24782540/gky065.pdf

Nucleic Acids Research

Guangxiang Zhu 2, Wenxuan Deng 1, Hailin Hu 0, Rui Ma 2, Sai Zhang 2, Jinglin Yang 2, Jian Peng 4, Tommy Kaplan 3, Jianyang Zeng 2

0 School of Medicine, Tsinghua University, Beijing 100084, China
1 Department of Biostatistics, Yale University, New Haven, CT, USA
2 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
3 School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
4 Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Decoding the spatial organizations of chromosomes has crucial implications for studying eukaryotic gene regulation. Recently, chromosomal conformation capture-based technologies, such as Hi-C, have been widely used to uncover the interaction frequencies of genomic loci in a high-throughput and genome-wide manner and have provided new insights into the folding of three-dimensional (3D) genome structure. In this paper, we develop a novel manifold-learning-based framework, called GEM (Genomic organization reconstructor based on conformational Energy and Manifold learning), to reconstruct the three-dimensional organizations of chromosomes by integrating Hi-C data with biophysical feasibility. Unlike previous methods, which explicitly assume specific relationships between Hi-C interaction frequencies and spatial distances, our model directly embeds the neighboring affinities from Hi-C space into 3D Euclidean space. Extensive validations demonstrated that GEM not only greatly outperformed other state-of-the-art modeling methods but also provided physically and physiologically valid 3D representations of the organizations of chromosomes. Furthermore, for the first time, we apply the modeled chromatin structures to recover long-range genomic interactions missing from the original Hi-C data.
INTRODUCTION

The three-dimensional (3D) organizations of chromosomes in the nucleus are closely related to diverse genomic functions, such as transcription regulation, DNA replication and genome integrity (1–4). Therefore, decoding the 3D genomic architecture has important implications for revealing the underlying mechanisms of gene activities. Unfortunately, our current understanding of 3D genome folding and the related cellular functions remains limited.

In recent years, proximity-ligation-based chromosome conformation capture (3C) (5,6) and its extensions, such as Hi-C (7) and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) (8), have provided a revolutionary tool for studying the 3D organizations of chromosomes at different resolutions in various cell types, organisms and species by measuring the interaction frequencies between genomic loci that are nearby in space. To gain better mechanistic insight into the 3D folding of the genome, it is necessary to reconstruct the 3D spatial arrangements of chromosomes from the interaction frequencies derived from 3C-based data. Indeed, models of 3D genome structure can shed light on the relationship between complex chromatin structure and its regulatory functions in controlling genomic activities (1–4). However, modeling 3D chromatin structure is not a trivial task, as it is often complicated by uncertainty and sparsity in the experimental data, as well as by the high dynamics and stochasticity of chromatin structure itself.

Generally speaking, in the 3D genome structure modeling problem, we are given Hi-C data, which can be represented by a matrix in which each element is the interaction frequency of a pair of genomic loci, and our goal is to reconstruct the 3D organization of the genome, that is, to obtain the 3D spatial coordinates of all genomic loci.
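To make the input and output of this problem concrete, the following is a minimal sketch in Python (NumPy), not the GEM method itself. It builds a toy contact matrix from a known 3D curve and recovers coordinates with a simple distance-based baseline: frequencies are converted to distances under an assumed inverse power law F ∝ 1/D^α, and classical multidimensional scaling embeds those distances in 3D. All function names (helix, toy_contact_matrix, frequencies_to_distances, classical_mds, pairwise_distances) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X (N x 3)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def helix(n):
    """Toy 'chromosome': n loci placed along a helix (illustrative only)."""
    t = np.linspace(0.0, 4.0 * np.pi, n)
    return np.stack([np.cos(t), np.sin(t), 0.2 * t], axis=1)


def toy_contact_matrix(X, alpha=1.0):
    """Simulate an N x N frequency matrix, ASSUMING F = 1 / D**alpha."""
    D = pairwise_distances(X)
    off = ~np.eye(len(X), dtype=bool)      # off-diagonal entries only
    F = np.zeros_like(D)
    F[off] = D[off] ** (-alpha)
    return F


def frequencies_to_distances(F, alpha=1.0):
    """Invert the assumed power law: D = F**(-1/alpha) off the diagonal."""
    F = np.asarray(F, dtype=float)
    off = ~np.eye(F.shape[0], dtype=bool)
    if np.any(F[off] <= 0):
        raise ValueError("this sketch needs positive off-diagonal frequencies")
    D = np.zeros_like(F)
    D[off] = F[off] ** (-1.0 / alpha)
    return D


def classical_mds(D, dim=3):
    """Classical MDS: coordinates from the top eigenpairs of the Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered squared distances
    w, V = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]        # indices of the largest `dim`
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))


X_true = helix(20)                         # hidden 3D structure
F = toy_contact_matrix(X_true)             # input: N x N frequencies
X_hat = classical_mds(frequencies_to_distances(F))  # output: N x 3 coordinates
err = np.abs(pairwise_distances(X_hat) - pairwise_distances(X_true)).max()
print("max pairwise-distance error:", err)
```

Because the toy matrix is generated with exactly the relationship that is later inverted, the recovered structure matches the original up to rotation, translation and reflection, which is why the comparison is made on pairwise distances rather than raw coordinates. Real Hi-C matrices are noisy and sparse and need not follow any single power law, so this baseline should be read only as an illustration of the problem's input and output.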
In practice, in addition to Hi-C data, other known constraints, such as the shape and size of the nucleus, can also be integrated to achieve more reliable modeling results and to further enhance the physical and biological relevance of the reconstructed genomic structure (9,10).

In recent years, numerous computational methods have been developed to reconstruct the 3D organizations of chromosomes (5,7,11–28). Most of these approaches, such as the multidimensional scaling (MDS)-based method (29,30), ChromSDE (17), ShRec3D (18) and miniMDS (27), depended heavily on the formula F ∝ 1/D^α to convert interaction frequencies F into spatial distances D (where α is a constant). Instead of using this inverse-power relationship, BACH (16) employed a Poisson distribution to define the relation between Hi-C interaction frequencies, spatial distances and other genomic features (e.g. fragment length, GC content and mappability score). After converting Hi-C interaction frequencies into distances, these previous modeling approaches applied various strategies to recon (...truncated)