HiCmapTools: a tool to access HiC contact maps.

BMC Bioinformatics, Feb 2022

Chang et al. BMC Bioinformatics (2022) 23:64
https://doi.org/10.1186/s12859-022-04589-y

SOFTWARE, Open Access

Jia-Ming Chang1*, Yi-Fu Weng1, Wei-Ting Chang1, Fu-An Lin1 and Giacomo Cavalli2

*Correspondence: 1 Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan. Full list of author information is available at the end of the article.

Abstract

Background: With the development of HiC technology, more and more HiC sequencing data have been produced. Although there are dozens of packages that can turn sequencing data into contact maps, there is no appropriate tool to query contact maps in order to extract biological information from HiC datasets.

Results: We present HiCmapTools, a tool for biologists to efficiently calculate and analyze HiC maps. The complete program provides multi-query modes and analysis tools. We have validated its utility on two real biological questions: TAD loop and TAD intra-density.

Conclusions: HiCmapTools supports seven access options so that biologists can quantify contact frequency of the interest sites. The tool has been implemented in C++ and R and is freely available at https://github.com/changlabtw/hicmaptools and documented at https://hicmaptools.readthedocs.io/.

Keywords: Hi-C, Topologically Associating Domains (TADs), 3D genome, Juicer, hicpipe

Background

With the invention of the microscope, researchers gained a preliminary understanding of the chromosome’s tertiary structure. However, it was difficult to gain a more global picture, that is, until the development of chromosome conformation capture (3C) [1] and its variations 4C [2], 5C [3], and HiC [4], which have made available spatial information on the whole genome. There are now many HiC pipelines available [5, 6].
However, there is no suitable tool to access HiC map results except for visualization (Table 1), that is, no systematic way to extract HiC contact information for a specific query. For example, given the list of CTCF binding sites, a custom script is needed to compute all pairwise contacts between them. Therefore, we have developed HiCmapTools, which helps biologists efficiently query HiC maps and perform permutation tests. It supports seven query modes and attempts to cover the most frequent needs of biologists who use HiC to study chromatin contacts and their putative function.

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Table 1 Comparison between HiCmapTools and other current tools applied to HiC sequencing datasets

Function               HiCmapTools  HiCPro [7]           Juicer [8], Juicebox [9]  gcMapExplorer [10]
Generate HiC map       x            o                    o                         x
Visualization          x            x                    o                         o
Format transformation  x            o                    o                         o
Extract submap         o            x                    dump                      x
Query HiC map          o            MAKE_VIEWPOINTS.PY*  x                         x

* Generates a BED profile from a specified viewpoint (similar to the -bait query mode of HiCmapTools)

Implementation

HiCmapTools is implemented in C++, which facilitates using common programming data structures and functions from the Standard Template Library (STL). Users input HiC maps in either .hic format generated by Juicer [8] or bin-contact pair files following hicpipe [11, 12]. The input contact map is stored as a hash structure using pair bins as keys. The size of the bin is specified by the user (-in_hic_resol for .hic) or depends on the input file (bin-contact pair files). A query is binned into a corresponding key based on its position to facilitate efficient extraction of contact frequency via STL hash operation (O(1) for lookup). Also, we measure the significance of the extracted frequencies using permutation tests which rank the frequency among random samples. The usage of the query mode and random test are explained below.

Query mode

We use seven query modes to meet the needs of biologists, as illustrated in Fig. 1. The query input is expected to be in BED format, in which each line is considered as an individual query. Sample query files are available at https://hicmaptools.readthedocs.io/en/latest/format.html#query-file.

Fig. 1 Illustration of query modes. Numbers indicate the corresponding query modes. HiC data is Drosophila chr3R:2000k..10000k [14]

1. bait: calculate average contacts from downstream to upstream (controlled by -ner_bin) of a position of interest (white rectangle). For example, biologists can measure the average contact frequency around a PRE binding site.
2. local: list all contacts inside an interval (white cross). All contacts inside a gene body can be extracted by querying specific gene loci.
3. loop: contact frequency between two ends of a loop. As an example query, biologists can test whether gene looping exists [13] by calculating the contact frequency of its promoter with the transcription termination site. A gene of interest is listed as one row in a BED file.
4. pair: contacts between a pair of regions (contact between regions X and Y, white crosses). For instance, contact frequencies between a gene promoter and an enhancer are extracted by querying their positions.
5. sites: contacts between specific sites (contacts between three sites, including diagonal). As an example, given the list of chromatin insulator sites, HiCmapTools calculates all pairwise contacts among these sites, such that users can check whether any pair of binding sites interact with each other.
6. submap: sub contact map of regions of interest. The HiC map is stored efficiently by keeping only selected regions (i.e., a region containing long-range contacts between two loci such as the Drosophila Antp-C and the BX-C).
7. TAD: sum and average of contacts within specific TAD regions (white dashed square at the top right of Fig. 1). Biologists can quantify chromatin compaction within a TAD by measuring the average intra-TAD contact frequency. This might be u (...truncated)
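
The hash-based storage described under Implementation can be illustrated with a short C++ sketch. This is not the HiCmapTools source: the class name ContactMap, its member names, the single-chromosome simplification, the packing of two bin indices into one 64-bit key, and the choice to average over every bin pair (diagonal included) are all assumptions made for illustration. It only shows how a genomic position can be binned by a fixed resolution and how a bin pair can then be looked up in a std::unordered_map in expected constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Illustrative sketch only: the type and member names are hypothetical and
// are not taken from the HiCmapTools source. A single chromosome is assumed.
class ContactMap {
public:
    explicit ContactMap(std::uint32_t resolution) : resolution_(resolution) {
        assert(resolution_ > 0);
    }

    // Accumulate a contact value for the bin pair containing the two positions.
    void add(std::uint64_t pos1, std::uint64_t pos2, double value) {
        contacts_[key(binOf(pos1), binOf(pos2))] += value;
    }

    // Loop-style query: contact frequency between two positions (0 if absent).
    double contact(std::uint64_t pos1, std::uint64_t pos2) const {
        return lookup(binOf(pos1), binOf(pos2));
    }

    // TAD-style query: sum of contacts over every bin pair inside [start, end).
    double regionSum(std::uint64_t start, std::uint64_t end) const {
        assert(end > start);
        const std::uint32_t first = binOf(start);
        const std::uint32_t last = binOf(end - 1);
        double sum = 0.0;
        for (std::uint32_t i = first; i <= last; ++i)
            for (std::uint32_t j = i; j <= last; ++j)
                sum += lookup(i, j);
        return sum;
    }

    // Assumed definition of the average: sum divided by the number of
    // bin pairs in the region, counting the diagonal.
    double regionAverage(std::uint64_t start, std::uint64_t end) const {
        const std::size_t n = binOf(end - 1) - binOf(start) + 1;
        return regionSum(start, end) / static_cast<double>(n * (n + 1) / 2);
    }

private:
    std::uint32_t binOf(std::uint64_t pos) const {
        return static_cast<std::uint32_t>(pos / resolution_);
    }

    // Pack two bin indices into one order-independent 64-bit hash key.
    static std::uint64_t key(std::uint32_t a, std::uint32_t b) {
        if (a > b) std::swap(a, b);
        return (static_cast<std::uint64_t>(a) << 32) | b;
    }

    double lookup(std::uint32_t a, std::uint32_t b) const {
        const auto it = contacts_.find(key(a, b));
        return it == contacts_.end() ? 0.0 : it->second;
    }

    std::uint32_t resolution_;  // bin size in base pairs
    std::unordered_map<std::uint64_t, double> contacts_;
};
```

With a 10 kb resolution, for example, positions 15000 and 42000 fall into bins 1 and 4, so any query whose two ends land in those bins reads the same map entry. Sorting the two indices inside the key makes the lookup symmetric, which matches the symmetry of a contact map.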

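The permutation test mentioned under Implementation ranks an observed contact frequency among values obtained from random samples. The sketch below shows one common way to turn such a ranking into an empirical p-value. How HiCmapTools draws its random samples is not covered in this excerpt, so the random values are simply passed in as a vector; the names RankResult and rankAmongRandom, the treatment of ties as "at least as large", and the add-one correction are assumptions for illustration, not the tool's documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: 1-based rank of the observed value (1 = largest)
// and the corresponding empirical one-sided p-value.
struct RankResult {
    std::size_t rank;
    double pValue;
};

// Rank an observed frequency among frequencies measured on random samples.
// Ties count against the observed value, and the observed value itself is
// included in the denominator (add-one correction), so pValue is never 0.
RankResult rankAmongRandom(double observed,
                           const std::vector<double>& randomValues) {
    assert(!randomValues.empty());
    std::size_t atLeast = 0;
    for (double v : randomValues) {
        if (v >= observed) ++atLeast;
    }
    const std::size_t rank = atLeast + 1;
    const double pValue = static_cast<double>(rank) /
                          static_cast<double>(randomValues.size() + 1);
    return RankResult{rank, pValue};
}
```

For an observed frequency of 5.0 and random values {1, 2, 6, 3, 5}, two random values are at least as large as the observation, so the rank is 3 and the empirical p-value is 3/6 = 0.5.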

Article home page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8832839

J. Chang, Y. Weng, W. Chang, F. Lin, G. Cavalli. HiCmapTools: a tool to access HiC contact maps. BMC Bioinformatics, 2022, Volume 23, Issue 1, Article 64, DOI: 10.1186/s12859-022-04589-y