HiCmapTools: a tool to access HiC contact maps.
Chang et al. BMC Bioinformatics (2022) 23:64
https://doi.org/10.1186/s12859-022-04589-y
Open Access
SOFTWARE
Jia‑Ming Chang1*, Yi‑Fu Weng1, Wei‑Ting Chang1, Fu‑An Lin1 and Giacomo Cavalli2
*Correspondence:
1 Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan
Full list of author information is available at the end of the article
Abstract
Background: With the development of HiC technology, an increasing volume of HiC
sequencing data has been produced. Although there are dozens of packages that can turn
sequencing data into contact maps, there is no appropriate tool for querying contact maps
in order to extract biological information from HiC datasets.
Results: We present HiCmapTools, a tool for biologists to efficiently calculate and
analyze HiC maps. The complete program provides multi-query modes and analysis
tools. We have validated its utility on two real biological questions: TAD loop and TAD
intra-density.
Conclusions: HiCmapTools supports seven access options so that biologists can
quantify the contact frequency at sites of interest. The tool has been implemented in
C++ and R and is freely available at https://github.com/changlabtw/hicmaptools and
documented at https://hicmaptools.readthedocs.io/.
Keywords: Hi-C, Topologically Associating Domains (TADs), 3D genome, Juicer,
hicpipe
Background
With the invention of the microscope, researchers gained a preliminary understanding
of the chromosome’s tertiary structure. A more global picture remained difficult to obtain
until the development of chromosome conformation capture (3C) [1]
and its variations 4C [2], 5C [3], and HiC [4], which have made spatial information available for the whole genome. There are now many HiC pipelines available [5, 6]. However,
there is no suitable tool to access HiC map results except for visualization (Table 1), that
is, no systematic way to extract HiC contact information for a specific query. For example, given the list of CTCF binding sites, a custom script is needed to compute all pairwise contacts between them. Therefore, we have developed HiCmapTools, which helps
biologists efficiently query HiC maps and perform permutation tests. It supports seven
query modes and attempts to cover the most frequent needs of biologists who use HiC
to study chromatin contacts and their putative function.
© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Table 1 Comparison between HiCmapTools and other current tools applied to HiC sequencing datasets

| Function              | HiCmapTools | HiCPro [7]          | Juicer [8], Juicebox [9] | gcMapExplorer [10] |
|-----------------------|-------------|---------------------|--------------------------|--------------------|
| Generate HiC map      | x           | o                   | o                        | x                  |
| Visualization         | x           | x                   | o                        | o                  |
| Format transformation | x           | o                   | o                        | o                  |
| Extract submap        | o           | x                   | dump                     | x                  |
| Query HiC map         | o           | MAKE_VIEWPOINTS.PY* | x                        | x                  |

* Generates a BED profile from a specified viewpoint (similar to the -bait query mode of HiCmapTools)
Implementation
HiCmapTools is implemented in C++, which facilitates the use of common
data structures and functions from the Standard Template Library (STL). Users input
HiC maps either in the .hic format generated by Juicer [8] or as bin-contact pair files following hicpipe [11, 12]. The input contact map is stored in a hash table keyed by pairs of
bins. The bin size is specified by the user (-in_hic_resol for .hic) or
determined by the input file (bin-contact pair files). Each query is mapped to its corresponding key based on its position, so that contact frequencies can be extracted efficiently via
STL hash lookups (O(1) on average). We also assess the significance of the extracted
frequencies using permutation tests, which rank each observed frequency among random samples.
The usage of the query modes and the random test is explained below.
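The storage and ranking ideas described above can be illustrated with a minimal sketch. It is written in Python rather than the tool's own C++, and it is not the HiCmapTools implementation: the names ContactMap and rank_fraction, the restriction to intra-chromosomal contacts, and the symmetric key ordering are all assumptions made for illustration. How random samples are drawn is left to the caller, because that procedure is described later in the article.

```python
class ContactMap:
    """Sparse contact map keyed by (chrom, bin_i, bin_j), with bin_i <= bin_j.

    Hypothetical illustration only; intra-chromosomal contacts are assumed.
    """

    def __init__(self, resolution):
        self.resolution = resolution  # bin size in base pairs
        self.contacts = {}            # hash table: bin-pair key -> frequency

    def _key(self, chrom, pos_a, pos_b):
        # Map each genomic position to its bin index, then order the pair
        # so that (a, b) and (b, a) share a single key.
        bin_a = pos_a // self.resolution
        bin_b = pos_b // self.resolution
        return (chrom, min(bin_a, bin_b), max(bin_a, bin_b))

    def add(self, chrom, pos_a, pos_b, count):
        key = self._key(chrom, pos_a, pos_b)
        self.contacts[key] = self.contacts.get(key, 0.0) + count

    def get(self, chrom, pos_a, pos_b):
        # Average-case O(1) lookup; bin pairs never stored count as 0.
        return self.contacts.get(self._key(chrom, pos_a, pos_b), 0.0)


def rank_fraction(observed, random_values):
    """Fraction of random samples whose value is >= the observed value."""
    if not random_values:
        raise ValueError("at least one random sample is required")
    hits = sum(1 for value in random_values if value >= observed)
    return hits / len(random_values)


if __name__ == "__main__":
    cmap = ContactMap(resolution=10_000)
    cmap.add("chr3R", 2_005_000, 2_051_000, 12.0)
    # Any two positions falling in the same pair of bins hit the same key.
    print(cmap.get("chr3R", 2_050_000, 2_009_999))
    print(rank_fraction(5, [1, 5, 7, 9]))
```

Storing only observed bin pairs keeps memory proportional to the number of non-zero contacts, which suits the sparsity of HiC maps at fine resolution.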
Query mode
HiCmapTools provides seven query modes to meet the needs of biologists, as illustrated in Fig. 1. The
query input is expected to be in BED format, in which each line is treated as an individual query. Sample query files are available at
https://hicmaptools.readthedocs.io/en/latest/format.html#query-file.
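As a rough illustration of how BED-formatted queries could be turned into bin indices, the following Python sketch reads one query per line. The names Query, parse_bed and bins_for are hypothetical, only the first three BED columns are used, and the standard BED convention of 0-based, half-open intervals is assumed; the tool's own parser may treat coordinates differently.

```python
from typing import NamedTuple


class Query(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (standard BED convention)
    end: int    # exclusive


def parse_bed(lines):
    """Parse BED lines into Query tuples, one query per line."""
    queries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue  # skip comments and header lines
        queries.append(Query(fields[0], int(fields[1]), int(fields[2])))
    return queries


def bins_for(query, resolution):
    """Return the first and last bin index overlapped by a query."""
    first = query.start // resolution
    # The last covered base is end - 1 because the interval is half-open.
    last = max(query.start, query.end - 1) // resolution
    return first, last


if __name__ == "__main__":
    bed = ["# example query", "chr3R\t2000000\t2030000\tgeneA"]
    for query in parse_bed(bed):
        print(query, bins_for(query, resolution=10_000))
```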
1. bait: calculate the average contact frequency from downstream to upstream of a
position of interest (range controlled by -ner_bin; white rectangle). For example, biologists can measure
the average contact frequency around a PRE binding site.
2. local: list all contacts inside an interval (white cross). All contacts inside a gene body
can be extracted by querying specific gene loci.
3. loop: contact frequency between the two ends of a loop. As an example query, biologists
can test whether gene looping exists [13] by calculating the contact frequency of a gene's
promoter with its transcription termination site. Each gene of interest is listed as one
row in a BED file.
4. pair: contacts between a pair of regions (contact between regions X and Y, white
crosses). For instance, contact frequencies between a gene promoter and an enhancer
are extracted by querying their positions.
5. sites: contacts between specific sites (contacts between three sites, including diagonal). As an example, given the list of chromatin insulator sites, HiCmapTools calculates all pairwise contacts among these sites, such that users can check whether any
pair of binding sites interact with each other.
Fig. 1 Illustration of query modes. Numbers indicate the corresponding query modes. HiC data is Drosophila
chr3R:2000k..10000k [14]
6. submap: sub contact map of regions of interest. The HiC map is stored efficiently by
keeping only selected regions (e.g., a region containing long-range contacts between
two loci such as the Drosophila Antp-C and the BX-C).
7. TAD: sum and average of contacts within specific TAD regions (white dashed square
at the top right of Fig. 1). Biologists can quantify chromatin compaction within a
TAD by measuring the average intra-TAD contact frequency. This might be u (...truncated)