Subtypes of associated protein–DNA (Transcription Factor-Transcription Factor Binding Site) patterns
Tak-Ming Chan (2), Kwong-Sak Leung (2), Kin-Hong Lee (2), Man-Hon Wong (2), Terrence Chi-Kong Lau (1), Stephen Kwok-Wing Tsui (0, 3)
0 School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N. T.
1 Department of Biology and Chemistry, The City University of Hong Kong, Kowloon
2 Department of Computer Science & Engineering, The Chinese University of Hong Kong, Shatin, N. T.
3 Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong
In protein-DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has been shown to be promising, making a large-scale subtype study both possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. The resultant subtypes carry various biological meanings. They reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis of the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows that the V/E variation is indicative for distinguishing the Myc from the MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.
Protein-DNA interactions play a central role in genetic
activities (1,2). In particular, transcription factor (TF,
the protein side) and transcription factor binding site
(TFBS, the DNA side) bindings are critical and primary
protein-DNA interactions to be deciphered for gene
regulation. TF-TFBS bindings are therefore our focus throughout
the article. Despite the great variations shown among
different whole-length TF and TFBS sequences, parts of them
are conserved as TF binding domains (tens to hundreds of
residues) and TFBS motifs (usually several to 20 residues),
respectively. Within the distance of forming hydrogen
bonds, short TF and TFBS subsequences show more
conserved patterns. These associated short binding
subsequences (within 10 residues; 6-8 in our experiments) of
both TFs and TFBSs surrounding the interacting bonds
are denoted as binding cores. However, predicting these
short binding cores on both the TF and TFBS sides from
sequences only is very challenging.
Amino acid residue variations in the TF binding cores
may lead to intriguingly different corresponding TFBS
sub-patterns. For example, [A/P]KV[E/V]IL-CA[C/G][C/G]TG
may be found to be TF-TFBS binding cores.
Specifically, PKVEIL may bind to CAG[C/G]TG,
whereas PKVVIL binds to CACGTG. We denote such
associated TF-TFBS residue variations as subtypes,
which should be studied in the context of associated
TF-TFBS core bindings. PKVEIL-CAG[C/G]TG versus
PKVVIL-CACGTG (3rd column) and
PKVEIL-CAG[C/G]TG versus PKVVIL-CACGTG (4th column)
are two related (column-specific) subtype pairs. Subtypes
can reflect familial specificities, exhibit changed binding
preferences, distinguish conserved residues from flexible
ones and reveal novel binding mechanisms. Although
such high-resolution details are usually extracted from
3D structures with huge experimental efforts, abundant
low-resolution binding sequence data can be exploited to
predict both testable TF-TFBS binding cores and
subtypes.
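The bracket notation and the column-wise split into subtypes can be illustrated with a minimal Python sketch. It is not the discovery method of this article: the function names (`parse_pattern`, `matches`, `to_pattern`, `subtypes_by_tf_column`) are hypothetical, and the three TF-TFBS pairs are simply the instances implied by the example above, not experimental data. The sketch assumes equal-length cores and groups the pairs by the residue found in one TF column.

```python
import re
from collections import defaultdict


def parse_pattern(pattern):
    """Split bracket notation such as 'CA[C/G][C/G]TG' into per-column residue sets."""
    columns = []
    for token in re.findall(r"\[[^\]]+\]|[A-Za-z]", pattern):
        if token.startswith("["):
            columns.append(frozenset(token[1:-1].split("/")))
        else:
            columns.append(frozenset(token))
    return columns


def matches(pattern, sequence):
    """Return True if the sequence is an instance of the bracket-notation pattern."""
    columns = parse_pattern(pattern)
    return len(columns) == len(sequence) and all(
        residue in allowed for residue, allowed in zip(sequence, columns)
    )


def to_pattern(sequences):
    """Collapse equal-length sequences back into bracket notation, column by column."""
    out = []
    for residues in zip(*sequences):
        unique = sorted(set(residues))
        out.append(unique[0] if len(unique) == 1 else "[" + "/".join(unique) + "]")
    return "".join(out)


def subtypes_by_tf_column(pairs, column):
    """Group (TF core, TFBS core) pairs by the TF residue at a 0-based column.

    Returns {residue: (TF sub-pattern, TFBS sub-pattern)}.
    """
    groups = defaultdict(list)
    for tf, tfbs in pairs:
        groups[tf[column]].append((tf, tfbs))
    return {
        residue: (
            to_pattern([tf for tf, _ in members]),
            to_pattern([tfbs for _, tfbs in members]),
        )
        for residue, members in groups.items()
    }


# Illustrative pairs only (assumption): instances of the example pattern in the text.
pairs = [
    ("PKVEIL", "CAGCTG"),
    ("PKVEIL", "CAGGTG"),
    ("PKVVIL", "CACGTG"),
]

# Split on the 4th TF column (index 3), where E and V vary.
subtypes = subtypes_by_tf_column(pairs, column=3)
for residue, (tf_pattern, tfbs_pattern) in sorted(subtypes.items()):
    print(f"{residue}: {tf_pattern}-{tfbs_pattern}")
```

Grouping on a single column keeps the sketch close to the column-specific subtype pairs described above; it does not assess statistical significance.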
In this article, we introduce and study, for the first time,
residue-varying subtypes based on our sequence-based
associated TF-TFBS pattern discovery (3). A brief
review is first given in the following sub-sections.
TF-TFBS subtype discovery methods are detailed in the
Materials and Methods section. Experimental results
and verifications are reported in the Results and Analysis
section, followed by the final Discussion and Conclusion
section.
TF-TFBS bindings in gene regulation
Because of the functional importance of TF-TFBS bindings
in regulation, core interaction subsequences from bound
TFs and TFBSs are less likely to mutate and exhibit
recognizable patterns (i.e. they are conserved) across similar
TF-TFBS bindings. The short conserved subsequence
patterns on either the TF or the TFBS side are called motifs.
TFs have relatively long conserved regions called
domains of up to hundreds of amino acids (AA), but the
core subsequences interacting with TFBSs are
shown to be highly specific (3,4). TFBS motifs are usually
s (...truncated)