In silico analysis of methyltransferase domains involved in biosynthesis of secondary metabolites
BMC Bioinformatics
Mohd Zeeshan Ansari 0
Jyoti Sharma 0
Rajesh S Gokhale 0
Debasisa Mohanty 0
0 Address: National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi-110067, India
Background: Secondary metabolites biosynthesized by the polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) families of enzymes constitute several classes of therapeutically important natural products, such as erythromycin, rapamycin and cyclosporine. Because of their relevance to natural product based drug discovery, identification of novel secondary metabolites by genome mining has been an area of active research. A number of tailoring enzymes catalyze a variety of chemical modifications to the polyketide or nonribosomal peptide backbone of these secondary metabolites, enhancing their structural diversity. Development of powerful bioinformatics methods for identifying these tailoring enzymes and assigning their substrate specificity is therefore crucial for deciphering novel secondary metabolites by genome mining.
Results: In this work, we have carried out a comprehensive bioinformatics analysis of methyltransferase (MT) domains present in multifunctional type I PKS and NRPS proteins encoded by PKS/NRPS gene clusters with known secondary metabolite products. Based on the results of this analysis, we have developed a novel knowledge-based computational approach for detecting MT domains present in PKS and NRPS megasynthases, delineating their correct boundaries and classifying them as N-MT, C-MT or O-MT using profile HMMs. Analysis of proteins in the nr database of NCBI using these class-specific profiles has revealed several interesting examples, namely C-MT domains in NRPS modules, N-MT domains with significant homology to C-MT proteins, and NRPS/PKS MTs in association with other catalytic domains. Our analysis of the chemical structures of the secondary metabolites and their sites of methylation suggests that a possible evolutionary basis for the novel class of N-MT domains with significant homology to C-MT proteins is the close resemblance of the chemical structures of the acceptor substrates, as in the case of pyochelin and yersiniabactin. These two classes of MTs recognize similar acceptor substrates but transfer methyl groups to N and C positions on these substrates.
Conclusion: We have developed a novel knowledge-based computational approach for identifying MT domains present in type I PKS and NRPS multifunctional enzymes and predicting their site of methylation. Analysis of the nr database using this approach has revealed the presence of several novel MT domains. Our analysis has also given interesting insight into the evolutionary basis of the novel substrate specificities of these MT proteins.
Background
Nonribosomal peptide synthetases (NRPSs), polyketide
synthases (PKSs) and fatty acid synthases (FASs) employ a
common biosynthetic strategy to synthesize their
metabolic products by stepwise condensation of simple amino
or carboxylic acid monomers. The core catalytic domains
involved in the biosynthesis of the
polyketide/nonribosomal peptide/fatty acid backbone moieties are
ketosynthase (KS), acyltransferase (AT), dehydratase (DH),
enoylreductase (ER), ketoreductase (KR), acyl carrier
protein (ACP), condensation (C), adenylation (A) and
thiolation (T) [1,2]. Apart from these core catalytic domains, a
number of auxiliary functional domains, often called
tailoring domains, introduce a variety of different chemical
modifications to the backbone moieties of these
secondary metabolites to further increase their structural
diversity. Bioinformatics analysis of various catalytic domains
present in NRPS and PKS proteins has been an area of
active research in recent years [3-8]. These studies [3-8]
have not only led to the development of novel computational
methods for in silico identification of secondary
metabolites by genome mining [9-16], but have also guided
rational reprogramming of secondary metabolite
biosynthetic pathways to generate designed "natural products"
[12,17-20]. However, all these studies, including our
earlier work, have concentrated on core catalytic domains, and
no detailed bioinformatics analyses have been carried out
for important tailoring enzymes such as methyltransferases.
Methyltransferase (MT) domains present in NRPS and
PKS clusters constitute a major class of tailoring domains/
enzymes involved in biosynthesis of secondary
metabolites. They catalyze the transfer of a methyl group from
S-adenosylmethionine (SAM or AdoMet) to carbon,
nitrogen or oxygen atoms at various positions on the
backbones of polyketides, nonribosomal peptides and
fatty acids, and have therefore been classified as C-MT,
N-MT and O-MT, respectively, depending upon their site of
methylation. These enzymatic domains in general have a
bidomain structure, where the first subdomain contains
the binding site for the methyl group donor, while the second
subdomain harbors the binding site for the acceptor substrate
[21,22]. The presence of MT domains in multifunctional
NRPS and PKS proteins is generally inferred from the
chemical structure of the secondary metabolite products. There
are only a few in vitro studies on enzymatic characterization
of NRPS/PKS MT domains [23-27]. A recent study on MT
domains from type II PKS biosynthetic pathways has
revealed an interesting correlation between the regioselectivity of
methylation and MT sequence [24]. However, no such
analysis has been carried out for MT domains present in
type I PKS or NRPS proteins. In contrast to type II PKS MTs,
which are stand-alone proteins, MT domains in type I PKS
and NRPS are present along with other catalytic domains
on a single polypeptide chain. Therefore, it has been
difficult to decipher the correct length and domain boundaries
for MT domains in type I PKS or NRPS proteins. Various
studies have suggested that the N-MT domain is
typically 450 amino acids long, while C-MT and O-MT domains are
generally 300 amino acids long. A set of 3 conserved sequence
motifs has been identified in most MTs [28-30].
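As an illustration of how an ordered set of conserved motifs can be used to flag a candidate MT region, the following minimal Python sketch scans a protein sequence for three motifs in a fixed order, each starting within a maximum gap after the end of the previous one. The motif patterns, the `max_gap` parameter and the function name `find_ordered_motifs` are hypothetical placeholders chosen for illustration only; they are not the motifs or spacings reported in [28-30], and the sketch is not the method developed in this work.

```python
import re
from typing import List, Optional, Tuple

# HYPOTHETICAL placeholder patterns (regular expressions), for illustration
# only. They are NOT the conserved MT motifs reported in the literature.
HYPOTHETICAL_MOTIFS = ["G.G.G", "D[IVL]", "PGG"]


def _hits(seq: str, motif: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of all matches, including overlapping ones."""
    # A lookahead with a capturing group lets finditer report overlapping hits.
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({motif}))", seq)]


def find_ordered_motifs(
    seq: str, motifs: List[str], max_gap: int
) -> Optional[List[Tuple[int, int]]]:
    """Find the motifs in the given order along the sequence.

    Each motif after the first must start at or after the end of the previous
    motif and at most max_gap residues beyond it. Returns the first chain of
    spans that satisfies this, or None if no such chain exists.
    """
    hits = [_hits(seq, motif) for motif in motifs]

    def extend(i: int, prev_end: int) -> Optional[List[Tuple[int, int]]]:
        if i == len(hits):
            return []  # all motifs placed
        for start, end in hits[i]:
            if i > 0 and not (prev_end <= start <= prev_end + max_gap):
                continue  # violates the ordering or spacing constraint
            rest = extend(i + 1, end)
            if rest is not None:
                return [(start, end)] + rest
        return None  # no valid placement from this point (backtrack)

    return extend(0, 0)


if __name__ == "__main__":
    # Toy sequence built to contain the three placeholder motifs in order.
    toy = "MK" + "GAGTG" + "A" * 10 + "DI" + "A" * 5 + "PGG" + "KK"
    print(find_ordered_motifs(toy, HYPOTHETICAL_MOTIFS, max_gap=20))
    print(find_ordered_motifs(toy, HYPOTHETICAL_MOTIFS, max_gap=5))
```

A scan of this kind only reports where the motifs lie; it says nothing about where the domain itself begins or ends, and short patterns will also match by chance in unrelated sequences.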
Mutational studies of N-MTs of peptide synthetases have
shown that these 3 motifs are essential for catalysis
[31]. Knowledge of these MT sequence motifs and the
expected spacing between them is often used to
discern the presence of MT domains in multifunctional NRPS and
PKS proteins. However, because of the high degree of
sequence divergence, delineating the correct boundaries of
these domains is often difficult. In our earlier
study, we attempted to identify MT domains in various
NRPS/PKS gene clusters based on pairwise alignment with
the MT domain from the actinomycin cluster [32]. However, this
domain identification protocol failed to detect 23 out of
32 MT domains. The 23 unidentified MT domains
included the three groups of MTs (C-, O- and N-MTs), for
which proper te (...truncated)