Application of a sensitive collection heuristic for very large protein families: Evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases

BMC Bioinformatics, Mar 2006

Background: Manually finding subtle yet statistically significant links to distantly related homologues becomes practically impossible for very populated protein families due to the sheer number of similarity searches to be invoked and analyzed. The unclear evolutionary relationship between classical mammalian lipases and the recently discovered human adipose triglyceride lipase (ATGL; a patatin family member) is an exemplary case for such a problem.

Results: We describe an unsupervised, sensitive sequence segment collection heuristic suitable for assembling very large protein families. It is based on fan-like expanding, iterative database searches. To prevent inclusion of unrelated hits, additional criteria are introduced: minimal alignment length and overlap with starting sequence segments, re-detection of starting sequences in reciprocal searches, and automated filtering for compositional bias and repetitive patterns. This heuristic was implemented as FAMILYSEARCHER in the ANNIE sequence analysis environment and applied to search for protein links between the classical lipase family and the patatin-like group.

Conclusion: The FAMILYSEARCHER is an efficient tool for tracing distant evolutionary relationships involving large protein families. Although classical lipases and ATGL have no obvious sequence similarity and differ with regard to fold and catalytic mechanism, homology links detected with FAMILYSEARCHER show that they are evolutionarily related. The conserved sequence parts can be narrowed down to an ancestral core module consisting of three β-strands, one α-helix and a turn containing the typical nucleophilic serine. Moreover, this ancestral module also appears in numerous enzymes that have various substrate specificities but rely critically on nucleophilic attack mechanisms.

Methodology article

Georg Schneider (1), Georg Neuberger (1), Michael Wildpaner (1), Sun Tian (1), Igor Berezovsky (0), Frank Eisenhaber (1)

0 Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford str., M-105, 02138 Cambridge, MA, USA
1 IMP - Research Institute of Molecular Pathology, Dr. Bohr-Gasse 7, A-1030 Vienna, Republic of Austria

Background

The failure to develop a rational, generally applicable cure for obesity-related diseases can be attributed to the highly complex regulation of energy metabolism, which is not yet fully understood. On the other hand, considering the historic successes in deciphering the underlying biochemical pathways, it is assumed that the chemical transformation steps of basic metabolites are known in their entirety. This view is seriously questioned in light of the recent discovery of ATGL, a protein that catalyzes the initial step of hydrolysis of triacylglycerides at the surface of lipid droplets in adipocytes [1]. It is surprising that the fundamental activity of this key enzyme has escaped attention so far [2,3]. Considering the many dozens of additional hypothetical human protein sequences with low but statistically significant sequence similarity to known metabolic enzymes that can be collected with PSI-BLAST searches [4], more such findings can be expected.

One of the key steps in energy metabolism is the separation of fatty acids from glycerol moieties. A diverse set of lipases performs this task in various contexts by hydrolyzing the connecting ester bonds [5]. One of the best characterized lipases, pancreatic lipase, acts at the stage of food digestion [6]. Other lipases, such as hormone-sensitive lipase or lipoprotein lipase, are involved in lipid accumulation and release in tissue [7,8].
Most lipases share a common type of 3D structure known as the α/β-hydrolase fold, which is present in enzymes with quite diverse substrate specificities [9,10]. The catalytic mechanism of most lipases is reminiscent of serine proteases, as it proceeds via the nucleophilic attack of a serine-histidine-aspartate triad [10].

The recently discovered, novel key enzyme involved in fatty acid release from adipocytes, adipose triglyceride lipase (ATGL) [1], does not share any direct sequence similarity with known mammalian lipases. In fact, it appears to belong to a protein family that is centered around patatin, a potato storage protein with lipid acyl hydrolase activity [11,12]. The catalytic mechanism of these enzymes is inherently different from that of classic lipases, as it proceeds via a serine-aspartate dyad [13,14] as opposed to the well-described serine-histidine-aspartate triad.

In this work, we present sequence-analytic evidence that the ATGL/patatin family and the classic mammalian lipases represented by the human pancreatic lipase evolved from a common ancestor. Moreover, we present a set of key structural and sequence features that are conserved between these two enzyme groups and in related protein families.

The analysis of homology relationships within large superfamilies of protein sequences is a recurring theme in biomolecular sequence analysis. Finding the pancreatic lipase/ATGL relationship is just one application for the respective methodologies. Detecting subtle yet statistically significant and structurally plausible relationships in families involving thousands of members is not a straightforward task, since the manual analysis of the myriad reports generated by standard BLAST/PSI-BLAST [4] installations for sequence comparisons in databases is impossible in practice. Progress in this area has been hampered by insufficiently developed tools.
Here, we developed a computer implementation of a family searching heuristic involving:
(i) Automated invocation of fan-like iterative PSI-BLAST [4] searches with starting sequences.
(ii) Filtering of starting sequences with various sequence-analytic methods for detecting compositional and repetitive pattern bias.
(iii) Automatic re-detection of starting sequence segments in reciprocal searches.
(iv) Criteria for alignment length and overlap with the starting sequence segments.
(v) Automated parsing of outputs.
(vi) Database-supported analysis of similarity networks.

The user-parameterized measures (ii-iv) are designed to suppress the detection of unrelated hits for the case of starting sequences that are thought to represent a single globular domain, a functionally and structurally independent elementary module. This FAMILYSEARCHER is part of the sequence-analytic workbench ANNIE [15] that is being developed in our laboratory. To our knowledge, this article describes the first software package for sequence family collection with fully automated checks for bidirectional search criteria, transitive hit overlap criteria and generic procedures for masking repetitive regions that is applicable to extremely large sequence families.

Results

FAMILYSEARCHER: Methodical (...truncated)
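The collection steps (i), (iii) and (iv) listed in the Background can be illustrated with a minimal sketch. It is not the FAMILYSEARCHER code: the names Segment, Hit, overlap and collect_family, the threshold values, and the toy hit table are all hypothetical, and the search argument merely stands in for a PSI-BLAST run whose output has already been parsed. Filtering for compositional bias and repeats (step ii) is omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Segment:
    seq_id: str
    start: int  # 1-based, inclusive
    end: int    # inclusive

    def length(self) -> int:
        return self.end - self.start + 1


@dataclass(frozen=True)
class Hit:
    query: Segment    # aligned region on the query sequence
    subject: Segment  # aligned region on the hit sequence
    evalue: float


def overlap(a: Segment, b: Segment) -> int:
    """Number of positions shared by two segments of the same sequence."""
    if a.seq_id != b.seq_id:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def collect_family(seed: Segment,
                   search: Callable[[Segment], List[Hit]],
                   max_evalue: float = 1e-3,   # assumed threshold
                   min_len: int = 40,          # assumed threshold
                   min_overlap: float = 0.7,   # assumed threshold
                   max_rounds: int = 3) -> Dict[str, Segment]:
    """Fan-like expansion: every accepted hit becomes a new starting segment."""
    cache: Dict[Segment, List[Hit]] = {}

    def run(seg: Segment) -> List[Hit]:
        # Memoize so the reciprocal search is reused when the hit is expanded.
        if seg not in cache:
            cache[seg] = search(seg)
        return cache[seg]

    accepted = {seed.seq_id: seed}
    queue = deque([(seed, 0)])
    while queue:
        query, depth = queue.popleft()
        if depth >= max_rounds:
            continue
        needed = min_overlap * query.length()
        for hit in run(query):
            cand = hit.subject
            if cand.seq_id in accepted:
                continue
            # Significance and minimal alignment length.
            if hit.evalue > max_evalue or cand.length() < min_len:
                continue
            # The alignment must cover most of the starting segment.
            if overlap(hit.query, query) < needed:
                continue
            # Reciprocal search must re-detect the starting segment.
            if not any(h.evalue <= max_evalue and overlap(h.subject, query) >= needed
                       for h in run(cand)):
                continue
            accepted[cand.seq_id] = cand
            queue.append((cand, depth + 1))
    return accepted


# Toy hit table standing in for parsed search reports (illustrative data only).
S = Segment
toy = {
    "A": [Hit(S("A", 5, 95), S("B", 10, 100), 1e-5),
          Hit(S("A", 1, 30), S("X", 1, 30), 1e-4),    # too short
          Hit(S("A", 1, 100), S("Y", 1, 100), 1e-6)], # not re-detected
    "B": [Hit(S("B", 10, 100), S("A", 3, 98), 1e-5),
          Hit(S("B", 12, 100), S("C", 1, 90), 1e-4)],
    "C": [Hit(S("C", 1, 90), S("B", 12, 100), 1e-4)],
}
family = collect_family(S("A", 1, 100), lambda seg: toy.get(seg.seq_id, []))
print(sorted(family))
```

In this toy run, sequence C is reached only through B, which is the point of the fan-like expansion, while X and Y are rejected by the length and reciprocity checks, respectively.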


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2105-7-164.pdf
Article home page: http://www.biomedcentral.com/1471-2105/7/164

Georg Schneider, Georg Neuberger, Michael Wildpaner, Sun Tian, Igor Berezovsky, Frank Eisenhaber. Application of a sensitive collection heuristic for very large protein families: Evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases. BMC Bioinformatics, 2006, 7:164. DOI: 10.1186/1471-2105-7-164