HOMECAT: consensus homologs mapping for interspecific knowledge transfer and functional genomic data integration

Simone Zorzan 1,2, Erika Lorenzetto 2, Michele Ettorre 1,2, Valeria Pontelli 2, Carlo Laudanna 0,1, Mario Buffelli 1,2,3

0 Department of Pathology, University of Verona, Strada le Grazie 8, 37134 Verona, Italy
1 Centre for Biomedical Computing, University of Verona, Strada le Grazie 8, 37134 Verona, Italy
2 Department of Neurological, Neuropsychological, Morphological and Motor Sciences, Section of Physiology, University of Verona, Strada le Grazie 8, 37134 Verona, Italy
3 National Institute of Neuroscience-Italy, Verona, Italy

Associate Editor: Martin Bishop

Motivation: Comparative studies are encouraged by the rapid growth in data produced by the latest high-throughput techniques, in particular functional genomic studies. Yet the size of the datasets, the difficulty of identifying complete sets of orthologs and, not least, the variety of identifier formats make information integration challenging. With HOMECAT, we aim to facilitate cross-species relationship identification and data mapping by combining orthology predictions from several publicly available sources, a convenient interface for high-throughput data download and automatic identifier conversion in a Cytoscape plug-in that provides both integration with a large set of bioinformatics tools and a user-friendly interface.

Availability: HOMECAT and the Supplementary Materials are freely available at http://www.cbmc.it/homecat/.

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

The interpretation of large datasets, in particular those obtained from experiments on model systems, benefits greatly from the transfer of knowledge about phylogenetically related species. Studies on widely used model organisms, such as mouse, fly or rat, have yielded a wealth of information that is essential for understanding the complexity of life (Aitman et al., 2011; Loman et al., 2012; Schuster, 2008).
The integration of functional genomic and proteomic expression profiles in the modeling of regulatory networks is a valuable approach in biology (Romero et al., 2012). HOMECAT (homology mapper for enrichment and comparative analysis with translation) is a plug-in for Cytoscape (Shannon et al., 2003) that allows cross-species data comparison and integration of high-throughput data with automatic identifier conversion. Orthology relationships can be difficult to identify, and the available approaches differ in sensitivity and specificity (Altenhoff and Dessimoz, 2009; Chen et al., 2007; Hulsen et al., 2006). HOMECAT can at present combine data from four homology data sources to attain better specificity and increased sensitivity. The use of BridgeDB (Van Iersel et al., 2010) makes it possible to support 30 species and nearly 100 identifier formats (71 of them from microarray platforms). HOMECAT also interfaces with Array Express ATLAS (Kapushesky et al., 2012) to download and integrate curated high-throughput data.

2 COMPARATIVE ANALYSES AND INTEGRATION WITH HIGH-THROUGHPUT DATA

In HOMECAT, the input species identifiers can be chosen among the node attributes; they are then converted to query the available homology data sources for orthologs in one or more destination species. Finally, the identifiers of the orthologs are converted to the selected output format. By default, Homologene (Wheeler et al., 2007), OMA (Roth et al., 2008), Compara (Vilella et al., 2009) and OrthoMCL (Li et al., 2003) are supported; the first two are more specific and the other two are more sensitive (Altenhoff and Dessimoz, 2009). The OMA and Compara servers are queried directly, whereas Homologene (rel. 67) and OrthoMCL (ver. 5) data are accessed through our server at CBMC. Where possible, direct access to the data was used, so that results are always as up to date as possible. All necessary conversions are performed by default through a remote BridgeDB server.
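Combining predictions from several orthology sources can be pictured as simple set operations. The following is a minimal sketch under stated assumptions, not HOMECAT's actual implementation: the gene identifiers, the `predictions` dictionary and the function names are all hypothetical. A strict consensus keeps only the orthologs reported by every source, which favors specificity; a permissive one keeps those reported by any source, which favors sensitivity.

```python
from collections import defaultdict

# Hypothetical predictions for a single input gene; identifiers are invented.
predictions = {
    "Homologene": {"geneA", "geneB"},
    "OMA": {"geneA"},
    "Compara": {"geneA", "geneB", "geneC"},
    "OrthoMCL": {"geneA", "geneC"},
}


def support_by_ortholog(predictions):
    """Map each candidate ortholog to the sorted list of sources reporting it."""
    support = defaultdict(list)
    for source, orthologs in predictions.items():
        for ortholog in orthologs:
            support[ortholog].append(source)
    return {ortholog: sorted(sources) for ortholog, sources in support.items()}


def consensus(predictions, min_sources):
    """Keep the orthologs reported by at least `min_sources` sources."""
    support = support_by_ortholog(predictions)
    return {o for o, sources in support.items() if len(sources) >= min_sources}


# Strict consensus: confirmed by all sources (intersection).
strict = consensus(predictions, min_sources=len(predictions))
# Permissive consensus: reported by any source (union).
permissive = consensus(predictions, min_sources=1)
```

Intermediate values of `min_sources` trade coverage against reliability, and the per-ortholog source list can serve as a record of which sources back each candidate.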
A BridgeDB database can be installed on any machine supporting Java (see the BridgeDB website) and used by HOMECAT to reduce conversion times, particularly when hundreds of identifiers are processed. After the search phase, the user can decide whether to use only the orthologs confirmed by all the sources or those indicated by any of them, thereby favoring either the reliability or the coverage of the results.

Orthologs can be used to enrich the input network, to assign their identifiers to specific attributes or to create a new network of metanodes that preserves the connectivity of the original network. Each metanode contains an input species node along with its orthologs, and can be expanded or collapsed. In networks of metanodes, the border color of each node always represents the input species data, whereas the internal color of the node always relates to the ortholog data. On collapsed metanodes, a pie chart summarizes the ortholog data (see Fig. 1A for details). When more than one network is created, node selections can be extended between networks, which facilitates comparisons.

For each ortholog, an attribute summarizes the sources that support its identification. This attribute is useful for adjusting the sensitivity attained by combining the results from different orthology sources: the user can filter the less strongly supported orthologs and, if desired, remove them from the network. External data can be loaded onto the resulting network, either from a local file or from ATLAS, by selecting the experiment code and an experimental factor. When data from different orthologs are present, their average is added as an attribute to each metanode. A general scheme of this workflow is depicted in Figure 1B.

As an example, HOMECAT was used to compare microarray data after optic nerve crush in three publicly available datasets from zebrafish, mouse and rat samples. Optic nerve crush is a common model for studying regeneration in the central nervous system.
After crush, the optic nerve regenerates in zebrafish, whereas axonal recovery is absent in mammals. The results highlighted genes that are regulated differently in fish and mammals, consistent with the literature, and showed how the combination of functional genomic and regulatory data analysis can contribute to the identification of putative key factors in comparative biological studies (see the manual in the Supplementary Materials).

3 FEATURES AND EXTENSIBILITY

Orthologs resulting from multiple HOMECAT queries can be saved to and loaded from .hcd files. Data mapped to attributes can be saved along with Cytoscape networks. A programming interface allows the development of Java classes, called components, which add further homology data sources and are loaded automatically (Supplementary Materials).

ACKNOWLEDGEMENTS

The authors thank Giovanni Scardoni for the suggestions in the plug-in development; Claudio Pascale and Alessio Azzoni for the early development of the plug-in prototypes; Adrian Altenhoff for the support in the develop (...truncated)