easyDAS: Automatic creation of DAS servers
Gel Moreno et al. BMC Bioinformatics 2011, 12:23
http://www.biomedcentral.com/1471-2105/12/23
SOFTWARE
Open Access
Bernat Gel Moreno1,2*, Andrew M Jenkinson2, Rafael C Jimenez2, Xavier Messeguer Peypoch1,
Henning Hermjakob2
Abstract
Background: The Distributed Annotation System (DAS) has proven to be a successful way to publish and share
biological data. Although there are more than 750 active registered servers from around 50 organizations, setting
up a DAS server involves a fair amount of work, making it difficult for many research groups to share their
biological annotations. Given the clear benefit that widespread sharing of relevant biological data brings to the
research community, it would be desirable to facilitate the sharing process.
Results: Here we present easyDAS, a web-based system enabling anyone to publish biological annotations with
just a few clicks. The system, available at http://www.ebi.ac.uk/panda-srv/easydas, can read different
standard data file formats, process the data and create a new publicly available DAS source in a completely
automated way. The created sources are hosted on the EBI systems and can take advantage of their high storage
capacity and network connection, freeing the data provider from any network management work. easyDAS is an
open source project under the GNU LGPL license.
Conclusions: easyDAS is an automated DAS source creation system which can help many researchers in sharing
their biological data, potentially increasing the amount of relevant biological data available to the scientific
community.
Background
In recent years the amount of biological data generated
has been increasing greatly, and with the advent of
new technologies like next generation sequencing this
trend is likely to continue. Additionally, new analysis and
re-analysis techniques help to produce better and more
accurate derived results every day. Making all this data
and results publicly available can be of great benefit for
the scientific community as a whole, since valid biological data can be used both by field researchers and by
those developing new methodologies and algorithms.
Sharing raw research data allows others to conduct re-analysis and meta-analysis, usually reinforcing the previous results or even producing novel ones. ArrayExpress [1] and GEO [2] have had a very positive effect on
microarray research development and have been heavily
used in the development of both new data analysis techniques and biological knowledge. Sharing research
results eases the rapid spreading of new findings and their
incorporation into ongoing research, increasing their overall
usefulness.
* Correspondence:
1 Software Department, UPC-BarcelonaTech, Barcelona, Spain
Full list of author information is available at the end of the article
The effects of this sharing can be greatly increased if
data and results are made publicly available in a
machine-readable standard format that allows
them to be used seamlessly by other researchers.
Making the raw data available as supplementary
material attached to a paper is useful, and other
researchers can certainly use it, but integrating it with data from other sources remains difficult
and does not help fully automatic approaches such as
workflows. However, if the same data is made available
in a standard machine-readable format, it can be easily
integrated with data coming from other standard
sources and automatically displayed and analyzed.
The Distributed Annotation System (DAS) is a complete system for sharing annotations on biological
sequences. It comprises a standard XML-based file format, an accurate definition of the semantics of the data
(based on the use of ontologies of biological terms), and
an HTTP-based, REST-style protocol for sharing those
annotations [3-5]. Figure 1 is an overview of the DAS system. Many Annotation Servers can provide annotations
of sequence objects provided by a Reference Server, for
example Ensembl or UniProt. While initially designed to
annotate genomic sequences, DAS has support for both
genomic and protein annotations, for sequence alignment
data, and for structural information. Other federated systems exist for other types of biological data, such as
PSICQUIC [6] for molecular interaction data.
Figure 1 Overview of DAS. Data is stored in either databases or data files (A). DAS servers (B) offer a common interface to access that data, the
DAS protocol. Users (C) can use different client types (D), including specific DAS clients, internet browsers and non-visual scripts, to access the
data. Optionally, clients can access the DAS Registry (E) to retrieve a list of available DAS sources.
© 2011 Gel et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
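To make the protocol description more concrete, the following is a minimal Python sketch of how a non-visual script might build a DAS features request and read the XML that comes back. It is an illustration under stated assumptions, not part of easyDAS: the server address das.example.org and the source name "demo" are hypothetical, the sample response is hand-written in the general DASGFF layout rather than taken from a real server, and only a few of the elements a real response may carry are read.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def features_url(server, source, segment, start, stop):
    """Build a DAS 'features' request URL for one range of a segment."""
    base = server.rstrip("/")
    return f"{base}/das/{quote(source)}/features?segment={quote(str(segment))}:{start},{stop}"


# Hand-written sample in the DASGFF layout (assumption: simplified,
# not the output of any real DAS server).
SAMPLE_RESPONSE = """<DASGFF>
  <GFF href="http://das.example.org/das/demo/features?segment=1:1000,2000">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="Example gene">
        <TYPE id="gene">gene</TYPE>
        <METHOD id="manual">manual</METHOD>
        <START>1200</START>
        <END>1800</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Extract a small dictionary per FEATURE element in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


# Hypothetical server and source names, used only to show the URL shape.
url = features_url("http://das.example.org", "demo", "1", 1000, 2000)
parsed = parse_features(SAMPLE_RESPONSE)
```

In a real client the URL would be fetched over HTTP and the response body passed to the parser; the sketch skips the network step so that it runs without a live server.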
The DAS client-server architecture was designed around
the idea of having a small number of complex clients
integrating data coming from, potentially, many different simple sources. Some examples of DAS clients
are Ensembl [7], Dasty2 [8], GBrowse [9], Jalview [10],
SPICE [11], PeppeR [12] and DASher [13]. Sharing biological data via DAS allows data providers to leverage the
DAS ecosystem and makes it easy to integrate their data
with other existing sources.
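The division of labour described above, where simple sources each serve their own annotations and a more complex client combines them, can be sketched in a few lines of Python. This is an illustrative sketch only: the source names and annotation tuples are invented, and real clients such as those listed do considerably more (rendering, coordinate system checks, error handling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str  # name of the source that supplied the annotation
    start: int
    end: int
    label: str


def merge_sources(per_source, start, end):
    """Combine annotations from several sources that overlap [start, end]."""
    merged = [
        Annotation(source, s, e, label)
        for source, rows in per_source.items()
        for (s, e, label) in rows
        if s <= end and e >= start  # keep only annotations overlapping the range
    ]
    # Order by position so that all sources share one coordinate-ordered view.
    return sorted(merged, key=lambda a: (a.start, a.end, a.source))


# Invented example data standing in for the answers of two independent sources.
per_source = {
    "genes_source": [(1200, 1800, "gene A"), (5000, 6000, "gene B")],
    "variants_source": [(1500, 1500, "variant X")],
}
view = merge_sources(per_source, 1000, 2000)
```

Each source only needs to answer "what do you have in this range", while the merging logic lives entirely in the client.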
DAS server software is available in different programming languages, such as ProServer [14] in Perl and Dazzle [15] and MyDAS [16] in Java. However, although
DAS servers are meant to be simple, setting up a DAS server is not a trivial task. DAS servers allow great
flexibility in where the actual data is stored and how it
is structured. Usually the backend is a database, but files
and other options are also viable. The downside of this
flexibility is that data providers will very often need to
implement a custom-made data access layer that maps
their real data layout to the DAS concepts used in the
server, and this has to be done in either Perl or
Java. Many research groups do not have
easy access to people proficient enough in programming
to implement that access layer. In addition, setting up
and managing an internet accessible machine to host
the server can also be difficult, or too great an overhead, for
many data generators, especially those with small data
sets.
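The kind of data access layer mentioned above can be illustrated with a short sketch. The servers cited require such a layer in Perl or Java; Python is used here only for brevity, and nothing below reproduces the API of ProServer, Dazzle or MyDAS. The column names, the mapping and the class names are all hypothetical, chosen to show how a group's own file layout might be translated into the concepts a DAS server exposes (feature identifier, segment, start, end, type, label).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DasFeature:
    feature_id: str
    segment: str
    start: int
    end: int
    type_id: str
    label: str


# Hypothetical mapping from one lab's column names to DAS concepts.
COLUMN_MAP = {
    "feature_id": "probe_name",
    "segment": "chrom",
    "start": "from",
    "end": "to",
    "type_id": "kind",
    "label": "description",
}


class TabularAdapter:
    """Illustrative access layer: loads a tab-delimited file into DAS-style features."""

    def __init__(self, handle, column_map):
        reader = csv.DictReader(handle, delimiter="\t")
        self.features = [self._to_feature(row, column_map) for row in reader]

    @staticmethod
    def _to_feature(row, m):
        return DasFeature(
            feature_id=row[m["feature_id"]],
            segment=row[m["segment"]],
            start=int(row[m["start"]]),
            end=int(row[m["end"]]),
            type_id=row[m["type_id"]],
            label=row[m["label"]],
        )

    def features_in(self, segment, start, end):
        """Return the features on a segment that overlap [start, end]."""
        return [
            f for f in self.features
            if f.segment == segment and f.start <= end and f.end >= start
        ]


# Invented file content in the lab's own layout.
LOCAL_FILE = (
    "probe_name\tchrom\tfrom\tto\tkind\tdescription\n"
    "p1\t1\t1200\t1800\tprobe\tExample probe\n"
    "p2\t2\t500\t900\tprobe\tAnother probe\n"
)
adapter = TabularAdapter(io.StringIO(LOCAL_FILE), COLUMN_MAP)
hits = adapter.features_in("1", 1000, 2000)
```

Even in this toy form, the provider has to decide how each local column corresponds to a DAS concept and write code to enforce it, which is the step many data generators are not equipped to take.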
Thus, the challenge is to convert all those data generators into data providers, increasing the amount and variety of the biological data available to the scientific
community and contributing to the collective annotation
of biological sequences.
Results and Discus (...truncated)