Reefgenomics.Org - a repository for marine genomics data
Database, 2016, 1–4
doi: 10.1093/database/baw152
Original article
Yi Jin Liew, Manuel Aranda*, and Christian R. Voolstra*
Division of Biological and Environmental Science and Engineering (BESE), Red Sea Research Center,
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
*Corresponding author: Email: , Phone: +966 12 8082377, Fax: +966 12 8082377
Correspondence may also be addressed to Manuel Aranda: Email: , Phone: +966 12
8082377, Fax: +966 12 8082979
Citation details: Liew,Y.J., Aranda,M., Voolstra,C.R. Reefgenomics.Org - a repository for marine genomics data. Database
(2016) Vol. 2016: article ID baw152; doi:10.1093/database/baw152
Received 28 July 2016; Revised 15 October 2016; Accepted 31 October 2016
Abstract
Over the last decade, technological advancements have substantially decreased the cost
and time of obtaining large amounts of sequencing data. Paired with exponentially
increasing computing power, this means that individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are
stored in public sequence databases, very often, only raw sequencing data are available;
miscellaneous data such as assembled transcriptomes, genome annotations etc. are not
easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where
applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets
can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it
convenient for end-users to search and explore processed sequence data.
Database URL: http://reefgenomics.org
Introduction
Driven primarily by continuous reduction in sequencing
costs and increasing availability of computing resources
over the last decade, the genomes of several marine organisms, e.g. Amphimedon queenslandica (1), Acropora digitifera (2), Aiptasia pallida (3) and Hydra vulgaris (4), and
© The Author(s) 2016. Published by Oxford University Press.
the transcriptomes of many others (5–8) have now been
sequenced. However, a disconnect exists between what is
submitted in the form of primary sequence data and what
is available in the form of assembled and annotated data.
While the majority of studies provide primary sequence
data to public repositories, e.g. NCBI (National Center for
Biotechnology Information), EBI (European Bioinformatics
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Results
Main landing page
The main page (http://reefgenomics.org) has a text box, similar in design to that of a popular search engine, that allows
new and experienced users alike to quickly jump to their
organism(s) of interest. For new users who are unfamiliar
with the datasets hosted on the website, we ease data discovery by featuring a collection of datasets, and we also
provide a link to a complete list of all datasets on our website. Also on the main page is a prominent ‘Contribute’
button, which provides a point of contact for groups to
contribute their data to our data portal (Figure 1A).
Project-specific data sharing
From the main page, users are able to visit subdomains
containing data produced by a project. At the time of writing, the hosted data range from individual genome projects, e.g. the genome of Aiptasia (3), to multi-institute
collaborative efforts, e.g. a comparative study of 20 coral
transcriptomes and genomes (9). Whenever possible, we
opted to use short, memorable subdomain names instead
of nested subdirectories. For instance, the Aiptasia genome
project is located at http://aiptasia.reefgenomics.org, while
the comparative study is at http://comparative.reefgenomics.org. Repeat users can quickly navigate to their
organisms/datasets of interest by typing the memorable
URLs in their browsers.
Figure 1. Layout of http://reefgenomics.org. (A) Landing page shows
the search bar and tiles corresponding to featured projects and a tile to
explore all datasets. (B) Project page shows the typical layout of a project-specific page. Hovering the cursor over tiles darkens them, as
shown in the ‘Browse’ tile.
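The subdomain convention described above can be illustrated with a minimal Python sketch. It is not part of the website's code: the function name `project_url` and the validation rule (a single DNS label) are assumptions made for illustration, and only the two project names cited in the text (`aiptasia`, `comparative`) are known to exist.

```python
import re

# Base domain taken from the article; everything else here is illustrative.
BASE_DOMAIN = "reefgenomics.org"

# A single DNS label: 1-63 characters, letters/digits/hyphens,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def project_url(project: str) -> str:
    """Build the URL of a project subdomain from a short project name.

    Hypothetical helper: it only formats the URL and does not check
    that the project is actually hosted.
    """
    label = project.strip().lower()
    if not _LABEL.fullmatch(label):
        raise ValueError(f"not a valid subdomain label: {project!r}")
    return f"http://{label}.{BASE_DOMAIN}"


if __name__ == "__main__":
    for name in ("aiptasia", "comparative"):
        print(project_url(name))
```

Because each project maps to one short label rather than a nested path, the URL can be reconstructed from the project name alone.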
Subdomains typically contain three buttons: ‘BLAST’,
‘Download’, and ‘Browse’ (Figure 1B). Some subdomains
have fewer buttons depending on the hosted contents: for
instance, transcriptome data cannot be viewed on a genome browser. The first button links users to a custom
BLAST server based on SequenceServer (11), which produces aesthetically pleasing BLAST results that take advantage of modern web standards. This BLAST server also
has the advantage of being able to search multiple databases
simultaneously with a single query, which simplifies tasks
such as retrieving homologs of a particular gene of interest
across many organisms. The ‘Download’ page is a typical
Institute), and DDBJ (DNA Data Bank of Japan), many
studies elect not to upload assembled and annotated genomes or transcriptomes (all mRNAs expressed from the
genes of an organism) to public sequence databases. Also,
transcriptomic data tend to be more disparate, as illustrated
by a recent 20-coral comparative metastudy (9) that used
published primary data to infer the evolutionary success of
reef-building corals. Although the sources of all data are
cited, a web database to peruse, search via BLAST (Basic
Local Alignment Search Tool) (10), and download relevant sequence data was also provided for the convenience
of the readers (http://comparative.reefgenomics.org).
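For readers who prefer to search downloaded sequence files offline, a single query can be run against several databases at once in a similar spirit. The Python sketch below is not the website's implementation. It assumes a local NCBI BLAST+ installation and databases already built with `makeblastdb`; the function name and all file and database names are hypothetical. It only assembles the command line, so it runs without BLAST+ installed.

```python
import shlex
from typing import List, Sequence


def build_blast_command(
    query_fasta: str,
    databases: Sequence[str],
    program: str = "blastn",
    evalue: float = 1e-5,
) -> List[str]:
    """Return an argv list that searches several BLAST databases in one run.

    Assumption: database paths contain no spaces, because BLAST+ reads
    the -db value as a space-separated list of database names.
    """
    if not databases:
        raise ValueError("at least one database is required")
    return [
        program,
        "-query", query_fasta,
        "-db", " ".join(databases),  # several databases, one argument
        "-evalue", str(evalue),
        "-outfmt", "6",              # tabular output, one hit per line
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_blast_command(
        "gene_of_interest.fasta",
        ["db/aiptasia_genome", "db/coral_transcriptome"],
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and the databases are available.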
To facilitate dissemination of similar data, we designed
and host a website with simplicity in mind. We aim to provide an online platform for sharing assembled sequence
data, and at the same time, facilitate access and retrieval of
sequence files, simplify searches for related sequences
among hosted data, and enable the visual exploration of
genomic features. We intend that the ease of access will facilitate further analyses and pave the way for other comparative studies using these and additional data, fostering
collaborations and discovery within the marine biology
community.
HTML page that houses links to retrieve datasets, with an
MD5 hash for users to veri (...truncated)