Building biomedical web communities using a semantically aware content management system
Sudeshna Das
Lisa Girard
Tom Green
Louis Weitzman
Alister Lewis-Bowen
Tim Clark
Web-based biomedical communities are becoming an increasingly popular vehicle for sharing information amongst researchers and are fast gaining an online presence. However, information organization and exchange in such communities is usually unstructured, rendering interoperability between communities difficult. Furthermore, specialized software to create such communities at low cost, targeted at the specific common information requirements of biomedical researchers, has been largely lacking. At the same time, a growing number of biological knowledge bases and biomedical resources are being structured for the Semantic Web. Several groups are creating reference ontologies for the biomedical domain, actively publishing controlled vocabularies and making data available in Resource Description Framework (RDF) language. We have developed the Science Collaboration Framework (SCF) as a reusable platform for advanced structured online collaboration in biomedical research that leverages these ontologies and RDF resources. SCF supports structured 'Web 2.0' style community discourse amongst researchers, makes heterogeneous data resources available to the collaborating scientist, captures the semantics of the relationship among the resources and structures discourse around the resources. The first instance of the SCF framework is being used to create an open-access online community for stem cell research, StemBook (http://www.stembook.org). We believe that such a framework is required to achieve optimal productivity and leveraging of resources in
INTRODUCTION
Online scientific communities (groups of scientists
or collaboratories connected through the Internet)
have become an important means by which
to exchange data and information. The most
common form of an online community is an
intraorganization web site. In this format, a department or
a lab, for example, shares data and knowledge in a
web-based forum.
Barriers to developing a successful community
beyond organization or consortia boundaries have
been discussed in Bos et al. [1]. The obstacles
these authors discuss include scientists'
preference for working independently and
intellectual property competition between
institutions. Despite these barriers, the practice of scientists
discussing nascent work on the web is an emerging
trend, sometimes labeled Science 2.0 [2]. Successful
scientific communities in which interdisciplinary
researchers network and engage in scientific
discussions for a common driving cause have been
developed and fill a critical resource gap.
One notable example of such a web-based
scientific community is Alzforum (www.alzforum.org),
a thriving community of over 4600
researchers networking to find a cure for Alzheimer's [3, 4].
In Alzforum, researchers can discuss papers and news
spontaneously and participate in live discussions.
Researchers are also invited to provide perspectives
on key research news and comment on papers of the
week. Compendia of genes, antibodies, animal
models and protocols are also available on the site.
Currently, the site contains more than 60 000
literature citations, 1900 research news articles,
6000 comments, 20 000 antibodies, 250 research
models, 500 genes from published association
studies of late-onset AD, all known mutations
causing familial Alzheimer disease, all drugs in
Phase 2 and 3 clinical trials and a wealth of
community resources, such as databases for grants,
conferences and jobs [4]. Another emerging
community based on the Alzforum model is the
Schizophrenia Forum (www.schizophreniaforum.org),
a community of researchers exchanging ideas
to develop a better understanding of schizophrenia and
improve treatment options.
Communities such as Alzforum and
Schizophrenia Forum require both social and
technological infrastructure for nurturing their
growth [3, 4]. However, both of these sites
evolved over time for their specific communities, and
until recently there has been no common reusable
toolkit to create a new site similar in structure. Data
and information in these sites and other similar ones
are organized and structured in different ways, and
there has heretofore been only limited opportunity to
share and exchange information amongst these sites.
Moreover, the use of the Semantic Web [5, 6] to
exchange information among these scientific
communities in a machine-readable format remains a
challenge [7].
At the same time, a large number of biological
resources are now becoming available as W3C
Resource Description Framework (RDF) triples
(http://www.w3.org/TR/rdf-primer/). Gene
Ontology (GO), CHEBI and SNOMED [8-10]
are examples of the most widely used ontologies in
the biomedical domain. The ambitious BioMoby
project, which publishes more than 1400 data sources
and analysis tools using a semantic framework, has
also released its first version [11]. The W3C Health
Care and Life Sciences Interest Group [12] and other
efforts, such as Open Biomedical Ontologies [13], are
actively defining common controlled vocabularies
and making data available as RDF. One of the goals
of the Science Collaboration Framework (SCF) is to
annotate the discourse, publications and news
published within scientific communities with terms
and identifiers from these and other semantically
characterized biological information resources, and
to make the knowledge and linked data available on
the Semantic Web.
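
As a rough illustration of what such an annotation could look like as RDF triples, the sketch below serializes two statements about a community article in N-Triples syntax. It is not taken from SCF itself: the article URI under example.org and the helper `ntriple` are hypothetical, and the choice of the Dublin Core `subject` and `title` properties is an assumption made for this example. The GO identifier is the root term `biological_process`. The escaping handles only backslashes and double quotes, so it is not a complete N-Triples serializer.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Serialize one RDF statement as a single N-Triples line.

    Hypothetical helper for illustration; escapes only backslash and
    double quote in literals, which is less than the full syntax requires.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical article URI; a real community site would mint its own.
ARTICLE = "http://example.org/article/1"
# Dublin Core terms, assumed here as the annotation vocabulary.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
# Gene Ontology root term "biological_process".
GO_TERM = "http://purl.obolibrary.org/obo/GO_0008150"

triples = [
    ntriple(ARTICLE, DCTERMS_TITLE, "An example community article", literal=True),
    ntriple(ARTICLE, DCTERMS_SUBJECT, GO_TERM),
]

print("\n".join(triples))
```

Because each statement names its subject, predicate and object with a URI, a second community site that uses the same ontology identifier can link its own articles to the same term without any site-specific agreement on structure.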
There are other efforts to develop collaborative
annotation and knowledge management systems
using Semantic wikis [14, 15]. Wikis are being
increasingly adopted by the biomedical community
for collective annotation. Gene Wiki for collective
annotation of gene function [16] is a recently
published wiki example. Some of these resources
are also available as RDF: WikiProteins [17] and
BOWiki (http://bowiki.net/). However, wiki is a
technology useful for focused annotation efforts and
does not easily support community-networking tools
such as blogs and forums. Wikis readily enable
multiple editing of content and checking the
differences between versions. Wiki is a useful
technology and is the primary choice when the purpose is to
generate a consensus view that is flexible enough
to accommodate input from various people.
WikiProteins is a great example of that purpose. The
entry for human amyloid-β A4 protein precursor
(APP), http://www.wikiproteins.org/index.php/Concept:13341741,
lists the various functional roles
of APP and the types of Alzheimer's disease caused
by APP defects. However, the provenance of the
claims is lost, and the multiple viewpoints,
disagreements and divergences regarding the role of APP in
Alzheimer's are not captured in the entry as they are in
Alzforum
(http://www.alzforum.org/res/for/journal/transcript.asp?LiveID=120).
The generalist view
of APP presented in WikiProteins is not useful to a
scientist specializing in Alzheimer's research. In
summary, wiki is most useful for leveraging the
Long Tail and (...truncated)