In it for the long haul
© 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology
EDITORIAL
Six weeks ago, the rights to one of biology’s premier public databases
were quietly sold to an informatics startup. The database in question,
the Biomolecular Interaction Network Database (BIND), is arguably the
most comprehensive freely accessible protein-protein interaction database available to the research community. Yet through a combination of
bureaucratic delays, Canadian government fiscal nitpicking and a lack
of community consensus, this important resource now finds itself on life
support, its survival precariously linked to that of Unleashed Informatics,
a private venture founded last April with little more than $1.0 million in
seed funding from Sun Microsystems.
BIND is a database of molecular associations that collates high-throughput data submissions and hand-curated information from the
scientific literature. Although the database has existed since 1998, it
really started making headlines in 2003 when the Blueprint Initiative,
a project of Toronto-based Mount Sinai Hospital’s Samuel Lunenfeld
Research Institute, obtained $17.3 million in federal and Ontario government funding and another $7.8 million from the private sector to
“assemble man’s biomolecular knowledge on one open source database
for all researchers to access free of charge.”
By 2005, Blueprint’s staff of curators, software developers and administrators had grown to 68. The number of interactions lodged in the
database had risen to >180,000. And 77 scientific journals had signed
on to publish BIND accession numbers in their papers. All seemed to be
progressing well until Blueprint’s principal investigator, Chris Hogue,
started looking around for new funding.
It soon became apparent that Genome Canada was unwilling to stump
up the additional $20.8 million Hogue estimated was needed to maintain
the database over the next four years. While Genome Canada president
Martin Godbout cited problems with BIND’s “management, budget justification and financial plan,” Hogue countered that the real sticking point was
Genome Canada’s requirement that Blueprint secure “matched funding”
from another source. Unfortunately, the most likely provider of such
funding, the Ontario government, declined to back the project because
it was in the midst of restructuring how it doled out funds.
Forced to lay off half of his staff to keep the project alive, Hogue was
left scrambling to find alternative sources of funding. In June, an interim
solution appeared to have been found when Blueprint’s mirror node
in Singapore offered to take over database operations. By November,
however, this arrangement had also fallen through. Hogue announced
the termination of all BIND curation activities and the dismissal of all
remaining staff. Last month, Blueprint Asia closed its doors, leaving BIND
under the sole control of Unleashed Informatics, which has agreed to
maintain the existing data as an open access resource.
One might argue that BIND offers nothing more than a cautionary
tale about Canadian research funding. After all, researchers have several
other protein interaction databases available, including the European
Bioinformatics Institute’s (EBI) IntAct (http://www.embl-ebi.ac.uk/intact/index.jsp),
the University of California’s Database of Interacting
Proteins (DIP; http://dip.doe-mbi.ucla.edu/) and the Munich Institute
for Bioinformatics’ MPact (http://mips.gsf.de/genre/proj/mpact/index.html).
But that would be wrong. BIND’s predicament is not just about
Canadian politics and Canadian research. Only ~7% of BIND’s users
were based in Canada; the majority originated from the United States and
Europe. And BIND was not just another protein interaction database. It
was a unique resource, not only because of its comprehensiveness (as of
January 23, ~200,000 interactions compared with ~60,000 and ~56,000
in IntAct and DIP, respectively), but also because of the quality of its data
and its hyperlinks to the scientific literature.
If the failure of BIND highlights anything, it is the endemic and long-standing problem of providing sustainable long-term financial support
for databases. Because when it comes to financial insecurity, no database project, big or small, is immune. According to a survey by Nature
(435, 1010–1011, 2005), of 89 databases listed in the Molecular Biology
Database collection in 2000, last year nearly two-thirds (51) were struggling financially and seven had already closed. Last May, the decision of
the US National Institute of General Medical Sciences (NIGMS) to halve
the $5 million originally requested by the Alliance for Cellular Signaling
(AfCS; http://www.signaling-gateway.org/) necessitated the closure of the
AfCS’s curation office at Duke University and required Nature Publishing
Group to assume full editorial control of AfCS’s Molecular Pages. Even
the EBI has had to shuffle its own financial reserves to keep databases like
InterPro afloat; and confirmation of renewed funding for major databases, such as ArrayExpress, often has to wait until the eleventh hour.
A first step to addressing these problems would be to bring together
representatives of the major funding agencies (e.g., the European Union
and Wellcome Trust in Europe, and the National Science Foundation
and NIGMS in the United States) to engage in a high-level discussion on
long-term funding goals for international databases. A report on long-lived databases published in September by the US National Science Board
provides a summary of the most important issues
(http://www.nsf.gov/pubs/2005/nsb0540/start.jsp).
Second, a mechanism needs to be outlined that would allow funding
agencies to recognize databases that both reflect community consensus
standards (e.g., GEO and ArrayExpress are compliant with the Minimum
Information About a Microarray Experiment standard; http://www.mged.org)
and have matured into indispensable community resources.
These are the databases that should be prioritized for longer-term funding (subject of course to regular reappraisal and peer review).
There’s no doubt that BIND, with its links to the literature and its
high-quality molecular interaction data set, was a valuable resource. But
the protein interaction field is still developing its consensus standards
(e.g., see the Proteomics Standards Initiative; http://psidev.sourceforge.net/)
and is to some extent still exploring its boundaries. Perhaps it was
too early to unite community efforts into one molecular interaction
database. Or perhaps BIND was ahead of its time. Whatever the case,
funders must now formulate a strategy to ensure that other databases,
especially those bankrolled with millions of dollars of public money,
avoid a similar fate.
NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 2 FEBRUARY 2006
(...truncated)