Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web
Kell DB (2008) Defrosting the Digital Library:
Bibliographic Tools for the Next Generation Web. PLoS Comput Biol 4(10):
e1000204. doi:10.1371/journal.pcbi.1000204
Duncan Hull 0
Steve R. Pettifer 0
Douglas B. Kell 0
Johanna McEntyre, National Center for Biotechnology Information (NCBI),
United States of America
0 1 School of Chemistry, The University of Manchester, Manchester, United Kingdom, 2 The Manchester Interdisciplinary Biocentre, The University of Manchester, Manchester, United Kingdom, 3 School of Computer Science, The University of Manchester, Manchester, United Kingdom
Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as "thought in cold storage," and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.
The term digital library [2–4] denotes a collection of literature
and its attendant metadata (data about data) stored electronically.
According to Herbert Samuel, a library is "thought in cold
storage" [5], and unfortunately digital libraries can be cold,
isolated, impersonal places that are inaccessible to both machines
and people. Many scientists now organize their knowledge of the
literature using some kind of computerized reference management
system (BibTeX, EndNote, Reference Manager, RefWorks, etc.),
and store their own digital libraries of full publications as PDF files.
However, getting hold of both the data (the actual publication) and
the metadata for any given publication can be problematic
because they are often frozen in the isolated and icy deposits of
scientific publishing. Because each library and publisher has
different ways of identifying and describing their metadata, using
digital libraries (either manually or automatically) is much more
complicated than it needs to be [6], and with papers in the life
sciences alone (at Medline) being published at the rate of
approximately two per minute [7], only computerized analyses
can hope to be reasonably comprehensive. What, then, are these
digital libraries, and what services do they provide?
As far as computational biologists are concerned, and for the
purposes of this Review, we shall define a digital library more
broadly as a database of scientific and technical articles,
conference publications, and books that can be searched and
browsed using a Web browser. As of early 2008, there is a wide
range of these digital libraries, but no single source covering all
information (in part because of the cost, given that there are some
25,000 peer-reviewed journals publishing some 2.5 million articles
per year [8]). Each library is isolated, balkanized, and has only
partial coverage of the entire literature. This contrasts with the
historically pre-eminent Library of Alexandria, whose great strength
was that it brought together all the useful literature then available
to a single location. Like Alexandria, most digital libraries are
currently read-only, allowing users to search and browse
information, but not to write new information or add personal knowledge.
Other digital libraries are in danger of becoming write-only
datatombs [9], where data are deposited but will probably never be
accessed again. Indeed, the literature itself is now so vast that most
scientists choose to access only a fraction of it [10], at potentially
considerable intellectual loss [11] (see also [12]).
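To put the publication rates quoted above on a common scale, the following back-of-the-envelope sketch converts between articles per year and articles per minute. It uses only the figures given in the text (roughly two per minute at Medline, and some 2.5 million articles per year across some 25,000 journals); the function and variable names are illustrative, and the 365-day year is a simplifying assumption.

```python
# Back-of-the-envelope conversion of the publication rates quoted in the text.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def per_minute(articles_per_year: float) -> float:
    """Convert an annual article count into an average per-minute rate."""
    return articles_per_year / MINUTES_PER_YEAR


def per_year(articles_per_minute: float) -> float:
    """Convert an average per-minute rate into an annual article count."""
    return articles_per_minute * MINUTES_PER_YEAR


# Figures taken from the text; all are approximate.
ALL_JOURNALS_ARTICLES_PER_YEAR = 2_500_000
PEER_REVIEWED_JOURNALS = 25_000
MEDLINE_ARTICLES_PER_MINUTE = 2

all_journals_per_minute = per_minute(ALL_JOURNALS_ARTICLES_PER_YEAR)
medline_per_year = per_year(MEDLINE_ARTICLES_PER_MINUTE)
articles_per_journal = ALL_JOURNALS_ARTICLES_PER_YEAR / PEER_REVIEWED_JOURNALS

print(f"All journals: about {all_journals_per_minute:.1f} articles per minute")
print(f"Medline: about {medline_per_year:,.0f} articles per year")
print(f"Average output: about {articles_per_journal:.0f} articles per journal per year")
```

Under these assumptions, the life-sciences rate corresponds to roughly a million articles per year, and the whole peer-reviewed literature to somewhat under five articles per minute.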
Digital libraries provide electronic access to documents,
sometimes just to their abstracts and sometimes to the full text
of the publication. Presently, the number of abstracts considerably
exceeds the number of full-text papers, but with the emergence of
Open Access initiatives (e.g., [13–16]), Institutional Repositories
(e.g., [17–20]), and the like, this is set to change considerably. This
is very important, as much additional information exists in full
papers that i (...truncated)