Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web

PLoS Computational Biology, Oct 2008

Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

Citation: Hull D, Pettifer SR, Kell DB (2008) Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Comput Biol 4(10): e1000204. doi:10.1371/journal.pcbi.1000204

Authors: Duncan Hull, Steve R. Pettifer, Douglas B. Kell

Affiliations: 1 School of Chemistry, The University of Manchester, Manchester, United Kingdom; 2 The Manchester Interdisciplinary Biocentre, The University of Manchester, Manchester, United Kingdom; 3 School of Computer Science, The University of Manchester, Manchester, United Kingdom

Editor: Johanna McEntyre, National Center for Biotechnology Information (NCBI), United States of America

The term digital library [2–4] denotes a collection of literature and its attendant metadata (data about data) stored electronically. According to Herbert Samuel, a library is “thought in cold storage” [5], and unfortunately digital libraries can be cold, isolated, impersonal places that are inaccessible to both machines and people. Many scientists now organize their knowledge of the literature using some kind of computerized reference management system (BibTeX, EndNote, Reference Manager, RefWorks, etc.), and store their own digital libraries of full publications as PDF files. However, getting hold of both the data (the actual publication) and the metadata for any given publication can be problematic because they are often frozen in the isolated and icy deposits of scientific publishing. Because each library and publisher has different ways of identifying and describing their metadata, using digital libraries (either manually or automatically) is much more complicated than it needs to be [6], and with papers in the life sciences alone (at Medline) being published at the rate of approximately two per minute [7], only computerized analyses can hope to be reasonably comprehensive.

What, then, are these digital libraries, and what services do they provide? As far as computational biologists are concerned, and for the purposes of this Review, we shall define a digital library more broadly as a database of scientific and technical articles, conference publications, and books that can be searched and browsed using a Web browser. As of early 2008, there is a wide range of these digital libraries, but no single source covering all information (in part because of the cost, given that there are some 25,000 peer-reviewed journals publishing some 2.5 million articles per year [8]). Each library is isolated, balkanized, and has only partial coverage of the entire literature.
This contrasts with the historically pre-eminent library of Alexandria, whose great strength was that it brought together all the useful literature then available to a single location. Like Alexandria, most digital libraries are currently read-only, allowing users to search and browse information, but not to write new information nor add personal knowledge. Other digital libraries are in danger of becoming write-only datatombs [9], where data are deposited but will probably never be accessed again. Indeed, the literature itself is now so vast that most scientists choose to access only a fraction of it [10], at potentially considerable intellectual loss [11] (see also [12]).

Digital libraries provide electronic access to documents, sometimes just to their abstracts and sometimes to the full text of the publication. Presently, the number of abstracts considerably exceeds the number of full-text papers, but with the emergence of Open Access initiatives (e.g., [13–16]), Institutional Repositories (e.g., [17–20]), and the like, this is set to change considerably. This is very important, as much additional information exists in full papers that i (...truncated)
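
To make the identifier problem discussed above a little more concrete, the following minimal Python sketch recovers the DOI from the publisher's PDF address for this article, where the DOI is wrapped in a percent-encoded `info:doi/` URI inside a query parameter. The function names `doi_from_info_uri` and `resolver_url` are hypothetical helpers introduced here for illustration; the assumption that the DOI always sits in a `uri` query parameter holds for this one address and is unlikely to generalise to other publishers, which is rather the point.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

INFO_DOI_PREFIX = "info:doi/"


def doi_from_info_uri(url: str) -> Optional[str]:
    """Return the DOI carried in the URL's `uri` query parameter, or None.

    Assumption (illustrative only): the DOI is wrapped as an `info:doi/...`
    URI in a query parameter named `uri`, as in the address used below.
    """
    query = parse_qs(urlsplit(url).query)  # parse_qs percent-decodes the values
    for value in query.get("uri", []):
        if value.startswith(INFO_DOI_PREFIX):
            return value[len(INFO_DOI_PREFIX):]
    return None


def resolver_url(doi: str) -> str:
    """Build a publisher-independent link via the public DOI resolver."""
    return "https://doi.org/" + doi


pdf_url = (
    "http://www.ploscompbiol.org/article/fetchObject.action"
    "?uri=info%3Adoi%2F10.1371/journal.pcbi.1000204&representation=PDF"
)

doi = doi_from_info_uri(pdf_url)
print(doi)                # 10.1371/journal.pcbi.1000204
print(resolver_url(doi))  # https://doi.org/10.1371/journal.pcbi.1000204
```

Even in this simple case, a tool has to know one publisher's URL layout before it can find an identifier that is meant to be universal; a reference manager that talks to many libraries would need a comparable rule for each.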


This is a preview of a remote PDF: http://www.ploscompbiol.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371/journal.pcbi.1000204&representation=PDF

Duncan Hull, Steve R. Pettifer, Douglas B. Kell. Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web, PLoS Computational Biology, 2008, 10, DOI: 10.1371/journal.pcbi.1000204