Sharing Detailed Research Data Is Associated with Increased Citation Rate

PLOS ONE, Mar 2007

Background. Sharing research data benefits the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available.

Principal Findings. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. In a linear regression, publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.

Significance. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
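As a quick arithmetic check, the two headline percentages can be recomputed from the counts reported in the Results: 41 of the 85 trials shared their data, and those trials received 5334 of the 6239 aggregate 2004–2005 citations. The short Python sketch below only reproduces that division; the constant names are illustrative and not taken from the authors' analysis.

```python
# Counts reported in the Results section of this article.
TOTAL_TRIALS = 85
SHARING_TRIALS = 41          # trials with publicly available microarray data
TOTAL_CITATIONS = 6239       # aggregate 2004-2005 citations to the cohort
SHARING_CITATIONS = 5334     # citations received by the data-sharing trials


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


trial_pct = percent(SHARING_TRIALS, TOTAL_TRIALS)
citation_pct = percent(SHARING_CITATIONS, TOTAL_CITATIONS)

print(f"{trial_pct:.0f}% of trials shared data")                   # 48%
print(f"{citation_pct:.0f}% of citations went to those trials")    # 85%
```

Note that 5334 / 6239 is about 85.49%, so the reported 85% is a rounded figure.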

Citation: Piwowar HA, Day RS, Fridsma DB (…) Sharing Detailed Research Data Is Associated with Increased Citation Rate.

Heather A. Piwowar, Roger S. Day, Douglas B. Fridsma
Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
Academic Editor: John Ioannidis, University of Ioannina School of Medicine, Greece

INTRODUCTION

Sharing information facilitates science. Publicly sharing detailed research data (sample attributes, clinical factors, patient outcomes, DNA sequences, raw mRNA microarray measurements) with other researchers allows these valuable resources to contribute far beyond their original analysis[1]. In addition to being used to confirm original results, raw data can be used to explore related or new hypotheses, particularly when combined with other publicly available data sets. Real data is indispensable when investigating and developing study methods, analysis techniques, and software implementations.
The larger scientific community also benefits: sharing data encourages multiple perspectives, helps to identify errors, discourages fraud, is useful for training new researchers, and increases efficient use of funding and patient population resources by avoiding duplicate data collection. Believing that these benefits outweigh the costs of sharing research data, many initiatives actively encourage investigators to make their data available. Some journals, including the PLoS family, require the submission of detailed biomedical data to publicly available databases as a condition of publication[2–4]. Since 2003, the NIH has required a data sharing plan for all large funding grants. The growing open-access publishing movement may further increase peer pressure to share data.

However, while the general research community benefits from shared data, much of the burden of sharing falls on the study investigator. Are there benefits for the investigators themselves?

A currency of value to many investigators is the number of times their publications are cited. Although limited as a proxy for the scientific contribution of a paper[5], citation counts are often used in research funding and promotion decisions and have even been assigned a salary-increase dollar value[6]. Boosting citation rate is thus a potentially important motivator for publication authors.

In this study, we explored the relationship between the citation rate of a publication and whether its data was made publicly available. Using cancer microarray clinical trials, we addressed the following questions: Do trials that share their microarray data receive more citations? Is this true even within lower profile trials? What other data-sharing variables are associated with an increased citation rate? While this study cannot establish causation, quantifying associations is a valuable first step in understanding these relationships.
Clinical microarray data provides a useful setting for this investigation: although it is valuable for reuse and extremely costly to collect, it is not yet universally shared.

RESULTS

We studied the citations of 85 cancer microarray clinical trials published between January 1999 and April 2003, as identified in a systematic review by Ntzani and Ioannidis[7] and listed in Supplementary Text S1. We found that 41 of the 85 clinical trials (48%) made their microarray data publicly available on the internet. Most data sets were located on lab websites (28), with a few found on publisher websites (4) or within public databases (6 in the Stanford Microarray Database (SMD)[8], 6 in Gene Expression Omnibus (GEO)[9], 2 in ArrayExpress[10], and 2 in the NCI Gene Expression Data Portal (GEDP) (gedp.nci.nih.gov)); some datasets were found in more than one location. The internet locations of the datasets are listed in Supplementary Text S2. The majority of datasets were made available concurrently with the trial publication, as shown by the Wayback Machine internet archive (www.archive.org/web/web.php) for 25 of the datasets and by mention of supplementary data within the trial publication itself for 10 of the remaining 16 datasets. As seen in Table 1, trials published in high impact journals, prior to 2001, or with US authors were more likely to share their data.

The cohort of 85 trials was cited an aggregate of 6239 times in 2004–2005 by 3133 distinct articles (median of 1.0 cohort citation per article, range 1–23). The 48% of trials that shared their data received a total of 5334 citations (85% of aggregate), distributed as shown in Figure 1.

Funding: HAP was supported by NLM Training Grant Number 5T15-LM007059-19. The NIH had no role in study design, data collection or analysis, writing the paper, or the decision to submit it for publication. The publication contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

Table 1.
Characteristics of Eligible Trials by Data Sharing.

Characteristic              Total articles  Data shared   Data not shared  Odds ratio (95% CI)
TOTAL                       85              41 (48%)      44 (52%)
High Impact Journal (≥25)   12              12 (100%)     0 (0%)           … (3.8 to …)
Low Impact                  73              29 (40%)      44 (60%)
Published 1999–2000         6               5 (83%)       1 (17%)          6.0 (0.6 to 288.5)
Published 2001–2003         79              36 (46%)      43 (54%)
Includes US Author          56              35 (63%)      21 (38%)         6.4 (2.0 to 21.9)
No US Authors               29              6 (21%)       23 (79%)

doi:10.1371/journal.pone.0000308.t001

Whether a trial's dataset was made publicly available was significantly associated with the log of its 2004–2005 citation rate (69% increase in citation count; 95% confidence interval: 18 to 143%, p = 0.006), independent of journal impact factor, date of publication, and US authorship. Detailed results of this multivariate linear regression are given in Table 2. A similar result was found when we regressed on the number of ci (...truncated)
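Two numbers in this passage follow from simple formulas, sketched below in Python under stated assumptions. First, the odds ratio for US authorship in Table 1 is the sample odds ratio of a 2x2 table; the "No US Authors" and TOTAL counts are printed directly, and the "Includes US Author" counts follow by subtraction. Second, because the regression outcome was the log of the citation count, a coefficient beta corresponds to a multiplicative change of exp(beta) in citations; this sketch assumes a natural log, and the helper names (`odds_ratio`, `pct_change`, `log_coef`) are hypothetical, not taken from the authors' code.

```python
import math

# --- Odds ratio for US authorship (Table 1) ---
# TOTAL and "No US Authors" counts are as printed in Table 1;
# the "Includes US Author" counts follow by subtraction.
SHARED_TOTAL, NOT_SHARED_TOTAL = 41, 44
SHARED_NO_US, NOT_SHARED_NO_US = 6, 23
SHARED_US = SHARED_TOTAL - SHARED_NO_US              # 35
NOT_SHARED_US = NOT_SHARED_TOTAL - NOT_SHARED_NO_US  # 21


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]].

    Rows: exposed / unexposed (US author / no US author).
    Columns: data shared / data not shared.
    """
    return (a * d) / (b * c)


or_us = odds_ratio(SHARED_US, NOT_SHARED_US, SHARED_NO_US, NOT_SHARED_NO_US)


# --- Percent change <-> coefficient on a log-transformed outcome ---
# Assumption: the outcome is the natural log of the citation count,
# so a coefficient beta multiplies expected citations by exp(beta).
def pct_change(beta: float) -> float:
    """Percent change in citations implied by coefficient beta."""
    return 100.0 * math.expm1(beta)


def log_coef(pct: float) -> float:
    """Coefficient beta implied by a reported percent change."""
    return math.log1p(pct / 100.0)


beta_hat = log_coef(69)    # point estimate, about 0.525
beta_lo = log_coef(18)     # lower 95% bound, about 0.166
beta_hi = log_coef(143)    # upper 95% bound, about 0.888

print(f"US-authorship odds ratio: {or_us:.1f}")  # 6.4
print(f"beta = {beta_hat:.3f} (95% CI {beta_lo:.3f} to {beta_hi:.3f})")
# A regression CI is symmetric on the log scale, which is why it
# looks lopsided (18% to 143%) once converted to percent change.
print(f"log-scale midpoint of the CI: {(beta_lo + beta_hi) / 2:.3f}")
```

The point estimate reproduces the 6.4 in Table 1. The published interval (2.0 to 21.9) is not recomputed here because this excerpt does not state which interval method was used. The log-scale midpoint (about 0.527) sits close to the point estimate (about 0.525), which is consistent with a symmetric interval that was rounded when reported as percentages.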


PDF: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0000308&type=printable
Article home page: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000308

Heather A. Piwowar, Roger S. Day, Douglas B. Fridsma. Sharing Detailed Research Data Is Associated with Increased Citation Rate, PLOS ONE, 2007, 3, DOI: 10.1371/journal.pone.0000308