A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

Jul 2010

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmark of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
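One way to see how a method can reach a good AUC yet a poor F-score: AUC only measures how well a classifier's decision values rank true interactions above non-interactions across all possible thresholds, whereas the F-score is computed at one fixed decision threshold. The sketch below uses made-up decision values for eight hypothetical candidate protein pairs (not data from the study). The ranking is perfect, but at a default threshold of 0 (typical for SVM decision values) two of the three true interactions are missed. The helper names `f_score` and `auc` are our own.

```python
from typing import List


def f_score(labels: List[int], scores: List[float], threshold: float = 0.0) -> float:
    """F1 of the predictions `score > threshold` against gold 0/1 labels."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random positive pair is scored above
    a random negative pair (ties count one half); threshold-independent."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical decision values for eight candidate protein pairs (1 = interacting).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.4, -0.2, -0.3, -0.5, -0.6, -0.8, -0.9, -1.2]

print(auc(labels, scores))                      # 1.0: every positive outranks every negative
print(f_score(labels, scores))                  # 0.5: precision 1.0, recall 1/3 at threshold 0
print(f_score(labels, scores, threshold=-0.4))  # 1.0 after moving the threshold
```

On these toy values, shifting the threshold alone moves the F-score from 0.5 to 1.0 while AUC stays unchanged. This is one plausible mechanism, not necessarily the only one, behind F-scores that depend heavily on corpus-specific parameter tuning.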


Citation: Tikk D, Thomas P, Palaga P, Hakenberg J, Leser U (2010) A Comprehensive Benchmark of Kernel Methods to Extract Protein-Protein Interactions from Literature. PLoS Comput Biol 6(7): e1000837. doi:10.1371/journal.pcbi.1000837

Authors: Domonkos Tikk, Philippe Thomas, Peter Palaga, Jörg Hakenberg, Ulf Leser
Affiliations: Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Berlin, Germany; Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary; Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, United States of America
Editor: Andrey Rzhetsky, University of Chicago, United States of America

Funding: DT is supported by the Alexander von Humboldt Foundation (http://www.humboldt-foundation.de/web/home.html). PT is supported by the Federal Ministry of Education and Research, Germany (BMBF, http://www.bmbf.de/en/1398.php), grant no. 0315417B. JH acknowledges support from Arizona State University (http://www.asu.edu/) and Science Foundation Arizona (http://www.sfaz.org/). PP was supported by the Max-Planck-Gesellschaft (http://www.mpg.de/english/portal/index.html) under project TM-REG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Protein-protein interactions (PPIs) are integral to virtually all cellular processes, such as metabolism, signaling, regulation, and proliferation. Collecting data on individual interactions is crucial for understanding these processes at a systems biology level [1].
Known PPIs help to predict the function of as-yet uncharacterized proteins, for instance using conserved PPI networks [2] or proximity in a PPI network [3]. Networks can be generated from molecular interaction data and are useful for multiple purposes, such as the identification of functional modules [4] or finding novel associations between genes and diseases [5]. Several approaches are in use to study interactions in large- or small-scale experiments. Among the techniques most often used are two-hybrid screens, mass spectrometry, and tandem affinity purification [6]. Results of high-throughput techniques (such as two-hybrid screens and mass spectrometry) are usually published in tabular form and can be quickly imported into major PPI databases. However, these techniques are prone to producing comparatively large numbers of false positives [7]. Other techniques, such as coimmunoprecipitation, cross-linking, or rate-zonal centrifugation, produce more reliable results but are small-scale; they are typically used to verify interesting but putative interactions, possibly first hypothesized during large-scale experiments [8]. Only recently have authors started to submit results directly to PPI databases on a regular basis, often as a step required by publishers to ensure quality. Given the wealth of PPI data published before the advent of PPI databases, much valuable data is clearly still available only in text. Turning this information into a structured form is a costly task that has to be performed by human experts [9]. Recent years have seen a steep increase in the number of techniques that aim to alleviate this task by applying computational methods, especially machine learning and statistical natural language processing [10]. Such tools are not only used to populate PPI databases; their output is often also used directly as independent input to biological data mining (see, e.g., [11,12]).
The most important way of conveying new findings in biomedical research is scientific publication. In turn, the most recent and most important findings can only be found by carefully reading the scientific literature, which is becoming increasingly difficult because of the enormous number of published articles. This situation has led to the development of various computational approaches to the automatic extraction of important facts from articles, mostly concentrating on the recognition of protein names and of interactions between proteins (PPIs). However, so far there is little agreement on which methods perform best for which task. Our paper reports on an extensive comparison of nine recent PPI extraction tools. We studied their performance in various settings on a set of five different text collections containing articles describing PPIs, which for the first time allows for an unbiased comparison of their respective effectiveness. Our results show that the tools' performance depends largely on the collection they are trained on and the collection they are then evaluated on, which means that extrapolating their measured performance to arbitrary text is still highly problematic. We also show that certain cl (...truncated)
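Because performance depends so strongly on which collection a tool is trained on and which it is evaluated on, it helps to make the three evaluation settings named in the abstract concrete as generators of train/test splits. The definitions here are our own reading of the terms, stated as an assumption: cross-validation splits each corpus internally, cross-learning trains on all corpora but one and tests on the held-out one, and cross-corpus evaluation trains on a single corpus and tests on each of the others. Corpora are modeled as hypothetical lists of document IDs; the corpus names in the toy example (AIMed, IEPA, LLL) are names of public PPI corpora, but the document IDs are invented, and the function names are illustrative only.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical layout: corpus name -> list of document IDs (IDs unique across corpora).
Corpora = Dict[str, List[str]]
# (description, training documents, test documents)
Split = Tuple[str, List[str], List[str]]


def cross_validation(corpora: Corpora, k: int = 10) -> Iterator[Split]:
    """k-fold CV inside each corpus. Folds consist of whole documents, so
    sentences from one document never end up in both training and test data."""
    for name, docs in corpora.items():
        n_folds = min(k, len(docs))  # avoid empty folds in small corpora
        folds = [docs[i::n_folds] for i in range(n_folds)]
        for i, test in enumerate(folds):
            train = [d for j, fold in enumerate(folds) if j != i for d in fold]
            yield (f"CV {name} fold {i}", train, test)


def cross_learning(corpora: Corpora) -> Iterator[Split]:
    """Train on the union of all corpora except one; test on the held-out corpus."""
    for held_out in corpora:
        train = [d for name, docs in corpora.items() if name != held_out for d in docs]
        yield (f"CL test={held_out}", train, list(corpora[held_out]))


def cross_corpus(corpora: Corpora) -> Iterator[Split]:
    """Train on a single corpus; test on each of the remaining corpora separately."""
    for train_name, train_docs in corpora.items():
        for test_name, test_docs in corpora.items():
            if test_name != train_name:
                yield (f"CC {train_name}->{test_name}", list(train_docs), list(test_docs))


# Toy example with made-up document IDs.
toy = {"AIMed": ["a1", "a2", "a3", "a4"], "IEPA": ["i1", "i2", "i3"], "LLL": ["l1", "l2"]}
for desc, train, test in cross_learning(toy):
    print(desc, len(train), len(test))
```

Splitting at the document level rather than the sentence level is the choice that matters for leakage: sentences from the same abstract tend to share entities and phrasing, so a sentence-level split can let a model see near-duplicates of its test instances during training.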



Domonkos Tikk, Philippe Thomas, Peter Palaga, Jörg Hakenberg, Ulf Leser. A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature, 2010, Volume 6, Issue 7, DOI: 10.1371/journal.pcbi.1000837