Nominalization and Alternations in Biomedical Language
Citation: Cohen KB, Palmer M, Hunter L (
K. Bretonnel Cohen
Martha Palmer
Lawrence Hunter
Editor: Robert P. Futrelle, Northeastern University, United States of America
1 Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado, United States of America; 2 Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, United States of America
Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g., the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data are used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts.
Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle.
Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedical language. We also report on a previously undescribed alternation involving an adjectival present participle.
Funding: K. Bretonnel Cohen's and Lawrence Hunter's work was supported by grants G08LM009639, R01LM009254, and R01LM008111. Martha Palmer's work was supported by NSF grant CISE-CRI 0551615. J. Gregory Caporaso is supported by training grant fellowship T15LM009451. No sponsors or funders were involved in the design or conduct of the study; in the collection, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
This work is a step toward understanding the syntactic and
semantic aspects of verb meaning in the biomedical domain. The
goal is to lay the groundwork for a set of representations of
domain-specific verbs that is broad enough in its coverage to scale up to
realistic problems in information extraction, and deep enough in its
representation to support accurate extraction of information in the
face of syntactic variability and to allow for the resolution of
coreferential and related (e.g. elliptical) references in text. In an initial
step, we sought to answer a very basic question: do alternations occur
in biomedical texts? (Alternation is the term in theoretical linguistics for
variations in the surface syntactic form of verbs.) We approached the
problem by identifying the most frequent domain-specific verbs in
biomedical text and then analyzing those verbs and their
nominalizations in terms of the alternations in which they participate. Of the
many classes of alternations that verbs participate in, we looked
specifically at the passive alternation (Levin classes 5.1 Verbal Passive,
5.3 Adjectival Passive, and 5.4 Adjectival Perfect Participle) and at
alternations related to transitivity (Levin class 1 Transitivity alternations
and its descendants). We also report a previously undescribed
alternation, Adjectival Present Participle. For the nouns, we examined
alternations in the presence or absence of arguments and in the
syntactic position of non-absent arguments.
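The two-step approach described above, ranking verbs by frequency and then examining their tokens for alternations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in this study: the input is assumed to be hypothetical (lemma, part-of-speech tag) pairs with Penn Treebank verb tags, which all begin with "VB"; the exclude parameter is a hypothetical stoplist standing in for the restriction to domain-specific verbs; and the alternation labels (e.g. "passive") are hypothetical annotator-assigned strings.

```python
from collections import Counter, defaultdict
from typing import AbstractSet, Dict, Iterable, List, Set, Tuple


def most_frequent_verbs(
    tagged_tokens: Iterable[Tuple[str, str]],
    n: int = 10,
    exclude: AbstractSet[str] = frozenset(),
) -> List[Tuple[str, int]]:
    """Return the n most frequent verb lemmas with their token counts.

    Assumes Penn Treebank tags, in which every verb tag starts with "VB".
    `exclude` is a hypothetical stoplist (e.g. "be", "have") standing in
    for the selection of domain-specific verbs.
    """
    counts = Counter(
        lemma
        for lemma, tag in tagged_tokens
        if tag.startswith("VB") and lemma not in exclude
    )
    return counts.most_common(n)


def tally_alternations(
    labelled_tokens: Iterable[Tuple[str, Set[str]]],
) -> Tuple[Counter, Dict[str, Counter]]:
    """Count tokens per lemma, and tokens per (lemma, alternation label).

    Each item is (lemma, set of hypothetical alternation labels assigned by
    an annotator); an empty set means no labelled alternation was observed.
    """
    totals: Counter = Counter()
    by_label: Dict[str, Counter] = defaultdict(Counter)
    for lemma, labels in labelled_tokens:
        totals[lemma] += 1
        for label in labels:
            by_label[lemma][label] += 1
    return totals, by_label


# Usage with toy data (not corpus counts):
tokens = [("increase", "VBD"), ("be", "VBZ"), ("cell", "NNS"),
          ("increase", "VBZ"), ("inhibit", "VBN")]
print(most_frequent_verbs(tokens, n=2, exclude={"be"}))
# [('increase', 2), ('inhibit', 1)]

totals, by_label = tally_alternations(
    [("increase", {"passive"}), ("increase", set()), ("inhibit", {"passive"})]
)
print(totals["increase"], by_label["increase"]["passive"])
# 2 1
```

Keeping per-token labels as sets lets one token count toward several alternations at once, which matches labelling each token for the presence or absence of more than one alternation.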
One characteristic of alternations is that they preserve the
underlying semantics of an assertion even in the face of syntactic
variability. A familiar example is the passive alternation. One
claim of an alternations-based approach to explaining
syntactic/semantic relations is that in
• FSH stimulates follicular development (PMID 12021046) and
• follicular development is stimulated by FSH (PMID 6615964)
the underlying semantics of the sentences, i.e. that FSH is the
stimulator and follicular development is the thing that is stimulated, is
the same, even though in the first sentence FSH is the grammatical
subject and follicular development is the grammatical object, while in
the second sentence follicular development becomes the grammatical
subject and there is no grammatical object, per se. Alternations
have been a topic of interest in the theoretical linguistics literature
because they are thought to shed light on what is known in
linguistics as the mapping problem: how it is that underlying
semantics are realized in the syntax of sentences. One assumption
of the model is that verbs with shared semantics will participate in
the same alternations.
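The claim that the two FSH sentences share one underlying representation can be illustrated as a mapping from grammatical functions to semantic roles. The sketch below assumes each clause has already been parsed into a hypothetical Clause record; the role names agent and theme and the helper to_roles are illustrative choices, not part of any formalism described here. It covers only the verbal passive, and an agentless passive simply yields None for the agent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Clause:
    """Hypothetical pre-parsed clause: a verb lemma plus grammatical functions."""
    verb: str                        # verb lemma, e.g. "stimulate"
    voice: str                       # "active" or "passive"
    subject: str                     # grammatical subject
    obj: Optional[str] = None        # grammatical object (active clauses)
    by_phrase: Optional[str] = None  # object of "by" (passive clauses)


def to_roles(clause: Clause) -> Tuple[str, Optional[str], Optional[str]]:
    """Map grammatical functions to a (predicate, agent, theme) tuple."""
    if clause.voice == "active":
        # Active: subject is the logical subject, object the logical object.
        return (clause.verb, clause.subject, clause.obj)
    if clause.voice == "passive":
        # Passive: subject is the logical object; the optional by-phrase
        # supplies the logical subject.
        return (clause.verb, clause.by_phrase, clause.subject)
    raise ValueError(f"unknown voice: {clause.voice!r}")


active = Clause("stimulate", "active", subject="FSH", obj="follicular development")
passive = Clause("stimulate", "passive", subject="follicular development", by_phrase="FSH")
print(to_roles(active) == to_roles(passive))  # True
print(to_roles(active))  # ('stimulate', 'FSH', 'follicular development')
```

Because both clauses normalize to the same tuple, anything downstream of to_roles can treat them identically.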
Alternations are of relevance to language processing and text
mining because of the contribution that they might make to the
development of broad-coverage rule- and pattern-based systems
for relation extraction: if verbs with similar semantics do
participate in the same alternations, then it might be possible to
take advantage of this by inheriting or otherwise reusing abstract
rules in broad classes of verbs. For example, if it turns out to be the
case that transitive verbs share the trait of being able to occur in
the passive alternation, then system developers might be able to
write just two rules for extracting relations from active and passive
sentences and share those between all transitive verbs, rather than
writing a separate active rule and a separate passive rule for each
transitive verb in the lexicon.
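The idea of writing one active rule and one passive rule and sharing them across all transitive verbs can be sketched with string templates that are instantiated once per verb. This is a toy under stated assumptions: it matches whole, simple declarative strings with regular expressions rather than syntactic analyses, and the three-verb lexicon TRANSITIVE_VERBS, with its hand-listed inflected forms, is hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: lemma -> (3rd-person singular present, past participle).
TRANSITIVE_VERBS: Dict[str, Tuple[str, str]] = {
    "stimulate": ("stimulates", "stimulated"),
    "inhibit": ("inhibits", "inhibited"),
    "activate": ("activates", "activated"),
}

# The two shared rules; {v3sg} and {vpp} are filled in once per verb.
ACTIVE_TEMPLATE = r"^(?P<agent>.+?) {v3sg} (?P<theme>.+)$"
PASSIVE_TEMPLATE = r"^(?P<theme>.+?) (?:is|are|was|were) {vpp} by (?P<agent>.+)$"


def build_rules(lexicon: Dict[str, Tuple[str, str]]) -> List[Tuple[str, "re.Pattern[str]"]]:
    """Instantiate the active and passive templates for every verb in the lexicon."""
    rules = []
    for lemma, (v3sg, vpp) in lexicon.items():
        rules.append((lemma, re.compile(ACTIVE_TEMPLATE.format(v3sg=re.escape(v3sg)))))
        rules.append((lemma, re.compile(PASSIVE_TEMPLATE.format(vpp=re.escape(vpp)))))
    return rules


def extract(sentence: str, rules: List[Tuple[str, "re.Pattern[str]"]]) -> Optional[Tuple[str, str, str]]:
    """Return (predicate, agent, theme) from the first rule that matches, else None."""
    for lemma, pattern in rules:
        match = pattern.match(sentence)
        if match:
            return (lemma, match.group("agent"), match.group("theme"))
    return None


rules = build_rules(TRANSITIVE_VERBS)
print(len(rules))  # 6: two shared templates x three verbs
print(extract("FSH stimulates follicular development", rules))
# ('stimulate', 'FSH', 'follicular development')
print(extract("follicular development is stimulated by FSH", rules))
# ('stimulate', 'FSH', 'follicular development')
```

Under this design, adding another transitive verb means adding one lexicon entry rather than writing two new rules. Agentless passives, other auxiliaries, and nominalizations are deliberately left out of the sketch.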
Levin (1993) [1] identified fifty major classes of alternations.
That work also identified 49 major semantic classes of verbs,
grouped according to the alternations in which they do and do not
participate. (There are also subclasses of the fifty major classes of
alternations and of the 49 major classes of verbs.) To illustrate the
relationship between the semantics of related verbs and their
shared syntactic behaviors, consider what Levin termed calibratable
change-of-state verbs. These verbs, such as increase, share the
semantic characteristics of a state-change in the logical object of
the verb, and the syntactic behavior that when they are
intransitive, the grammatical subject of the verb is the undergoer
of the change (i.e., is the logical object). Thus, in
• the addition of hCG alone significantly increased lyase activity in these cells (PMID 2788776)
the verb increase is transitive and lyase activity is both the
grammatical (...truncated)