The UMLS Semantic Network and the Semantic Web.
Vipul Kashyap, Ph.D.
National Library of Medicine, Bethesda, Maryland
The Unified Medical Language System (UMLS),
an extensive source of biomedical knowledge
developed and maintained by the US National
Library of Medicine (NLM), is currently used in
a wide variety of biomedical applications. The
Semantic Network, a component of the UMLS, is a
structured description of core biomedical knowledge
consisting of well-defined semantic types and the
relationships between them. We investigate the
expressiveness of DAML+OIL, a markup language
proposed for ontologies on the Semantic Web, for
representing the knowledge contained in the
Semantic Network. Requirements specific to the
Semantic Network, such as polymorphic
relationships and blocking of relationship inheritance,
are discussed, and approaches to representing them in
DAML+OIL are presented. Finally, conclusions are
presented along with a discussion of ongoing and
future work.
INTRODUCTION
The Unified Medical Language System (UMLS)
project was initiated in 1986 by the U.S. National
Library of Medicine (NLM). Its goal is to help health
professionals and researchers use biomedical
information from different sources1. It consists of
three main knowledge repositories: (a) The UMLS
Metathesaurus, which provides a common structure
for more than 95 source biomedical vocabularies. It
is organized by concept, where a concept is a cluster
of terms (e.g., synonyms, lexical variants,
translations) with the same meaning. (b) The UMLS
Semantic Network2, which categorizes these concepts
through semantic types and relationships. (c) The
SPECIALIST lexicon, which contains over 30,000
English words, including many biomedical terms.
Information for each entry, including base form,
spelling variants, syntactic category, inflectional
variation of nouns and conjugation of verbs, is used
by the lexical tools11. The 2002 version of the
Metathesaurus contains 871,584 concepts named by
2.1 million terms. It also includes inter-concept
relationships across multiple vocabularies, concept
categorization, and information on concept
co-occurrence in MEDLINE.
The UMLS Semantic Network is highly suited for
representation using DAML+OIL5 constructs as it has
a rich semantic structure and an underlying metamodel consistent with the DAML+OIL specification.
In this paper, we investigate the expressiveness of
DAML+OIL constructs for representing the
knowledge contained in the Semantic Network. The
results of this work will also be applied to the UMLS
Metathesaurus.
DAML+OIL: AN ONTOLOGY LANGUAGE
FOR THE SEMANTIC WEB
The recognition of the key role that ontologies are
likely to play in the future of the Web has led to the
extension of Web markup languages in order to
facilitate content description and the development of
web ontologies, e.g., XML Schema7, RDF4 and RDF
Schema8. However, more expressive power is both
necessary and desirable in order to describe data in
sufficient detail and to enable automated reasoning,
e.g., determining semantic relationships between
syntactically different terms. The DAML+OIL
language5 is designed to describe the structure of a
domain. It takes an object-oriented approach, with the
structure of the domain being described in terms of
classes and properties. An ontology consists of a set
of axioms that assert characteristics of these classes
and properties. We now discuss the various
constructs in DAML+OIL and their foundations in
Description Logics (DLs)9.
DAML+OIL is, in essence, equivalent to a very
expressive DL, with a DAML+OIL ontology
corresponding to a DL terminology. As in a DL,
DAML+OIL classes can be names (URIs) or
expressions. A variety of constructors (or operators)
are provided for building class expressions. The
expressive power of the language is determined by
the class (and property) constructors provided, and
by the kinds of axioms allowed. Table 1 summarizes
the constructors used in DAML+OIL expressed using
the standard DL syntax. In the RDF syntax, the
expression Bacterium ∩ Virus would be written as:
<daml:Class>
  <daml:intersectionOf rdf:parseType="daml:collection">
    <daml:Class rdf:about="#Bacterium"/>
    <daml:Class rdf:about="#Virus"/>
  </daml:intersectionOf>
</daml:Class>
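As an illustration only, the same serialization can be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module. The helper name intersection_of is hypothetical, and the two namespace URIs are assumptions (the March 2001 DAML+OIL namespace and the RDF syntax namespace); neither is stated in this paper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; they are not given in the paper itself.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DAML_NS = "http://www.daml.org/2001/03/daml+oil#"

# Ask ElementTree to serialize with the familiar rdf: and daml: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("daml", DAML_NS)


def intersection_of(*class_names):
    """Build an anonymous daml:Class defined as the intersection of named classes.

    Hypothetical helper: class names are turned into local '#Name' references.
    """
    outer = ET.Element(f"{{{DAML_NS}}}Class")
    inter = ET.SubElement(outer, f"{{{DAML_NS}}}intersectionOf")
    inter.set(f"{{{RDF_NS}}}parseType", "daml:collection")
    for name in class_names:
        member = ET.SubElement(inter, f"{{{DAML_NS}}}Class")
        member.set(f"{{{RDF_NS}}}about", f"#{name}")
    return outer


element = intersection_of("Bacterium", "Virus")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The printed element carries explicit xmlns declarations because it is serialized on its own; inside a complete rdf:RDF document those declarations would normally appear once on the root element.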
The meanings of the first three constructors from
Table 1 are just the standard boolean operators on
classes. The oneOf constructor allows classes to be
AMIA 2003 Symposium Proceedings − Page 351
defined by enumerating their members. The toClass
and hasClass constructors correspond to slot
constraints in a frame-based language.
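The set-theoretic reading of the Boolean constructors and of oneOf can be sketched over a small finite interpretation. This is a minimal sketch rather than a DAML+OIL implementation: the domain, the individuals, and the class extensions below are all hypothetical, and each class is modeled simply as the set of its instances.

```python
# Hypothetical finite domain of individuals (illustrative names only).
DOMAIN = frozenset({"e_coli", "hiv", "oak", "aspirin", "tylenol"})

# Hypothetical class extensions: a class is the set of its instances.
Bacterium = frozenset({"e_coli"})
Virus = frozenset({"hiv"})
Plant = frozenset({"oak"})


def intersection_of(*classes):
    """intersectionOf: individuals that belong to every listed class."""
    result = DOMAIN
    for cls in classes:
        result = result & cls
    return result


def union_of(*classes):
    """unionOf: individuals that belong to at least one listed class."""
    return frozenset().union(*classes)


def complement_of(cls):
    """complementOf: individuals of the domain that are not in the class."""
    return DOMAIN - cls


def one_of(*individuals):
    """oneOf: a class defined by enumerating its members."""
    return frozenset(individuals)


print(sorted(union_of(Bacterium, Virus)))
print(sorted(complement_of(Plant)))
print(sorted(one_of("aspirin", "tylenol")))
```

In this toy interpretation the intersection of Bacterium and Virus is empty, which is simply a property of the sample data and not something the constructor itself asserts.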
Table 1: DAML+OIL class constructors
Constructor       DL Syntax       Example
intersectionOf    C1 ∩ … ∩ Cn     Bacterium ∩ Animal
unionOf           C1 ∪ … ∪ Cn     Bacterium ∪ Virus
complementOf      ¬C              ¬Plant
oneOf             {x1,…, xn}      {aspirin, tylenol}
toClass           ∀P.C            ∀partOf.Cell
hasClass          ∃P.C            ∃processOf.Organism
hasValue          ∃P.{x}          ∃treatedBy.{aspirin}
minCardinalityQ   ≥ n P.C         ≥ 2 hasPart.Cell
maxCardinalityQ   ≤ n P.C         ≤ 1 hasPart.Tissue
cardinalityQ      = n P.C         = 1 partOf.Cell
The class ∀P.C is the class all of whose instances are
related via the property P only to resources of type C,
while the class ∃P.C is the class all of whose
instances are related via the property P to at least one
resource of type C. The hasValue constructor is just
shorthand for a combination of hasClass and oneOf.
The minCardinalityQ, maxCardinalityQ and
cardinalityQ constructors (known in DLs as qualified
number restrictions) are generalizations of the
hasClass and hasValue constructors. The class ≥ n
P.C (≤ n P.C, = n P.C) is the class all of whose
instances are related via the property P to at least (at
most, exactly) n different resources of type C. The
emphasis on "different" is needed because there is no
unique name assumption with respect to resource
names (URIs): several URIs could name the same
resource.
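The restriction constructors can be illustrated by evaluating them over a finite interpretation. The following is a minimal sketch under stated assumptions: a property is modeled as a set of (subject, object) pairs, every function and data name is hypothetical, and each Python string stands for a distinct resource (not a URI), so the counting of "different" resources is unproblematic here.

```python
def successors(prop, x):
    """Objects y such that the pair (x, y) belongs to the property."""
    return frozenset(o for s, o in prop if s == x)


def to_class(prop, cls, domain):
    """toClass (∀P.C): all P-successors are in cls; vacuously true with none."""
    return frozenset(x for x in domain if successors(prop, x) <= cls)


def has_class(prop, cls, domain):
    """hasClass (∃P.C): at least one P-successor is in cls."""
    return frozenset(x for x in domain if successors(prop, x) & cls)


def has_value(prop, value, domain):
    """hasValue (∃P.{x}): hasClass applied to a one-element oneOf class."""
    return has_class(prop, frozenset({value}), domain)


def count_q(prop, cls, x):
    """Number of distinct P-successors of x that are in cls."""
    return len(successors(prop, x) & cls)


def min_cardinality_q(n, prop, cls, domain):
    return frozenset(x for x in domain if count_q(prop, cls, x) >= n)


def max_cardinality_q(n, prop, cls, domain):
    return frozenset(x for x in domain if count_q(prop, cls, x) <= n)


def cardinality_q(n, prop, cls, domain):
    return frozenset(x for x in domain if count_q(prop, cls, x) == n)


# Hypothetical sample data, for illustration only.
domain = frozenset({"nucleus", "membrane", "cell1", "cell2", "tissue1"})
Cell = frozenset({"cell1", "cell2"})
part_of = frozenset({
    ("nucleus", "cell1"),
    ("membrane", "cell1"),
    ("membrane", "cell2"),
    ("cell1", "tissue1"),
})

print(sorted(has_class(part_of, Cell, domain)))
print(sorted(cardinality_q(1, part_of, Cell, domain)))
```

Note that to_class also returns individuals with no P-successors at all, which reflects the standard reading of a universal restriction.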
Table 2 summarizes the axioms allowed in
DAML+OIL. These axioms make it possible to assert
subsumption or equivalence with respect to classes or
properties, the disjointness of classes, the
equivalence or non-equivalence of individuals
(resources), and various properties of properties. A
crucial feature of DAML+OIL is that subClassOf and
sameClassAs axioms can be applied to arbitrary class
expressions. The last two rows of Table 2 refer to
the DAML+OIL constructs domain and range, which
identify the domain and range classes of the various
properties. Their DL constructors are as shown in
Table 2. Later in the paper, we discuss various
approaches to representing domains and ranges and
the impact they might have on the complexity of the
reasoning process.
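To make the intended meaning of these axioms concrete, the sketch below checks whether an axiom holds in one given finite interpretation. This is model checking over sample data, not the entailment reasoning a DL reasoner performs. All names and data are hypothetical, and the reading of domain and range used here (every subject of the property is in the domain class, every object is in the range class) is an assumption about the usual semantics.

```python
def sub_class_of(c, d):
    """subClassOf: every instance of c is also an instance of d."""
    return c <= d


def same_class_as(c, d):
    """sameClassAs: the two classes have exactly the same instances."""
    return c == d


def disjoint_with(c, d):
    """disjointWith: the two classes share no instance."""
    return not (c & d)


def has_domain(prop, cls):
    """domain: every subject of the property is an instance of cls."""
    return all(s in cls for s, _ in prop)


def has_range(prop, cls):
    """range: every object of the property is an instance of cls."""
    return all(o in cls for _, o in prop)


# Hypothetical class extensions and property pairs, for illustration only.
Organism = frozenset({"e_coli", "hiv", "oak"})
Bacterium = frozenset({"e_coli"})
Virus = frozenset({"hiv"})
Process = frozenset({"binary_fission", "photosynthesis"})
process_of = frozenset({
    ("binary_fission", "e_coli"),
    ("photosynthesis", "oak"),
})

print(sub_class_of(Bacterium, Organism))
print(disjoint_with(Bacterium, Virus))
print(has_domain(process_of, Process), has_range(process_of, Organism))
```

Because the arguments are plain sets, the subsumption and equivalence checks apply equally to sets produced by arbitrary class expressions, mirroring the point that these axioms are not limited to named classes.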
DAML+OIL also allows properties of properties to
be asserted. It is possible to assert that a property is
unique (i.e., functional) and unambiguous (i.e., its
inverse is functional). It is also possible to use
inverse properties and assert that a property is
transitive.
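These property characteristics can likewise be checked on a finite set of (subject, object) pairs. The sketch below is illustrative only: the function names and the partOf sample data are hypothetical, and the code tests whether a characteristic happens to hold in the data, whereas a DAML+OIL assertion states that it must hold.

```python
def is_unique(prop):
    """Unique (functional): each subject is related to at most one object."""
    seen = {}
    for s, o in prop:
        # setdefault records the first object seen for s and returns it.
        if seen.setdefault(s, o) != o:
            return False
    return True


def inverse_of(prop):
    """Inverse property: swap subject and object in every pair."""
    return frozenset((o, s) for s, o in prop)


def is_unambiguous(prop):
    """Unambiguous: the inverse property is functional."""
    return is_unique(inverse_of(prop))


def is_transitive(prop):
    """Transitive: (a, b) and (b, d) in the property imply (a, d)."""
    return all((a, d) in prop
               for a, b in prop
               for c, d in prop
               if b == c)


# Hypothetical sample data, for illustration only.
part_of = frozenset({("nucleus", "cell"), ("cell", "tissue")})
part_of_closed = part_of | {("nucleus", "tissue")}
has_part = inverse_of(part_of)

print(is_transitive(part_of), is_transitive(part_of_closed))
print(is_unique(part_of), is_unique(part_of_closed))
print(sorted(has_part))
```

In this sample, adding the pair that makes partOf transitive also makes it non-functional, since the nucleus is then part of both the cell and the tissue.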
DAML+OIL REPRESENTATION OF THE
SEMANTIC NETWORK
We now present a DAML+OIL representation of a
sma (...truncated)