Coverage of Oncology Drug Indication Concepts and Compositional
Semantics by SNOMED-CT®
Steven H. Brown, MD 1,2; Brent A. Bauer, MD 3; Dietlind L. Wahner-Roedler, MD 3; Peter L. Elkin, MD 3
1 Department of Veterans Affairs; 2 Vanderbilt University; 3 Mayo Clinic
Objective: To evaluate SNOMED-CT’s ability
to represent simple and compositional concepts
in FDA-approved oncology drug indications.
Methods: Oncology drug indications were
decomposed into single and compositional
concepts. SNOMED-CT’s coverage of single
concepts and the semantics needed to create
compositional concepts were evaluated using
automated and manual techniques.
Results: SNOMED-CT covered 86.3% of single
concepts present in oncology drug indications;
11.3% of indications were covered completely.
Coverage was best for concepts describing
diseases, anatomy, and patient characteristics.
Medications accounted for 50.5% of missing
concepts. Excluding drug names, 45.2% of
indications were completely represented.
SNOMED-CT’s semantics completely
represented 60.1% of compositional expressions.
Conclusions: SNOMED-CT’s overall coverage
of the concepts in oncology drug indications was
good. Improvements or alternatives are needed
for medications and semantics.
Introduction
In the past five years a number of papers
detailing desirable characteristics of
terminologies have been published. In 1998,
Chute documented 11 characteristics that
terminologies should have or evolve to have in
order to meet important needs of health care [1].
Cimino’s work [2] from the same year described
12 “desiderata” synthesized from the literature of
medical vocabulary research. ASTM E 2087-00,
published in 2000, enumerated over 50 quality
indicators for controlled health vocabularies [3].
ISO TS 17117 [4] carries forward the ideas in
ASTM E 2087 as an international technical
specification. Two additional publications [5, 6]
advance our understanding of terminology
quality indicators even further. While the
guideline authors may disagree on certain fine
points, the importance of content coverage is
universally acknowledged. In our experience, the
importance of content coverage is understood
and accepted by technical and non-technical
audiences alike. “Content, content, content” [2]
delivers the message succinctly.
Content coverage studies are not new to the
literature. For example, in 1977 Lowery et al.
examined ICD, SNOMED, and the Cardiff
system for coding congenital malformations and
genetic syndromes [7]. A number of subsequent
content coverage studies further evaluated the
SNOMED family of terminologies [8-14].
SNOMED-CT is a reference terminology created
from the combination of SNOMED-RT and the
National Health Service’s Clinical Terms Version
3 [15]. According to the July 2002 fact sheet,
SNOMED-CT contained 333,000 concepts and
approximately 1,000,000 “is a” semantic
relationships. SNOMED-CT supports the
composition of new terms through the
combination of existing concepts. A national
license for SNOMED-CT was being negotiated
by the NLM at the time this manuscript was
written. If this license agreement comes to pass,
SNOMED-CT could become a de facto national
standard. Thus, understanding the content
coverage of SNOMED-CT is of particular
importance at this time.
Compositionality has been proposed and
successfully demonstrated as an approach to
improve content coverage [16-18]. For example,
post-coordinated composition of UMLS concepts to
represent problem statements has performed
significantly better than UMLS concepts alone [19].
The linkage of two or more concepts is typically
achieved using a formal semantic that details the
concepts’ relationship. For example, the concepts
“enalapril” and “angiotensin converting enzyme
inhibition” could be joined by the semantic
relationship “has mechanism of action.”
Post-coordinating a terminology’s concepts via its
semantics suggests another type of study: the
content coverage of the linking semantics. We
believe semantics are an important part of
compositional terminologies. Others agree. For
instance, Bakken evaluated SNOMED-CT’s
semantics in a study of nursing diagnoses [20].
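The linkage described above can be pictured as a small data structure. The following is a minimal sketch, not SNOMED-CT's actual representation: the class names `Concept` and `Expression` and the arrow rendering are assumptions made for illustration, and the labels are plain strings rather than terminology codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A single terminology concept (label only; not a real SNOMED-CT code)."""
    label: str


@dataclass(frozen=True)
class Expression:
    """Two concepts joined by one named semantic relationship (hypothetical model)."""
    source: Concept
    relation: str
    target: Concept

    def render(self) -> str:
        # Show the post-coordinated expression as source --[relation]--> target.
        return f"{self.source.label} --[{self.relation}]--> {self.target.label}"


# The example from the text: enalapril linked to its mechanism of action.
enalapril = Concept("enalapril")
ace_inhibition = Concept("angiotensin converting enzyme inhibition")
expr = Expression(enalapril, "has mechanism of action", ace_inhibition)
print(expr.render())
```

Under this view, a study of concept coverage asks whether `source` and `target` exist in the terminology, while a study of semantic coverage asks whether a suitable `relation` exists.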
In the current study, we evaluate SNOMED-CT’s ability to represent the content of a set of
FDA-approved oncology drug indications and
perform a preliminary analysis of its semantics.
AMIA 2003 Symposium Proceedings − Page 115
Methods
Approved oncology drug indications (Table 1)
were downloaded from the FDA Oncology Tools
website [21]. SNOMED-CT version 1.0 from the
College of American Pathologists was employed.
All downloaded indications were manually
broken into single concepts and compositional
concepts. Our method identified the shortest
medically sensible compositional concepts
within the indication. Expressions composed of
two concepts (e.g. oral + capsule) were identified
whenever possible. A second author verified
each proposed compositional expression.
Examples of single and compositional concepts
identified within indications are given in Table 1.
Each single or compositional concept was
categorized as relating to treatments, diseases,
patients, medications, anatomy, or other. Only
concepts that mentioned a specific medication
were classified as medication related. Concepts
referring to broad classes of medications were
classified as treatment related. Descriptive
statistics and tables documenting the most
commonly occurring single and compositional
expressions are presented in the results section.
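The medication-versus-treatment rule above can be sketched in code. In the study the categorization was done manually; the version below is only an illustration of the stated rule. The lookup sets `SPECIFIC_DRUGS` and `DRUG_CLASSES`, their contents, and the function `categorize` are hypothetical and are not taken from the study data.

```python
CATEGORIES = ("treatment", "disease", "patient", "medication", "anatomy", "other")

# Hypothetical lookup sets for illustration only; not drawn from the study data.
SPECIFIC_DRUGS = {"tamoxifen", "cisplatin"}
DRUG_CLASSES = {"chemotherapy", "anthracycline"}


def categorize(concept: str, manual_category: str = "other") -> str:
    """Apply the medication/treatment rule, else fall back to a reviewer's category."""
    text = concept.lower()
    # Only concepts that mention a specific medication count as medication related.
    if any(drug in text for drug in SPECIFIC_DRUGS):
        return "medication"
    # Concepts that refer to a broad class of medications count as treatment related.
    if any(drug_class in text for drug_class in DRUG_CLASSES):
        return "treatment"
    if manual_category not in CATEGORIES:
        raise ValueError(f"unknown category: {manual_category!r}")
    return manual_category


print(categorize("cisplatin therapy"))      # specific drug named
print(categorize("prior chemotherapy"))     # broad class only
print(categorize("breast", "anatomy"))      # reviewer-assigned category
```

The specific-drug check runs first so that a concept naming both a drug and its class is still counted as medication related, which matches the rule as stated.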
SNOMED-CT’s content coverage of the
identified single concepts was measured in two
phases. In the first phase, automated concept
identification tools available in our lab [19, 22] were
applied to each indication. The output was an
XML file containing the original indications and
all mapped SNOMED concepts. Each mapping from
an indication concept to a SNOMED concept was
manually reviewed for correctness. The
indication concepts that were not mapped
properly via the algorithmic approach were
manually reviewed using the Mayo Vocabulary
Server and Browser tool loaded with SNOMED-CT. In this manner, single concepts were
determined to be present or absent.
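Once each single concept has been marked present or absent, the two coverage figures can be computed as simple proportions. The sketch below assumes that concept coverage is counted per occurrence rather than per unique concept, which the text does not specify. The function name `coverage`, the data layout, and the toy indications are all hypothetical.

```python
def coverage(indications):
    """Return (concept coverage, share of indications completely covered).

    `indications` maps an indication id to a list of (concept, present) pairs,
    where `present` is True if the concept was found in the terminology.
    """
    total = sum(len(concepts) for concepts in indications.values())
    if total == 0:
        raise ValueError("no concepts to evaluate")
    found = sum(present for concepts in indications.values() for _, present in concepts)
    # An indication is completely covered only if every one of its concepts is present.
    complete = sum(all(present for _, present in concepts) for concepts in indications.values())
    return found / total, complete / len(indications)


# Toy data for illustration; not the study's indications.
toy = {
    "indication A": [("patients", True), ("metastatic", True), ("cancer", True)],
    "indication B": [("treatment", True), ("drug X", False)],
}
concept_cov, indication_cov = coverage(toy)
print(f"concept coverage: {concept_cov:.1%}, complete indications: {indication_cov:.1%}")
```

A single missing concept makes a whole indication incomplete, so the share of completely covered indications can be far lower than the concept-level coverage.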
SNOMED-CT’s coverage of the semantics
needed to form compositional expressions was
evaluated by manual modeling. The single best-fitting
semantic relation was used to link
concepts forming each compositional expression.
The adequacy of each semantic’s representation
of the meaning of the compositional expression
was judged by consensus of two reviewers to be
1) complete, 2) partial, or 3) inadequate.
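The three-level adequacy judgments can be summarized as shares of all compositional expressions. This is a minimal sketch under the assumption that each expression receives exactly one consensus rating. The function `adequacy_shares` and the toy ratings are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

RATINGS = ("complete", "partial", "inadequate")


def adequacy_shares(ratings):
    """Return the share of expressions at each adequacy level."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    unknown = set(counts) - set(RATINGS)
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    n = len(ratings)
    # Counter returns 0 for levels that never occur, so every level is reported.
    return {level: counts[level] / n for level in RATINGS}


# Toy consensus ratings for four expressions; illustrative only.
toy_ratings = ["complete", "complete", "partial", "inadequate"]
shares = adequacy_shares(toy_ratings)
print(shares)
```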
Results
The FDA website contained 115 indications for
68 unique drugs. We identified 1527 concepts in
the 115 indications. The mean number of
concepts per indication was 13.3 (95% CI 12.0 –
14.2) with a range from 3 to 48. Table 1 shows
two representative indications and the concepts
identified within them.
The ten most commonly found single concepts
and their frequency of occurrence are: “Patients”
(56), “Treatment” (44), “Cancer” (44),
“Therapy” (34), “Combination” (34),
“Metastatic” (25 (...truncated)