The UMLS Knowledge Source Server: An Object Model for Delivering UMLS Data

Anantha Bangalore, Karen E. Thorn, Carolyn Tilley, Lee Peters
US National Library of Medicine, Bethesda, Maryland

The Unified Medical Language System® (UMLS®), a project of the National Library of Medicine (NLM), regularly distributes a set of knowledge sources to the research community. These data are made available over the Internet through the UMLS Knowledge Source Server (UMLSKS). The new version of the UMLSKS is a complete redesign of the original system using Java and the Extensible Markup Language (XML) technologies to implement a fast, reliable, flexible, and extensible UMLS data retrieval system that includes an Application Programmer’s Interface (API) and an Object Model of each of the Knowledge Sources: the UMLS Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. In this paper we present the design of the new system, outline each of the system design goals, the UMLS Object Model, and statistics showing the usage of the new UMLSKS and associated data. We conclude with implications for future work.

INTRODUCTION

The Unified Medical Language System® (UMLS®) approach involves the development of a set of widely distributed Knowledge Sources (Metathesaurus®, Semantic Network, and SPECIALIST Lexicon). These Knowledge Sources can be used by a variety of computerized applications to compensate for differences in the way concepts are expressed in a variety of biomedical vocabularies [1].

Currently, over 1900 individuals and institutions have signed the UMLS License Agreement, enabling them to receive the UMLS data either on CD-ROM or through the UMLS Knowledge Source Server (UMLSKS). A smaller number of licensees (approximately 1200) have registered for access to the UMLSKS. The UMLS is large and complex and presents significant challenges in retrieving information in a comprehensive way.
The centrally managed UMLSKS provides system developers with UMLS information remotely and on demand. The advantage of such an approach is that it makes the Knowledge Sources readily available and, perhaps more importantly, developers do not need to invest time and effort in understanding the structure of the data files and other details to use the UMLS data in their applications.

In 1995, the UMLS data were made available for the first time through the Internet-based UMLSKS [2]. Since then there have been significant improvements to the software and hardware components of the UMLSKS, resulting in enhanced performance, increased flexibility, extensibility, and scalability, and better software developer access to UMLS data.

Functionally, the UMLSKS is similar to previous versions in allowing remote users, both individuals and computer programs, to send requests to a server at the National Library of Medicine (NLM) through multiple channels. The similarity ends there. The old system ran as a single server using a flag-based command line Application Programmer’s Interface (API) that was written in the “C” programming language. The new Java-based system was designed with the following tenets in mind:

• Extensibility for ease of new feature integration
• Flexibility by providing a rich API set to allow system developers access to all UMLS data elements
• Access to data through multiple channels (web, XML/socket API, and Java API)
• Provision of a unified data model for the Knowledge Sources for use by application developers
• Scalability in handling ever increasing user loads and increasing numbers of UMLS source vocabularies
• Performance enhancement to provide faster access to UMLS data
• Ease of administration by NLM staff and contractors

The UMLSKS Object Model for each of the Knowledge Sources allows users to ingest XML documents produced by the UMLSKS and to manipulate those data in an object-oriented fashion within their own programs.
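As a rough illustration of ingesting an XML document into objects, the following Java sketch parses a small XML fragment into a plain class using the JDK's DOM parser. The `Concept` class, the `concept` and `term` element names, the `cui` attribute, and the placeholder identifier `C0000000` are assumptions made for this example only; they are not the actual UMLSKS Object Model or its response schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ObjectModelSketch {

    // Hypothetical stand-in for an Object Model class; not the real UMLSKS type.
    public static final class Concept {
        public final String cui;
        public final List<String> terms = new ArrayList<>();

        Concept(String cui) {
            this.cui = cui;
        }
    }

    // Parse an XML string (assumed format) into a Concept object.
    public static Concept parseConcept(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            Concept concept = new Concept(root.getAttribute("cui"));
            // Collect the text of every <term> element below the root.
            NodeList terms = root.getElementsByTagName("term");
            for (int i = 0; i < terms.getLength(); i++) {
                concept.terms.add(terms.item(i).getTextContent());
            }
            return concept;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse concept XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative document; a real server response would look different.
        String xml = "<concept cui=\"C0000000\">"
                + "<term>example term</term><term>sample term</term></concept>";
        Concept concept = parseConcept(xml);
        System.out.println(concept.cui + " has " + concept.terms.size() + " terms");
    }
}
```

Once the XML has been mapped to objects, client code can work with fields and lists rather than with tags and attributes, which is the convenience the Object Model is described as providing.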
The load on the new system is spread across multiple machines to achieve load balance and fault tolerance.

UMLSKS API

The API provides a number of functions for querying UMLS Knowledge Source information from the UMLSKS. Two programming interfaces are available to developers wishing to use the UMLSKS to retrieve UMLS data content: a Java Remote Method Invocation (RMI)-based mechanism and a TCP/IP socket-based mechanism.

AMIA 2003 Symposium Proceedings − Page 51

The first scheme utilizes the Java RMI package to establish a connection to the UMLSKS that allows client applications to make method calls directly from within their Java programs. The underlying communications mechanism is hidden, which frees the user from needing to directly manage the communications with the UMLSKS server.

The second scheme is a lower level mechanism that can be used with any programming language. The socket-based scheme includes a TCP/IP server running on the UMLSKS server that accepts socket connections from remote clients. Clients establish a connection to this server socket, compose a UMLSKS API request in XML format to send over this connection, and then await receipt of the XML response from the server. Client programs may be written in any language that supports TCP/IP socket communication. Java programmers can take further advantage of the API by using the Object Model to interpret the returned XML.

The API is built on the premise that all of the Metathesaurus may not be required by every developer. Many applications require only a fraction of the information available. With this in mind, the API was developed to slice the Metathesaurus into subsets of data. This results in a reduction of the total amount of information traveling between the UMLSKS and client applications and also provides applications with fine-grained control over the data they wish to receive. These modifications to the software yield significant performance improvements over the previous version.
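The socket-based exchange described above (connect, send an XML request, wait for the XML response) can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the `request` and `include` elements, the `concept` attribute, the subset names, the placeholder identifier `C0000000`, and the one-line-per-message framing are all hypothetical and do not reproduce the real UMLSKS request format. A local stub server stands in for the UMLSKS so the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketClientSketch {

    // Build a request naming only the data subsets the client wants (assumed format).
    public static String buildRequest(String cui, String... subsets) {
        StringBuilder xml = new StringBuilder();
        xml.append("<request concept=\"").append(cui).append("\">");
        for (String subset : subsets) {
            xml.append("<include>").append(subset).append("</include>");
        }
        return xml.append("</request>").toString();
    }

    // Connect, send one line of XML, and block until one line of XML comes back.
    public static String send(String host, int port, String requestXml) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(requestXml);
            return in.readLine();
        }
    }

    // Local stand-in for the server: wraps whatever it receives in a <response> element.
    static Thread startStubServer(ServerSocket server) {
        Thread stub = new Thread(() -> {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println("<response>" + in.readLine() + "</response>");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        stub.start();
        return stub;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            Thread stub = startStubServer(server);
            String request = buildRequest("C0000000", "definitions", "semanticTypes");
            String response = send("127.0.0.1", server.getLocalPort(), request);
            stub.join();
            System.out.println(response);
        }
    }
}
```

Listing the wanted subsets in the request mirrors the idea of slicing the Metathesaurus: the client asks only for the parts it needs, so less data crosses the connection.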
The API exclusively uses XML for describing data for each of the Knowledge Sources. As an industry standard means of structuring information, XML provides a platform-independent form for representing hierarchical data like those of the Knowledge Sources. XML is basically ASCII text that is self-describing through use of descriptive data tags. Many tools exist for manipulating and displaying XML that make the developer’s job easier by releasing them from this responsibility and allowing them to focus on the application details. The use of XML gives the system its extensibility and flexibility as proprietary formats are dropped in favor of a more widely available and accepted form and XML is inherently forward compatible.

UMLS OBJECT MODEL

Previously, the onus has been on application developers to create their own usable data model for the Knowledge Sources. Each developer needed to understand the relational data representation delivered by the UMLS development group in order to abstract the Knowledge Source contents into application level components. Competing UMLS object models existed but without a cons (...truncated)