CDEGenerator: an online platform to learn from existing data models to build model registries
Clinical Epidemiology
Dovepress
open access to scientific and medical research
ORIGINAL RESEARCH
Open Access Full Text Article
Clinical Epidemiology downloaded from https://www.dovepress.com/ by 5.135.254.153 on 21-Dec-2018
For personal use only.
This article was published in the following Dove Press journal:
Clinical Epidemiology
Julian Varghese 1
Michael Fujarski 2
Stefan Hegselmann 1
Philipp Neuhaus 1
Martin Dugas 1,3
1 Institute of Medical Informatics, University of Münster;
2 Faculty of Mathematics and Computer Sciences, University of Münster;
3 Institute of Medical Informatics, European Research Center for Information Systems (ERCIS), Münster, Germany
Objective: Best-practice data models harmonize semantics and data structure of medical
variables in clinical or epidemiological studies. While there exist several published data sets,
it remains challenging to find and reuse published eligibility criteria or other data items that
match specific needs of a newly planned study or registry. A novel Internet-based method for
rapid comparison of published data models was implemented to enable reuse, customization,
and harmonization of item catalogs for the early planning and development phase of research
databases.
Methods: Based on prior work, a European information infrastructure with a large collection of
medical data models was established. A newly developed analysis module called CDEGenerator
provides systematic comparison of selected data models and user-tailored creation of minimum
data sets or harmonized item catalogs. Usability was assessed by eight external medical documentation experts in a workshop by the umbrella organization for networked medical research
in Germany with the System Usability Scale.
Results: The analysis and item-tailoring module provides multilingual comparisons of semantically complex eligibility criteria of clinical trials. The System Usability Scale yielded “good
usability” (mean 75.0, range 65.0–92.5). User-tailored models can be exported to several data
formats, such as XLS, REDCap or Operational Data Model by the Clinical Data Interchange
Standards Consortium, which is supported by the US Food and Drug Administration and European Medicines Agency for metadata exchange of clinical studies.
Conclusion: The online tool provides user-friendly methods to reuse, compare, and thus learn
from data items of standardized or published models to design a blueprint for a harmonized
research database.
Keywords: common data elements, semantic interoperability, metadata repositories, Unified
Medical Language System
Correspondence: Julian Varghese
Institute of Medical Informatics,
University of Münster, 1 Albert-Schweitzer-Campus, Gebäude A11,
Münster 48149, Germany
Tel +49 251 835 4714
Email
Clinical Epidemiology 2018:10 961–970
© 2018 Varghese et al. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php
and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work
you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For
permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
http://dx.doi.org/10.2147/CLEP.S170075
Introduction
A foundational step for patient-data capture is to define the structure and semantics
of medical variables in a study. Due to a lack of reuse of data standards or existing trial-related ontologies available on BioPortal, many medical variables are
reinvented or heterogeneously defined for new studies.1,2 The lack of overview and
technical comparability of existing data models (eg, case-report forms [CRFs] or
item catalogs) that define the structure and semantics of medical variables limits
possibilities to learn best practices from similar studies that have already been
conducted.3 As a long-term effect, heterogeneity of data capture increases and
data integration and systematic analyses across different study results are limited.4
Therefore, a harmonized data-item catalog (herein “item
catalog”) is crucial to counteract these issues as early as the
planning phase of a research database. Primarily, this item
catalog should list the definitions of the medical variables
(herein “data items”) used for study feasibility or
data capture. An overview of such data items from similar
studies or existing metadata standards provides an essential
checklist for newly planned studies. This would enable
reuse of best-practice approaches, help avoid omitting items
that are relevant for later data analysis, and
foster compatible data for later meta-analyses.
The aim of building a harmonized and user-tailored item
catalog forms the rationale of our work, which requires the
following key components:
1. An online open-access repository to provide valuable data
models, such as data standards, item catalogs (containing data items and coded lists as permissible values) or
full CRFs of clinical studies conducted on a broad range
of disease entities. This repository, called Medical Data
Models Portal (MDM Portal), has already been implemented based on previous work and is available at
https://medical-data-models.org.5
2. An online comprehensive analysis tool for systematic
analyses of such data models to identify common data
items (eg, demographics, clinical data). To achieve this,
each data item is linked to its language-independent
medical concept and coded within an existing international medical vocabulary. This way, terms from different languages, as well as synonyms and homonyms within
one language, can be semantically compared with one
another. The comparison should cover both
semantically simple concepts (eg, body height) and
free-text eligibility criteria that might contain many different atomic concepts in a single criterion (eg, patient
suffers from heart or kidney injury). As a result, a filtered
overview of existing data items and generation of a user-tailored full item catalog are possible. This item catalog
can be exported to a standardized metadata format that
is supported by electronic data-capture systems, in line
with regulatory requirements of the US Food and Drug
Administration and European Medicines Agency, and
provides an initial blueprint upon which to build a research
database.
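The concept-based comparison described in component 2 can be illustrated with a minimal Python sketch. It is not the CDEGenerator implementation: the function name `common_items`, the model names, and the concept codes (eg, `C_HEIGHT`) are hypothetical placeholders rather than real vocabulary identifiers. The sketch assumes that every data item has already been annotated with a set of concept codes, so that a simple item carries one code and a complex eligibility criterion carries several.

```python
from collections import defaultdict

# Hypothetical toy input: model name -> {item label -> set of concept codes}.
# The codes are placeholders, not identifiers from a real medical vocabulary.
models = {
    "study_a": {
        "Body height": {"C_HEIGHT"},
        "Patient suffers from heart or kidney injury": {"C_HEART_INJURY", "C_KIDNEY_INJURY"},
    },
    "study_b": {
        "Körpergröße": {"C_HEIGHT"},  # German label, same concept as "Body height"
        "Alter": {"C_AGE"},
    },
    "study_c": {
        "Heart or kidney damage": {"C_KIDNEY_INJURY", "C_HEART_INJURY"},
        "Height": {"C_HEIGHT"},
    },
}


def common_items(models, min_models=2):
    """Group items by their concept-code set, ignoring language and wording.

    Returns {frozenset of codes: {model name: item label}} for every code set
    that occurs in at least `min_models` different models. If one model holds
    several items with the same code set, only the last label is kept.
    """
    groups = defaultdict(dict)
    for model_name, items in models.items():
        for label, codes in items.items():
            groups[frozenset(codes)][model_name] = label
    return {
        codes: labels
        for codes, labels in groups.items()
        if len(labels) >= min_models
    }


if __name__ == "__main__":
    # Print each shared concept set with the labels used in the individual models.
    for codes, labels in common_items(models).items():
        print(sorted(codes), labels)
```

Matching on the full code set treats a multi-concept criterion as one unit, so "heart or kidney injury" only matches items annotated with both concepts. A real comparison might also need partial matches or logical operators between concepts, which this sketch deliberately leaves out.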
While the MDM Portal serves as the primary source for
selecting data models, the analysis method is implemented as
a standar (...truncated)