CDEGenerator: an online platform to learn from existing data models to build model registries

Clinical Epidemiology, Aug 2018

Julian Varghese,1 Michael Fujarski,2 Stefan Hegselmann,1 Philipp Neuhaus,1 Martin Dugas1,3

1Institute of Medical Informatics, University of Münster, 2Faculty of Mathematics and Computer Sciences, University of Münster, 3Institute of Medical Informatics, European Research Center for Information Systems (ERCIS), Münster, Germany

Objective: Best-practice data models harmonize semantics and data structure of medical variables in clinical or epidemiological studies. While there exist several published data sets, it remains challenging to find and reuse published eligibility criteria or other data items that match specific needs of a newly planned study or registry. A novel Internet-based method for rapid comparison of published data models was implemented to enable reuse, customization, and harmonization of item catalogs for the early planning and development phase of research databases.

Methods: Based on prior work, a European information infrastructure with a large collection of medical data models was established. A newly developed analysis module called CDEGenerator provides systematic comparison of selected data models and user-tailored creation of minimum data sets or harmonized item catalogs. Usability was assessed by eight external medical documentation experts in a workshop by the umbrella organization for networked medical research in Germany with the System Usability Scale.

Results: The analysis and item-tailoring module provides multilingual comparisons of semantically complex eligibility criteria of clinical trials. The System Usability Scale yielded “good usability” (mean 75.0, range 65.0–92.5). User-tailored models can be exported to several data formats, such as XLS, REDCap or Operational Data Model by the Clinical Data Interchange Standards Consortium, which is supported by the US Food and Drug Administration and European Medicines Agency for metadata exchange of clinical studies.

Conclusion: The online tool provides user-friendly methods to reuse, compare, and thus learn from data items of standardized or published models to design a blueprint for a harmonized research database.

Keywords: common data elements, semantic interoperability, metadata repositories, Unified Medical Language System



Clinical Epidemiology 2018:10 961–970 (Dovepress, open access, original research)
http://dx.doi.org/10.2147/CLEP.S170075

Correspondence: Julian Varghese, Institute of Medical Informatics, University of Münster, 1 Albert-Schweitzer-Campus, Gebäude A11, Münster 48149, Germany. Tel +49 251 835 4714

© 2018 Varghese et al. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).

Introduction

A foundational step for patient-data capture is to define the structure and semantics of medical variables in a study. Due to a lack of reuse of data standards or existing trial-related ontologies available on BioPortal, many medical variables are reinvented or heterogeneously defined for new studies.1,2 The lack of overview and technical comparability of existing data models (eg, case-report forms [CRFs] or item catalogs) that define the structure and semantics of medical variables limits possibilities to learn best practices from similar studies that have already been conducted.3 As a long-term effect, heterogeneity of data capture increases, and data integration and systematic analyses across different study results are limited.4

Therefore, a harmonized data-item catalog (herein “item catalog”) is crucial to counteract these issues as early as the planning phase of a research database. Primarily, this item catalog should list the definitions of the medical variables (herein “data items”) being used for study feasibility or data capture. An overview of such data items from similar studies or existing metadata standards provides an essential checklist for newly planned studies. This would enable reuse of best-practice approaches, avoid missing items that are relevant for later data analysis, and foster compatible data for later meta-analyses. The aim of building a harmonized and user-tailored item catalog forms the rationale of our work, which requires two key components:

1. An online open-access repository to provide valuable data models, such as data standards, item catalogs (containing data items and coded lists as permissible values) or full CRFs of clinical studies conducted on a broad range of disease entities. This repository, called Medical Data Models Portal (MDM Portal), has already been implemented based on previous work and is available at https://medical-data-models.org.5

2. An online comprehensive analysis tool for systematic analyses of such data models to identify common data items (eg, demographics, clinical data).
To achieve this, each data item is linked to its language-independent medical concept and coded within an existing international medical vocabulary. This way, terms of different languages, as well as synonyms and homonyms within one language, can be semantically compared with one another. The comparison should cover both semantically simple concepts (eg, body height) and free-text eligibility criteria that might contain many different atomic concepts in a single criterion (eg, patient suffers from heart or kidney injury). As a result, a filtered overview of existing data items and generation of a user-tailored full item catalog are possible. This item catalog can be exported to a standardized metadata format that is supported by electronic data-capture systems, in line with regulatory requirements of the US Food and Drug Administration and European Medicines Agency, and provides an initial blueprint on which to build a research database.

While the MDM Portal serves as the primary source for selecting data models, the analysis method is implemented as a standardized web service and can, therefore, also be called from other software systems. Both components are described as one online (...truncated)
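
The concept-based comparison described above can be illustrated with a minimal Python sketch. It is not the CDEGenerator implementation: the model names, item labels, and concept codes are hypothetical placeholders (not real UMLS identifiers), and matching items by an identical set of concept codes is a simplifying assumption made only for this example.

```python
from collections import defaultdict

# Toy data models. Model names, labels, and concept codes are hypothetical
# placeholders, not real UMLS identifiers or MDM Portal content.
MODELS = {
    "trial_a_en": [
        ("Body height", {"C_BODY_HEIGHT"}),
        ("Patient suffers from heart or kidney injury",
         {"C_HEART_INJURY", "C_KIDNEY_INJURY"}),
        ("Smoking status", {"C_SMOKING"}),
    ],
    "trial_b_de": [
        ("Körpergröße", {"C_BODY_HEIGHT"}),
        ("Herz- oder Nierenschädigung",
         {"C_HEART_INJURY", "C_KIDNEY_INJURY"}),
        ("Körpergewicht", {"C_BODY_WEIGHT"}),
    ],
    "registry_c_en": [
        ("Height", {"C_BODY_HEIGHT"}),
        ("Weight", {"C_BODY_WEIGHT"}),
    ],
}


def index_by_concepts(models):
    """Group items from all models by their set of concept codes."""
    index = defaultdict(dict)
    for model_name, items in models.items():
        for label, codes in items:
            # Keep the first label each model uses for this concept set.
            index[frozenset(codes)].setdefault(model_name, label)
    return index


def tailor_catalog(index, min_models):
    """Keep concept sets found in at least `min_models` models, most frequent first."""
    rows = [
        (sorted(codes), labels)
        for codes, labels in index.items()
        if len(labels) >= min_models
    ]
    # Sort by descending model count, then by code list for a stable order.
    rows.sort(key=lambda row: (-len(row[1]), row[0]))
    return rows


catalog = tailor_catalog(index_by_concepts(MODELS), min_models=2)
for codes, labels in catalog:
    print(len(labels), codes, labels)
```

Because items are keyed by concept codes rather than by label text, the English and German labels for body height fall into the same group, and a multi-concept eligibility criterion matches only criteria annotated with the same set of codes. A real comparison would likely need looser matching (for example, partial overlap of codes), which this sketch does not attempt.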


This is a preview of a remote PDF: https://www.dovepress.com/getfile.php?fileID=43586
Article home page: https://doaj.org/article/213437ca0f484578bac731c7f421c048

Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clinical Epidemiology. 2018;10:961-970.