Profile of Michel Dumontier
Photo Courtesy of Dr. Michel Dumontier
Christian Burris, Profiles Editor
Profile of Michel Dumontier, Distinguished Professor of Data Science at Universiteit Maastricht in the Netherlands and 32nd Annual NASIG Conference Vision Speaker
Wide Web Consortium Semantic Web in Health Care and Life Sciences Interest Group (W3C HCLSIG), and the Scientific Director for Bio2RDF, an open source biological database. You can also follow Dr. Dumontier on Twitter as @micheldumontier. My interview with Dr. Dumontier was completed by email on February 15, 2017.
Indeed, I remember spending vast amounts of time in
the library examining different materials to glean insight
into a particular subject matter – from textbooks on
protein structure and function to peer-reviewed journal
articles detailing computational analyses of their
composition and sequence. As library collections
became searchable using the web browser, and
textbooks and journals became available in a digital
form that could be downloaded, searched, and
annotated, the need to be physically present in the
library has been drastically reduced. Until fairly recently, the
role of the library in my work was largely as a
gatekeeper to the digital collections of scientific
research articles that were only realistically available
through institutional subscriptions. With the advent of
the open access movement, where authors pay publishers
NASIG Newsletter May 2017
to make their articles available free of charge, even the
gatekeeper role has been diminishing somewhat.
Nonetheless, new challenges concerning research data
– description, archival, citation, and availability – are
now being taken on by the libraries. Libraries are
providing key infrastructure to help researchers prepare
their digital research products in a manner that meets
the demands of funders, journals, and other
researchers. A key part of that is making them FAIR –
findable, accessible, interoperable, and reusable. I have
been fortunate to have had the opportunity to work
with a diverse group of individuals, particularly from
libraries, through multiple international forums to not
only establish the FAIR principles, but also pursue
practical implementations and their evaluation. While
libraries are playing a key role in the formulation of
standardized metadata, I think the future will be in the
preparation and execution of resource sharing plans for
digital research products.
What attracted you to the field of data science?
I was fortunate enough to have had a computer since I
was a young child. While I used the computer mostly for
playing video games, I think this helped develop my
curiosity-driven approach to tackling seemingly large
scientific problems. It wasn’t until graduate school that I
started to program and really understand the importance
of i) having access to data and ii) having that data in a form
that is readily available for other uses. A large part of
my work has been on how to publish biomedical
knowledge such that it becomes easier to ask and
answer sophisticated questions that span the molecular
to the clinical. Key to this vision has been the adoption
of semantic web technologies to build a large
interconnected knowledge graph. More recent work
has focused on using data science methods over this
knowledge graph to find evidence of biological
phenomena or to characterize the response of patient
populations to complex therapeutic interventions.
In your view, what are some of the possibilities of the
semantic web?
The semantic web offers the tantalizing possibility that
people will be able to do more significant things with
substantially less effort. At the heart of the semantic
web effort is the continuous development of standards
and conventions that promote the interoperability of
data and services. The semantic web builds on the
principles of the Web itself – a decentralized
architecture that establishes the minimal behavior for
maximal reuse. Just as we can use a single application to
browse content on the World Wide Web using HTML-formatted
pages and the HTTP protocol, the semantic
web adds new conventions to represent, link, access,
query, and explore structured knowledge. The result is a
massive graph that interlinks datasets across different
providers and domains, removes the site-specific quirks
of accessing these data, and creates new opportunities
for reuse. This is a significant development because it
enables users to spend less time processing data and
more time focusing on the problem to be solved.
How do data science and the Internet of Things
intersect?
Vast amounts of data are now being generated through
IoT devices – the big question is how this data can be
used in the acceleration of scientific discovery, the
improvement of health care and wellbeing, and the
strengthening of communities – the three pillars for the
inter-faculty Institute of Data Science that we are
building at Maastricht University. The massive
interconnection of these devices poses subs (...truncated)