Graph embeddings in criminal investigation: towards combining precision, generalization and transparency
World Wide Web
https://doi.org/10.1007/s11280-021-01001-2
Special issue on computational aspects of network science
Valerio Bellandi¹ · Paolo Ceravolo¹ · Samira Maghool¹ · Stefano Siccardi¹
Received: 1 April 2021 / Revised: 17 September 2021 / Accepted: 22 December 2021
© The Author(s) 2022
Abstract
Criminal investigation adopts Artificial Intelligence to increase the volume of facts that
can be investigated and documented in trials. However, the abstract reasoning implied in
legal justification and argumentation requires solutions that provide high precision,
low generalization error, and retrospective transparency, three requirements that hardly
coexist in today’s Artificial Intelligence solutions. In a controlled experiment, we investigated the use of graph embedding procedures to retrieve potential criminal actions based
on patterns defined in enquiry protocols. We observed that a significant level of accuracy can
be achieved, but different graph reformation procedures imply different levels of precision,
generalization, and transparency.
Keywords Knowledge graphs · Enquiry protocols · Criminal investigation ·
Graph embeddings
This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science
Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir
¹ Department of Computer Science, Università degli studi di Milano, Via Celoria 18, Milan, Italy
1 Introduction
Criminal investigation and prosecution are complex procedures that have to deeply examine large documental sources to spotlight facts often unrevealed, denied, or deliberately
withheld by criminal agents. Criminal agents also find out and make use of those assets
that are less traced in the documental sources exploited by prosecutors and law enforcement agencies. By contrast, a successful investigation should result in exhaustive and
documented proceedings delineating the facts and the responsibilities that comprise criminal actions. This must be done in accordance with the law and following policies that can
guarantee fair, impartial, and efficient procedures. Artificial Intelligence (AI) [2] has been
proposed to support the great deal of work implied by criminal prosecution. AI can process large data sources and automatically identify relevant patterns to support the prosecutor
with recommendations. It has been observed that automating the inspection of data sources
can significantly affect the volume of documents a prosecutor can bring to trial [21], but
the benefits of AI go beyond the volume of facts that can be documented. AI can enhance
the ability to identify criminal actions by flexibly matching the patterns defined by prosecutors and extending their scope [3], thereby anticipating criminals’ ability to hide
their actions by layering the stages that lead to exchanges and revenues.
A basic requirement of AI is precision, which in information retrieval refers to the fraction
of relevant items within the set of retrieved items. This requirement affects how
reliable AI predictions are considered to be. However, high precision achieved at the cost of
low generalization is not desirable, because the validity of the AI support would then be
narrow and overfitted to a specific problem. This notion can be measured by the generalization error, which rates how accurately an algorithm can predict outcome values for
previously unseen data. Intuitively, the larger the domain of data that an algorithm can handle accurately, the better it generalizes. It has been observed that the generalization error
may be significant when applying AI [19] to legal documentation. Whether the intelligence
is achieved by expert systems driven by explicit rules, by supervised learning that discriminates from large amounts of examples, or by the hybridization of these techniques, obtaining
low generalization errors is challenging. Legal systems are, indeed, constructed on very
abstract definitions and interpretations of facts, which can hardly be reduced to a restricted set
of observations and which evolve over time following the concerns of society. The juridical
justifications that are provided to decide on a case are often constructed a posteriori, based
on the imputations to be supported and using a selective set of legislative provisions. This
problem is, for example, observed in [26], where an average accuracy ranging from 58% to
68% is reported in predicting decisions of the European Court of Human Rights for future
cases based on cases from the past. As the authors note, the Court formulates the justifications in a way that is conducive to fitting the conclusion. The same legal framework can be
applied differently based on the conditions encompassing a case.
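To make the two measures discussed above concrete, the following minimal Python sketch computes precision over a retrieved set and compares the error of a predictor on seen and unseen examples. The record identifiers, the labelled examples, the function names, and the memorising predictor are all hypothetical and chosen only for illustration; they are not taken from the experiments reported in this paper.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def error_rate(predict, examples):
    """Share of (input, label) examples that `predict` gets wrong."""
    wrong = sum(1 for x, label in examples if predict(x) != label)
    return wrong / len(examples)


# Hypothetical records flagged by a system versus the truly relevant ones.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r1", "r2", "r5"]
precision_value = precision(retrieved, relevant)

# Hypothetical labelled cases: the predictor has memorised the seen ones
# and answers False for anything it has never encountered.
seen = [(1, True), (2, False), (3, True)]
unseen = [(4, True), (5, False), (6, True), (7, False)]
memory = dict(seen)


def memorising_predictor(x):
    return memory.get(x, False)


training_error = error_rate(memorising_predictor, seen)
generalization_error = error_rate(memorising_predictor, unseen)

print(f"precision            = {precision_value:.2f}")
print(f"error on seen data   = {training_error:.2f}")
print(f"error on unseen data = {generalization_error:.2f}")
```

In this toy setting the predictor is perfect on the cases it has memorised but wrong on half of the unseen ones, which is the kind of gap between observed and unseen data that the generalization error is meant to expose.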
A prosecutor has to verify specific facts, enquiries must address information within the
scope of the probed events, and all related actions must be transparently documented. Therefore,
another key requirement for AI in criminal investigation is transparency. Following [13, 25],
we highlight that transparency in algorithmic decision-making systems is more than accountability. Accountable software is software that we can observe, verifying the tasks it executed and
the systems and users it interacted with [12]. A transparent AI has to support the retrospective analysis of the decision process followed by the algorithms, decomposing it into
the main elements that determined the final decision and allowing the decision
steps to be backtracked.
In [3] we presented a method to support criminal investigation by operationalizing
Enquiry Protocols. Prosecutors adopt protocols to identify the qualified sources, registers,
and documents that can be exploited to pursue a crime, the information to be verified and
integrated, and the formal stages to be followed during the prosecution. Using a set of
data integration techniques, data sources can be organized in queryable Knowledge Graphs.
Prosecutors can express a protocol as a set of subsequent operations over the data sources,
referring to the historical cases they addressed. Each operation can then be translated into
exact queries over the knowledge graph. A user interface can guide prosecutors along a
workflow that allows them to apply exact queries and examine their results. However, an intrinsic limit of exact specifications is that a small variation in the structure of a data source may
result in an unmatched occurrence. Using AI, the exact knowledge of prosecutors can be
generalized, supporting the identification of patterns similar to the ones identified from their
experience but differing in some respect. For example, a protocol could define a suspected
drug dealer as a person traveling with an unusual number of baggage pieces, missing the
fact he/she (...truncated)