Graph embeddings in criminal investigation: towards combining precision, generalization and transparency

World Wide Web
https://doi.org/10.1007/s11280-021-01001-2

Valerio Bellandi¹ · Paolo Ceravolo¹ · Samira Maghool¹ · Stefano Siccardi¹

¹ Department of Computer Science, Università degli studi di Milano, Via Celoria 18, Milan, Italy

Received: 1 April 2021 / Revised: 17 September 2021 / Accepted: 22 December 2021
© The Author(s) 2022

This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science. Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir

Abstract
Criminal investigation adopts Artificial Intelligence to increase the volume of facts that can be investigated and documented in trials. However, the abstract reasoning implied in legal justification and argumentation requires solutions that provide high precision, low generalization error, and retrospective transparency, three requirements that rarely coexist in today's Artificial Intelligence solutions. In a controlled experiment, we investigated the use of graph embedding procedures to retrieve potential criminal actions based on patterns defined in enquiry protocols. We observed that a significant level of accuracy can be achieved, but different graph reformation procedures imply different levels of precision, generalization, and transparency.

Keywords: Knowledge graphs · Enquiry protocols · Criminal investigation · Graph embeddings

1 Introduction

Criminal investigation and prosecution are complex procedures that must thoroughly examine large documentary sources to bring to light facts that are often unrevealed, denied, or deliberately withheld by criminal agents. Criminal agents also discover and exploit the assets that are least traced in the documentary sources used by prosecutors and law enforcement agencies.
By contrast, a successful investigation should result in exhaustive and documented proceedings delineating the facts and responsibilities that constitute criminal actions. This must be done in accordance with the law, following policies that guarantee fair, impartial, and efficient procedures. Artificial Intelligence (AI) [2] has been proposed to support the large amount of work that criminal prosecution entails. AI can process large data sources and automatically identify relevant patterns, supporting the prosecutor with recommendations. It has been observed that automating the inspection of data sources can significantly increase the number of documents a prosecutor can bring to trial [21], but the benefits of AI go beyond the volume of facts that can be documented. AI can improve the identification of criminal actions by flexibly matching the patterns defined by prosecutors and extending their scope [3], thereby countering the ability of criminals to hide their actions by layering the stages that lead to exchanges and revenues. A basic requirement of AI is precision, which in information retrieval is the fraction of retrieved items that are relevant. This requirement determines how reliable AI predictions will be considered. However, high precision achieved at the cost of poor generalization is undesirable, because the validity of the AI support would then be narrow and overfitted to specific problems. Generalization can be measured by the generalization error, which quantifies how far an algorithm's predictions for previously unseen data deviate from the true outcome values. Intuitively, the larger the domain of data an algorithm can handle accurately, the better it generalizes. It has been observed that the generalization error may be significant when AI is applied to legal documentation [19].
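The two quantities just defined can be made concrete with a small sketch. It assumes set-based retrieval and binary labels; the function names and toy data are hypothetical and are not taken from the paper. As is common in practice, generalization error is estimated here by the error rate on held-out data, and the difference between held-out and training error is reported as an indicator of overfitting.

```python
from typing import Iterable, Sequence, Set


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of retrieved items that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0  # convention chosen here: an empty retrieval scores 0
    return len(retrieved_set & relevant) / len(retrieved_set)


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that disagree with the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def generalization_gap(train_pred, train_y, test_pred, test_y) -> float:
    """Held-out error (an estimate of generalization error) minus training error."""
    return error_rate(test_pred, test_y) - error_rate(train_pred, train_y)


if __name__ == "__main__":
    # Hypothetical retrieval: 4 transactions flagged, 3 of them truly suspicious.
    flagged = ["t1", "t2", "t3", "t7"]
    truly_suspicious = {"t1", "t2", "t3", "t4"}
    print(precision(flagged, truly_suspicious))  # 0.75

    # Hypothetical classifier: perfect on training cases, wrong on half of the unseen ones.
    gap = generalization_gap([1, 0, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0])
    print(gap)  # 0.5
```

A high precision combined with a large gap is the situation the paragraph warns against: the support looks reliable on known cases but does not carry over to unseen ones.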
Whether the intelligence is achieved by expert systems driven by explicit rules, by supervised learning that discriminates from large amounts of examples, or by a hybridization of these techniques, obtaining a low generalization error is challenging. Legal systems are, indeed, built on very abstract definitions and interpretations of facts, which can hardly be reduced to a restricted set of observations and which evolve over time following the concerns of society. The juridical justifications provided to decide a case are often constructed a posteriori, based on the imputations to be supported and using a selective set of legislative provisions. This problem is observed, for example, in [26], where an average accuracy ranging from 58% to 68% is reported in predicting decisions of the European Court of Human Rights on future cases from past cases. As the authors note, the Court formulates its justifications in a way that fits the conclusion. The same legal framework can be applied differently depending on the conditions surrounding a case. A prosecutor has to verify specific facts, enquiries must address information within the scope of the probed events, and all related actions must be transparently documented. Therefore, another key requirement for AI in criminal investigation is transparency. Following [13, 25], we stress that transparency in algorithmic decision-making systems is more than accountability. Accountable software is software we can observe, verifying the tasks it executed and the systems and users it interacted with [12]. A transparent AI must support the retrospective analysis of the decision process followed by the algorithms, decomposing it into the main elements that determined the final decision and allowing the decision steps to be backtracked. In [3] we presented a method to support criminal investigation by operationalizing Enquiry Protocols.
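The transparency requirement can be illustrated with a minimal sketch: a protocol executed as an ordered sequence of exact queries over a knowledge graph stored as (subject, predicate, object) triples, where each step logs which entities it kept and which it discarded. The step names, the predicates (`boarded`, `checked_bags`, `carry_on_bags`), and the helpers `run_protocol` and `why_dropped` are hypothetical. This is not the implementation presented in [3]; it only shows how recording every decision step makes backtracking possible.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
KG = Set[Triple]


@dataclass(frozen=True)
class Step:
    """One exact query of a hypothetical protocol: filters the current candidates."""
    name: str
    select: Callable[[KG, Set[str]], Set[str]]


@dataclass(frozen=True)
class TraceEntry:
    """What a step decided, kept for retrospective analysis."""
    step: str
    kept: FrozenSet[str]
    dropped: FrozenSet[str]


def run_protocol(kg: KG, candidates: Set[str], steps: List[Step]):
    """Apply the steps in order, logging kept and dropped entities at each stage."""
    current = set(candidates)
    trace: List[TraceEntry] = []
    for step in steps:
        kept = step.select(kg, current) & current  # a step may only narrow the set
        trace.append(TraceEntry(step.name, frozenset(kept), frozenset(current - kept)))
        current = kept
    return current, trace


def why_dropped(trace: List[TraceEntry], entity: str) -> Optional[str]:
    """Backtrack: name of the step that excluded `entity`, or None if it survived."""
    for entry in trace:
        if entity in entry.dropped:
            return entry.step
    return None


if __name__ == "__main__":
    # Hypothetical toy graph; p3's bags are recorded under a different predicate.
    kg: KG = {
        ("p1", "boarded", "f1"), ("p1", "checked_bags", "6"),
        ("p2", "boarded", "f1"), ("p2", "checked_bags", "1"),
        ("p3", "boarded", "f2"), ("p3", "carry_on_bags", "6"),
        ("p4", "rented", "car9"),
    }
    travellers = Step("travellers",
                      lambda g, c: {s for (s, p, _) in g if p == "boarded" and s in c})
    many_bags = Step("many_bags",
                     lambda g, c: {s for (s, p, o) in g
                                   if p == "checked_bags" and int(o) >= 5 and s in c})

    result, trace = run_protocol(kg, {"p1", "p2", "p3", "p4"}, [travellers, many_bags])
    print(result)                    # {'p1'}
    print(why_dropped(trace, "p3"))  # many_bags
    print(why_dropped(trace, "p4"))  # travellers
```

Because every step records what it discarded, the trace can explain why `p3` was excluded: its bags are stored under `carry_on_bags`, which the exact query for `checked_bags` does not match. This per-step decomposition is the kind of record that retrospective analysis relies on.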
Prosecutors adopt protocols to identify the qualified sources, registers, and documents that can be exploited to pursue a crime, the information to be verified and integrated, and the formal stages to be followed during the prosecution. Using a set of data integration techniques, data sources can be organized into queryable Knowledge Graphs. Prosecutors can express a protocol as a sequence of operations over the data sources, drawing on the historical cases they have addressed. Each operation can then be translated into exact queries over the knowledge graph. A user interface can guide prosecutors through a workflow for applying exact queries and examining their results. However, an intrinsic limit of exact specifications is that a small variation in the structure of a data source may leave an occurrence unmatched. Using AI, the exact knowledge of prosecutors can be generalized, supporting the identification of patterns that are similar to the ones identified from their experience but differ in some respect. For example, a protocol could define a suspected drug dealer as a person traveling with an unusual number of pieces of baggage, missing the fact he/she (...truncated)