Deception detection with machine learning: A systematic review and statistical analysis

PLOS ONE, Feb 2023

Several studies applying Machine Learning to deception detection have been published in the last decade. A rich and complex set of settings, approaches, theories, and results is now available, which makes it difficult to identify trends, successful paths, gaps, and opportunities for contribution. The present literature review aims to provide the state of research regarding deception detection with Machine Learning. We followed the PRISMA protocol and retrieved 648 articles from ACM Digital Library, IEEE Xplore, Scopus, and Web of Science. After removing 108 duplicates, 540 of them were screened. A final corpus of 81 documents has been summarized as mind maps. Metadata was extracted and encoded as Python dictionaries to support a statistical analysis scripted in the Python programming language and available as a collection of Jupyter Lab Notebooks in a GitHub repository. Neural Networks, Support Vector Machines, Random Forest, Decision Tree, and K-nearest Neighbor are the five most explored techniques. The studies report a detection performance ranging from 51% to 100%, with 19 works reaching an accuracy above 90%. Monomodal, Bimodal, and Multimodal approaches were exploited and achieved various accuracy levels for detection. Bimodal and Multimodal approaches have become a trend over Monomodal ones, although there are high-performance examples of the latter. Of the studies that exploit language and linguistic features, 75% are dedicated to English. The findings include observations on the following: language and culture, emotional features, psychological traits, cognitive load, facial cues, complexity, performance, and Machine Learning topics. We also present a dataset benchmark. The main conclusions are that labeled datasets from real-life data are scarce and that there is still room for new approaches to deception detection with Machine Learning, especially approaches focused on languages and cultures other than English-based ones. Further research would greatly contribute by providing new labeled and multimodal datasets for deception detection, both for English and other languages.
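The screening counts and the dictionary-based metadata encoding mentioned in the abstract can be illustrated with a minimal Python sketch. The four counts come from the abstract; everything else is an assumption made for illustration: the function names `screening_summary` and `tally`, the field names `techniques` and `modalities`, and the placeholder records are hypothetical and do not reproduce the authors' notebooks or any study in the reviewed corpus.

```python
from collections import Counter

# Counts reported in the abstract.
RETRIEVED = 648
DUPLICATES = 108
SCREENED = 540
INCLUDED = 81


def screening_summary(retrieved, duplicates, screened, included):
    """Check that the reported counts are consistent; return the inclusion rate."""
    if retrieved - duplicates != screened:
        raise ValueError("retrieved - duplicates does not match screened")
    return included / screened


def tally(records, field):
    """Count how often each value of a list-valued field appears across records."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, []))
    return counts


# Hypothetical placeholder records (assumed schema), NOT taken from the corpus.
example_records = [
    {"id": "study-A", "techniques": ["SVM", "Random Forest"], "modalities": ["text"]},
    {"id": "study-B", "techniques": ["Neural Network"], "modalities": ["audio", "video"]},
    {"id": "study-C", "techniques": ["SVM"], "modalities": ["text", "video"]},
]

if __name__ == "__main__":
    rate = screening_summary(RETRIEVED, DUPLICATES, SCREENED, INCLUDED)
    print(f"Included {INCLUDED} of {SCREENED} screened records ({rate:.1%})")
    print(tally(example_records, "techniques").most_common())
```

Under these assumptions, the first function confirms that 648 retrieved minus 108 duplicates gives the 540 screened records, and that the final corpus of 81 is 15% of those screened. The second function shows one plausible way that per-study dictionaries could be aggregated into frequency counts such as "most explored techniques".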

RESEARCH ARTICLE

Alex Sebastião Constâncio1☯*, Denise Fukumi Tsunoda1☯, Helena de Fátima Nunes Silva1‡, Jocelaine Martins da Silveira2‡, Deborah Ribeiro Carvalho3‡

1 PPGGI, Universidade Federal do Paraná, Curitiba, State of Paraná, Brazil
2 PPGPSI, Universidade Federal do Paraná, Curitiba, State of Paraná, Brazil
3 PPGTS, Pontifícia Universidade Católica do Paraná, Curitiba, State of Paraná, Brazil

☯ These authors contributed equally to this work.
‡ HFNS, JMS, and DRC also contributed equally to this work.

Citation: Constâncio AS, Tsunoda DF, Silva HdFN, Silveira JMd, Carvalho DR (2023) Deception detection with machine learning: A systematic review and statistical analysis. PLoS ONE 18(2): e0281323. https://doi.org/10.1371/journal.pone.0281323

Editor: Muhammad Fazal Ijaz, Sejong University, REPUBLIC OF KOREA

Received: June 15, 2022
Accepted: January 20, 2023
Published: February 9, 2023

Copyright: © 2023 Constâncio et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All Jupyter Lab Notebooks and BiblioAlly database files are available on the GitHub repository accessible at https://github.com/gambit4348/deceptiondetection-review-2022.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.
Introduction

We aim to find out which Machine Learning techniques perform best for automatic deception detection, what kind of data they process, where that data comes from, and what theoretical framework they have used. We also seek to understand their limitations and merits, and what remains to be explored. This paper is therefore not about Artificial Intelligence, Machine Learning, or deception detection as such. Instead, it is a literature review on deception detection with Machine Learning. Our intention is not to go deep into either deception detection or Machine Learning; our focus is on selecting and scrutinizing research papers on the application of Machine Learning to deception detection.

For this study, we define both “deceiving” and “lying” as the intentional act of making the interlocutor believe in something the deceiver considers false [1]; it is a conscious and deliberate act perpetrated by the deceiver [2]. However, false information that the emitter believes to be true is not considered deceptive.

Lying is a frequent and pervasive social phenomenon [3]. While some forms may be accepted as a “social lubricant” [4], others are socially harmful. Telling (and being told) lies is frequent, but perceiving them is a major challenge for most people. The average person has a lie detection rate of around 54% [5, 6], rarely reaching 60% and sometimes falling below 50% [7]. Nevertheless, some individuals show a remarkable ability for spotting deceptions, with a detection accuracy above 90%. Referred to as “Wizards of deception detection” [5], these individuals demonstrate that lies can be detected. Such “wizards”, however, are not numerous.
Machine Learning has been successfully applied to a large number of fields and functions, such as document classification, computer vision, natural language processing, protein structure prediction, fraud and malware detection [8], medical diagnosis and data privacy [9], network and data transmission security [10], intrusion detection [11], generative molecular design [12], and recommendation systems, among others [13]. It also offers a vast set of techniques, providing several opportunities to approach various problems. Seeing Machine Learning applied to deception detection is therefore not surprising.

We noticed that many studies on deception detection aided by Machine Learning have been published in the last decade. They report different approaches and results, so a rich and bulky corpus of knowledge is available. The results, however, suffer from large variance, with a diversity of settings, techniques, complexities, and strategies based on several theoretical frameworks. Identifying trends, gaps, and research opportunities may be challenging. Due to the diversity of studies and the difficulty of establishing a general state of technology on deception detection with Machine Learning, we felt stimulated to formulate the following research questions: a) W (...truncated)


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0281323&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281323

Alex Sebastião Constâncio, Denise Fukumi Tsunoda, Helena de Fátima Nunes Silva, Jocelaine Martins da Silveira, Deborah Ribeiro Carvalho. Deception detection with machine learning: A systematic review and statistical analysis, PLOS ONE, 2023, Volume 18, Issue 2, DOI: 10.1371/journal.pone.0281323