Scientific workflows for process mining: building blocks, scenarios, and implementation

Int J Softw Tools Technol Transfer
DOI 10.1007/s10009-015-0399-5

Alfredo Bolt (1) · Massimiliano de Leoni (1) · Wil M. P. van der Aalst (1)

(1) Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
Corresponding author: Wil M. P. van der Aalst

© The Author(s) 2015. This article is published with open access at Springerlink.com

Abstract Over the past decade process mining has emerged as a new analytical discipline able to answer a variety of questions based on event data. Event logs have a very particular structure: events have timestamps, refer to activities and resources, and need to be correlated to form process instances. Process mining results tend to be very different from classical data mining results, e.g., process discovery may yield end-to-end process models capturing different perspectives rather than decision trees or frequent patterns. A process-mining tool like ProM provides hundreds of different process mining techniques ranging from discovery and conformance checking to filtering and prediction. Typically, a combination of techniques is needed and, for every step, there are different techniques that may be very sensitive to parameter settings. Moreover, event logs may be huge and may need to be decomposed and distributed for analysis. These aspects make it very cumbersome to analyze event logs manually. Process mining should be repeatable and automated. Therefore, we propose a framework to support the analysis of process mining workflows. Existing scientific workflow systems and data mining tools are not tailored towards process mining and the artifacts used for analysis (process models and event logs). This paper structures the basic building blocks needed for process mining and describes various analysis scenarios. Based on these requirements we implemented RapidProM, a tool supporting scientific workflows for process mining. Examples illustrating the different scenarios are provided to show the feasibility of the approach.

Keywords Scientific workflows · Process mining · Large scale process analysis · RapidProM

1 Introduction

Scientific Workflow Management (SWFM) systems help users to design, compose, execute, archive, and share workflows that represent some type of analysis or experiment. Scientific workflows are often represented as directed graphs where the nodes represent “work” and the edges represent paths along which data and results can flow between nodes. Next to “classical” SWFM systems such as Taverna [23], Kepler [33], Galaxy [20], ClowdFlows [27], and jABC [40], one can also see the uptake of integrated environments for data mining, predictive analytics, business analytics, machine learning, text mining, reporting, etc. Notable examples are RapidMiner [22] and KNIME [4]. These can be viewed as SWFM systems tailored towards the needs of data scientists.

Traditional data-driven analysis techniques do not consider end-to-end processes. People make process models by hand [e.g., Petri nets, UML activity diagrams, or Business Process Modeling Notation (BPMN) models], but this modeled behavior is seldom aligned with real-life event data. Process mining aims to bridge this gap by connecting end-to-end process models to the raw events that have been recorded. Process-mining techniques enable the analysis of a wide variety of processes using event data. For example, event logs can be used to automatically learn a process model (e.g., a Petri net or BPMN model). Next to the automated discovery of the real underlying process, there are process-mining techniques to analyze bottlenecks, to uncover hidden inefficiencies, to check compliance, to explain deviations, to predict performance, and to guide users towards “better” processes. Hundreds of process-mining techniques are available and their value has been proven in many case studies.
See, for example, the twenty case studies on the webpage of the IEEE Task Force on Process Mining [24]. The open source process mining framework ProM [58] provides hundreds of plug-ins and has been downloaded over 100,000 times. The growing number of commercial process mining tools (Disco, Perceptive Process Mining, Celonis Process Mining, QPR ProcessAnalyzer, Software AG/ARIS PPM, Fujitsu Interstage Automated Process Discovery, etc.) further illustrates the uptake of process mining.

Process mining typically requires chaining many analysis steps together. Existing process mining tools do not support such analysis workflows. As a result, analysis may be tedious and it is easy to make errors. Repeatability and provenance are jeopardized when more involved process mining workflows are executed manually.

This paper is motivated by the observation that tool support for process mining workflows is missing. None of the process mining tools (ProM, Disco, Perceptive, Celonis, QPR, etc.) provides a facility to design and execute analysis workflows. None of the scientific workflow management systems, including analytics suites like RapidMiner and KNIME, supports process mining. Yet, process models and event logs are very different from the artifacts typically considered. Therefore, we propose the framework to support process mining workflows depicted in Fig. 1.

Fig. 1 Overview of the framework to support process mining workflows

This paper considers four analysis scenarios where process mining workflows are essential:

– Result (sub-)optimality: Often different process mining techniques can be applied and a priori it is not clear which one is most suitable. By modeling the analysis workflow, one can simply apply all candidate techniques to the data, evaluate the different analysis results, and pick the result with the highest quality (e.g., the process model best describing the observed behavior).
– Parameter sensitivity: Different parameter settings and alternative ways of filtering can have unexpected effects. Therefore, it is important to see how sensitive the results are (e.g., leaving out some data or changing a parameter setting slightly should not change the results dramatically). It is important not to simply show the analysis result without some indication of confidence.

– Large-scale experiments: Each year new process mining techniques become available and larger data sets need to be tackled. For example, novel discovery techniques need to be evaluated through massive testing and larger event

Fig. 1 labels (extracted): Analysis scenarios for process mining: Result (sub-)optimality; Large-scale experiments; Repeating questions. Event data transformation; Process model extraction; Process model and event data analysis. Add data to event data (AddED); Import process model (ImportM); Analyze process model (AnalyzeM); Filter event data (FilterED); Discover process model from event data (DiscM); Evaluate process model using event data (EvaluaM); Select process model from collection (SelectM); Compare process mode (...truncated)