Anomaly detection in business processes using process mining and fuzzy association rule learning

Sarno, Riyanarto; Sinaga, Fernandes; Sungkono, Kelly Rossa

doi:10.1186/s40537-019-0277-1

Research
Open access
Published: 09 January 2020

Journal of Big Data volume 7, Article number: 5 (2020)

Abstract

Many corporate organizations nowadays implement enterprise resource planning (ERP) to manage their business processes. Because the processes run continuously, ERP produces a massive log of processes. Such an enormous log is difficult to monitor manually, especially when it comes to detecting anomalies, so a method is needed that can detect anomalies in large logs. This paper proposes the integration of process mining, fuzzy multi-attribute decision making and fuzzy association rule learning to detect anomalies. Process mining analyses the conformance between recorded event logs and standard operating procedures. Fuzzy multi-attribute decision making is applied to determine the anomaly rates. Finally, fuzzy association rule learning develops association rules that are employed to detect anomalies. The results of our experiment showed that the accuracy of the association rule learning method was 0.975 with a minimum confidence level of 0.9 and that the accuracy of the fuzzy association rule learning method was 0.925 with a minimum confidence level of 0.3. Therefore, the fuzzy association rule learning method can detect fraud at low confidence levels.

Introduction

Many corporations worldwide use an enterprise resource planning (ERP) system to manage their business processes, which continuously change due to dynamic business requirements [1]. Because the processes run continuously, ERP produces a considerable log of processes. Such a sizeable log is difficult to monitor manually, especially for detecting anomalies. A method is therefore needed that can detect anomalies in a huge log.

Standard business processes are usually incorporated into standard operating procedures (SOP), which are used as a reference to find any deviations. Deviations or anomalies in the business process can be caused by variations or operation errors [2]; however, some of the anomalies may be the result of fraudulent behaviours [3]. Fraud can be committed in many ways and can lead to significant losses. In 2012, the Association of Certified Fraud Examiners (ACFE) reported 1,388 fraud cases in 96 countries, which incurred US$1.4 billion in losses [4]. On average, organizations have lost a gross profit of 7% per year to fraud [5]. This forces companies to institute strong security policies and an information system for fraud detection.

Analysis in the domains of process mining and data mining provides solutions for anomaly detection, which can be used for fraud detection. In previous research, we investigated process mining for minimizing internal fraud in business processes [6]. In that research, several process mining methods, such as conformance checker, dotted-chart analysis, social network miner, originator by task matrix and others, were applied to investigate event logs of business processes [7,8,9].

Data mining analyses input data to construct a model or a pattern as output, which can be used to detect anomalies in the process under examination [10]. Several data mining methods, such as decision tree, neural network, Bayesian network and support vector machine, have been implemented in previous research [10,11,12,13] to identify cases of fraud. However, these methods still have weaknesses in detecting fraud since they are not able to analyse the behaviour of control flow in the business process. Other research supporting fraud detection used association rule learning (ARL) to extract association rules from transaction data, where ARL was applied to develop association rules related to fraudulent behaviours [14, 15].

In our previous research [6], only fraud with a high confidence level could be detected. In this paper, a method is proposed that can detect fraud with a low confidence level and a low intensity based on a certain threshold. The proposed method integrates process mining, fuzzy multi attribute decision making and fuzzy association rule learning to detect anomalies in a business process.

Related work

Types of fraud

Fraud is a misuse of an organizational system [16]. The concept of the fraud triangle explains that fraud occurs because of three factors, i.e. pressure or coercion, opportunity and rationalization. When attempting to detect fraud in a business process, internal control can be used as a countermeasure against fraud that may occur [17]. The SOP for a business process should include a standard business process model, time record, resource, organization role and decision-making. The complete SOP can be used as a reference to detect anomalies in a running process and existing data that may contain fraud. Analyzing the anomalies in a business process can be done using process mining techniques [18,19,20,21,22].

Process-based fraud (PBF) refers to fraud occurring in business processes [6]. In previous research concerning PBF, we identified attributes and patterns in order to describe PBF [15]. The following six types of anomaly attributes or fraudulent behaviours in business processes can be distinguished.

Skipped activity

As its name implies, this anomaly is an activity that the SOP requires but that was skipped. Skipped activity can be divided into two types, i.e. skipped sequence, for a skipped activity occurring in a sequence, and skipped decision, for a skipped activity occurring in a split decision activity.
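As a rough illustration of the skipped sequence idea only (not the Petri-net-based conformance checking used later in this paper), a trace can be compared with a purely sequential SOP fragment. The activity names are taken from the case study; the function name `skipped_activities` and the strictly linear ordering of the SOP are assumptions made for this sketch.

```python
def skipped_activities(sop_sequence, trace):
    """Return the SOP activities that never occur in the trace, in SOP order."""
    executed = set(trace)
    return [activity for activity in sop_sequence if activity not in executed]


# Hypothetical linear SOP fragment and one recorded case.
sop = ["receive application", "check completeness", "check SID", "check loan type"]
trace = ["receive application", "check SID", "check loan type"]

print(skipped_activities(sop, trace))
```

A skipped decision would additionally require knowledge of the branches of the split, which this sketch deliberately leaves out.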

Wrong throughput time

Wrong throughput time is a condition when an activity is performed faster or slower than the time limit as stated in the SOP. It is divided into two types: wrong throughput time min and wrong throughput time max.

Wrong resource

Wrong resource is a situation when an activity is not executed by an authorized employee in accordance with his or her role allowed by the SOP.

Wrong duty

Wrong duty is a condition when an employee performs two or more different activities in one running process. It is divided into three types: wrong duty sequence (occurring in sequence activity), wrong duty decision (occurring in decision activity) and wrong duty combine (occurring in sequence and decision activity).

Wrong pattern

Wrong pattern is a situation when a wrong activity sequence occurs that does not conform with the sequence of activities as stated in the standard business process.

Wrong decision

Wrong decision is a condition when a decision is made that does not conform to the decision-making process stated in the SOP.

To detect the anomaly attributes, four process mining analyses can be executed: control flow analysis, role resource analysis, throughput time analysis, and decision point analysis. Control flow analysis can be done manually or with the assistance of plug-ins in ProM. This analysis is crucial for the detection of fraud in the form of skipped activity and wrong pattern. Manually, the analysis is done by searching the event log using a process searching algorithm. Fuzzy miner, which compares a fuzzy model to the standard business process, is the algorithm recommended for this searching process. However, this algorithm has a limitation in that it relies heavily on the determination of the threshold value [23,24,25]. In addition to the manual method, this analysis can be done using the conformance checking plug-in in ProM, resulting in values for fitness, precision, and structure. These values can be used to measure the similarity between a running process and a standard business process. The purpose of the control flow analysis is to measure the similarities and differences between the event logs resulting from a running process and a standard business process model. Parts of the running process that differ from the model can be suspected as anomalous. The result takes the form of fitness values that reveal fraudulent behaviour.

Role resource analysis can be performed using the social network miner plug-in in ProM. Then, the role attribute of each event in a running process can be compared to the roles present in the SOP to obtain the probability of fraud occurrence in terms of its resources.

Throughput time analysis can be done by measuring the time interval between activities. This interval is measured from the start time stamp to the completed time stamp. The execution time of an activity can then be compared with the time allowed by the SOP.
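The comparison can be sketched as follows. This is a minimal example under stated assumptions: the one-week upper limit mirrors the document validation example in the case study, while the one-hour lower limit, the timestamps and the function name `check_throughput` are hypothetical.

```python
from datetime import datetime, timedelta


def check_throughput(start, complete, min_duration, max_duration):
    """Classify the duration between start and complete against the SOP limits."""
    duration = complete - start
    if duration < min_duration:
        return "wrong throughput time min"
    if duration > max_duration:
        return "wrong throughput time max"
    return "ok"


# Hypothetical event: the activity took ten days, but the SOP allows one week.
start = datetime(2019, 3, 1, 9, 0)
complete = datetime(2019, 3, 11, 9, 0)
label = check_throughput(start, complete, timedelta(hours=1), timedelta(weeks=1))
print(label)
```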

Decision point analysis is done by checking whether a specific case exists as a result of decision-making in a business process. Anomalies can be detected by building a relational database and querying it for that specific case.
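A query of this kind might look like the sketch below, which uses an in-memory SQLite database. The table layout, column names and sample rows are hypothetical; the routing rule (amounts above 500 million Rupiah should go through leader authorization rather than director authorization) follows the check overrate example as worded in the case study of this paper.

```python
import sqlite3

# Hypothetical event table: one row per authorization event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (case_id TEXT, activity TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("c1", "leader authorization", 600_000_000),
        ("c2", "director authorization", 700_000_000),
        ("c3", "director authorization", 100_000_000),
    ],
)

THRESHOLD = 500_000_000  # Rupiah, taken from the case study example

# Cases above the threshold that were routed through director authorization.
suspicious = [
    row[0]
    for row in conn.execute(
        "SELECT case_id FROM events "
        "WHERE amount > ? AND activity = 'director authorization' "
        "ORDER BY case_id",
        (THRESHOLD,),
    )
]
print(suspicious)
```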

Case study and fraudulent issues

In this paper, we would like to provide an example of the occurrence of fraud in the credit application process in a bank as depicted in Fig. 1. The first executed activity is receive application, i.e. the activity of receiving the credit application. The data received are in the form of a rules and regulations document required for credit applications.

Check completeness is the activity of checking the completeness of the rules and regulations document provided by the creditor. If the rules and regulations document is not complete, the give info activity will be executed to give information to the creditor in order for him or her to complete the document. This activity can be executed repeatedly until the document is fully completed. Once it is completed the process moves on to the check SID activity.

The check SID activity is done by the system to check the credit application history of the creditor. If a creditor has ever submitted a credit application, the process moves on to loan decision and check loan type. Check loan type, collateral verification locate, collateral local Government, collateral office and complete verification are done to check the collateral owned by the creditor in accordance with the credit type proposed. Plafond estimation is used to estimate the amount of disbursed credit; this depends on the collateral check.

The check overrate activity is used to determine the further activities that need to be executed, in this case whether the decision-making is executed by the director or by the leader. This depends on the amount of credit applied for. Further, loan decision is an activity that is supposed to be done by the head of division of the bank but in fact was done by a staff member. Another example is document validation, which is supposed to be done within 1 week but in fact was done after more than 1 week. Another example is in an activity that issues a branching output to more than one activity, such as in check overrate, where a credit application with an amount of over 500 million Rupiah should not be executed through director authorization but through leader authorization.

Proposed method

The training section of the proposed method is implemented in three steps: conformance checking, fuzzy multi attribute decision making and fuzzy association rule learning. Conformance checking, which is part of process mining, is applied to detect anomalies in the business process. Fuzzy multi attribute decision making is used to determine the anomaly rates. Finally, fuzzy association rule learning develops rules which will be used to detect anomalies in the testing phase.

Skipped activity analysis

Activities that were wrongly executed may emerge in the event log, e.g. a skipped activity. This will lead to the presence of an anomaly in that activity. For example, the completeness verification is supposed to be done by the head of division of the bank. This analysis finds out whether there is a skipped activity or any other activity not in line with the standard business process model. It is done with the conformance checker plug-in in ProM, which was modified to give the number of skipped activities. The inputs of this analysis are the standard business process in the form of Petri nets and the event logs. This analysis generates anomaly data for the skipped sequence and skipped decision attributes.

Wrong pattern analysis

In this part, pattern analysis of the event logs is done by comparing the sequence of activities to the standard business process model. If there is a case with an activity that is not done in line with the model, it will be marked as wrong pattern. This analysis is done with the conformance checker plug-in in ProM.
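The comparison of activity sequences can be illustrated with a simplified check of consecutive activity pairs. This is only a sketch and not the ProM conformance checker: the set of allowed pairs, including the assumed loop from give info back to check completeness, and the function name `wrong_pattern_steps` are hypothetical.

```python
def wrong_pattern_steps(allowed_follows, trace):
    """Return consecutive activity pairs in the trace that the model does not allow."""
    return [
        (current, following)
        for current, following in zip(trace, trace[1:])
        if (current, following) not in allowed_follows
    ]


# Hypothetical "directly follows" pairs derived from the standard model.
allowed = {
    ("receive application", "check completeness"),
    ("check completeness", "give info"),
    ("give info", "check completeness"),
    ("check completeness", "check SID"),
}
trace = ["receive application", "check SID", "check completeness"]

violations = wrong_pattern_steps(allowed, trace)
print(violations)
```

A case with a non-empty result would be marked as wrong pattern.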

Wrong throughput time analysis

In this part, an analysis of the execution times of all activities in the event logs is made by comparing them with the execution time in the standard model. If the execution time is not in line with the standard model, being either too short or too long, it will be marked as an anomaly. This analysis is done with the conformance checker for attributes plug-in in ProM.

Wrong resource analysis

The analysis in this part is done for each actor who executes an activity recorded in the event logs, using the conformance checker for attributes plug-in in ProM. If an activity is executed by an actor who is not authorized for that activity according to the standard model, it will be marked as wrong resource.
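The role comparison can be sketched in a few lines. The role names, the mapping of activities to authorized roles and the function name `wrong_resource_events` are assumptions made for illustration; the sketch does not reproduce the ProM plug-in.

```python
def wrong_resource_events(allowed_roles, events):
    """Return the events whose actor role is not authorized for the activity."""
    return [
        event
        for event in events
        if event["role"] not in allowed_roles.get(event["activity"], set())
    ]


# Hypothetical authorization table derived from the SOP.
allowed_roles = {
    "receive application": {"staff"},
    "complete verification": {"head of division"},
}
events = [
    {"case": "c1", "activity": "receive application", "role": "staff"},
    {"case": "c1", "activity": "complete verification", "role": "staff"},
]

flagged = wrong_resource_events(allowed_roles, events)
print([event["activity"] for event in flagged])
```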

Wrong duty analysis

The analysis in this part is done to see whether there is an actor who violates the segregation of duty as defined in the standard model. Anomalies have the form of two or more activities conducted by one actor at once. This analysis is done using the conformance checker for attributes plug-in in ProM.
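Following the definition of wrong duty given earlier (one employee performing two or more different activities in one running process), a simplified check for a single case could look like this. The actor names and the function name `wrong_duty_actors` are hypothetical, and a real segregation-of-duty rule would normally list which activity combinations conflict.

```python
from collections import defaultdict


def wrong_duty_actors(events):
    """Map each actor with two or more distinct activities to those activities."""
    activities_by_actor = defaultdict(set)
    for event in events:
        activities_by_actor[event["actor"]].add(event["activity"])
    return {
        actor: sorted(activities)
        for actor, activities in activities_by_actor.items()
        if len(activities) >= 2
    }


# Hypothetical events of one case.
case_events = [
    {"actor": "Ana", "activity": "check completeness"},
    {"actor": "Budi", "activity": "plafond estimation"},
    {"actor": "Ana", "activity": "loan decision"},
]

print(wrong_duty_actors(case_events))
```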

Wrong decision analysis

In this part, activities involving decision-making or event branching are analysed. To conduct a wrong decision analysis, the event logs should first be changed into ontology-based event logs to facilitate doing a SPARQL ontology query. This analysis generates anomaly data for the wrong decision attribute.

All obtained anomalies are trained using fuzzy association rule learning. This procedure consists of two processes. It starts with the calculation of the anomaly rate for each case, which is done using fuzzy multi attribute decision making. The inputs for this process are the anomaly occurrences and expert assessments. Two kinds of values are calculated in this process. The first is the importance weight of the anomaly attributes, which shows the importance of each anomaly attribute according to the assessment of experts. The second is the anomaly attribute occurrence rate, which shows how often each anomaly attribute occurs. Then, with these two values, all cases with an anomaly rate are trained using fuzzy association rule learning. This process generates the association rules among the anomalous attributes.
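To make the notion of a rule with a confidence level concrete, the sketch below computes fuzzy support and fuzzy confidence with the minimum operator, a common formulation in fuzzy association rule mining. It is not claimed to be the exact formulation used in this paper; the membership degrees, attribute names in the sample data and function names are hypothetical.

```python
def fuzzy_support(cases, items):
    """Average over all cases of the minimum membership degree among the items."""
    return sum(min(case[item] for item in items) for case in cases) / len(cases)


def fuzzy_confidence(cases, antecedent, consequent):
    """Fuzzy support of antecedent and consequent together, relative to the antecedent."""
    antecedent_support = fuzzy_support(cases, antecedent)
    if antecedent_support == 0:
        return 0.0
    return fuzzy_support(cases, antecedent + consequent) / antecedent_support


# Hypothetical membership degrees of anomaly attributes per case.
cases = [
    {"skipped sequence": 0.8, "wrong resource": 0.6, "fraud": 1.0},
    {"skipped sequence": 0.4, "wrong resource": 0.3, "fraud": 0.0},
    {"skipped sequence": 0.5, "wrong resource": 0.5, "fraud": 1.0},
    {"skipped sequence": 0.0, "wrong resource": 0.4, "fraud": 0.0},
]
rule_antecedent = ["skipped sequence", "wrong resource"]

confidence = fuzzy_confidence(cases, rule_antecedent, ["fraud"])
print(round(confidence, 3))
```

A rule would be kept only if its confidence reaches the chosen minimum confidence level.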

Fuzzy multi attribute decision making

This method is used to determine the anomaly rates from a set of anomalies that occurred in a process. The determination of the anomaly rates is done using a combination of two concepts, i.e. fuzzification and multiple attribute decision making (MADM). MADM can be used to select one alternative out of a set of alternatives marked with several attributes [24]. However, MADM still has weaknesses in handling inaccurate or linguistic information. Hence, it is necessary to add fuzziness to support the handling of linguistic information.

Two kinds of data are required for determining the anomaly rates, i.e. an importance assessment of the PBF attributes by experts and the anomaly occurrences resulting from the conformance checking. Both are converted into fuzzy numbers based on the table of importance weights and the level of membership. The importance weights of the anomaly attributes assessed by experts can be seen in Table 1. The membership function of the anomaly attributes can be seen in Fig. 2.
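Since Table 1 and Fig. 2 are not reproduced here, the following is only an illustrative sketch of how importance weights and occurrence rates could be combined and then fuzzified. The weighted average, the triangular membership function, all numeric values and both function names are assumptions and should not be read as the paper's actual formulas.

```python
def anomaly_rate(weights, occurrence_rates):
    """Weighted average of occurrence rates, using importance weights in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * occurrence_rates[name] for name in weights) / total_weight


def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (requires left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical expert weights and per-case occurrence rates.
weights = {"skipped sequence": 0.9, "wrong resource": 0.6, "wrong throughput time max": 0.3}
rates = {"skipped sequence": 0.5, "wrong resource": 0.0, "wrong throughput time max": 1.0}

rate = anomaly_rate(weights, rates)
# Degree to which the rate belongs to a hypothetical "middle" fuzzy set.
middle_degree = triangular(rate, 0.25, 0.5, 0.75)
print(round(rate, 3), round(middle_degree, 3))
```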

Table 1 Importance level of anomaly attributes
