Big knowledge-based semantic correlation for detecting slow and low-level advanced persistent threats

Targeted cyber attacks, which today are known as Advanced Persistent Threats (APTs), use low and slow patterns to bypass intrusion detection and alert correlation systems. Since most of the attack detection approaches use a short time-window, the slow APTs abuse this weakness to escape from the detection systems. In these situations, the intruders increase the time of attacks and move as slowly as possible by some tricks such as using sleeper and wake up functions and make detection difficult for such detection systems. In addition, low APTs use trusted subjects or agents to conceal any footprint and abnormalities in the victim system by some tricks such as code injection and stealing digital certificates. In this paper, a new solution is proposed for detecting both low and slow APTs. The proposed approach uses low-level interception, knowledge-based system, system ontology, and semantic correlation to detect low-level attacks. Since using semantic-based correlation is not applicable for detecting slow attacks due to its significant processing overhead, we propose a scalable knowledge-based system that uses three different concepts and approaches to reduce the time complexity including (1) flexible sliding window called Vermiform window to analyze and correlate system events instead of using fixed-size time-window, (2) effective inference using a scalable inference engine called SANSA, and (3) data reduction by ontology-based data abstraction. We can detect the slow APTs whose attack duration is about several months. Evaluation of the proposed approach on a dataset containing many APT scenarios shows 84.21% of sensitivity and 82.16% of specificity.

tricks such as code injection to perform malicious activities through a trusted process and to conceal any footprint and abnormalities. To detect such malicious activities, it is necessary to use fine-grained event interception in the endpoint systems. Since fine-grained interception leads to huge numbers of events, detection approaches use short time-windows to correlate the events. Slow APTs use some tricks to increase the duration of attacks and escape from the short time-windows of intrusion detection and alert correlation systems. Hence, security information and event management approaches, which collect and correlate the logs and alerts generated by different tools (e.g., antivirus, firewall, UTM, network intrusion detection system, network device, and operating system), are vulnerable to APT attacks for the following reasons:
• Due to processing limitations, the generated and correlated logs are mostly coarse-grained. Coarse-grained logs lead to data loss and prevent both the precise correlation of low-level operating system events with network events and the rebuilding of attack vectors.
• Due to processing limitations, the available solutions use short time-windows for correlating the alerts, and hence they fail to detect slow attacks and long-term attack vectors.
As a result, the purpose of this paper is to solve these problems in practice by proposing a big knowledge-based semantic correlation engine for detecting slow and low-level APTs, which are the most sophisticated APTs. To this aim, we enhance our previous solution proposed in [16] to detect slow APTs in addition to low-level and hybrid APTs. The contributions of our proposed approach are as follows:
• Since detecting low-level APT attacks requires processing large event logs, and detecting slow APTs makes the processing problem much harder, one of our contributions is an approach that detects both low-level and slow APT attacks.
• Using a long sliding window for detecting slow APTs. We propose a Vermiform sliding window to analyze and correlate system events instead of using a fixed-size time-window.
• Using the Scalable Semantic Analytics Stack (SANSA) [17] as a big inference engine based on Spark for scalable semantic correlation.
• Although SANSA is a good inference engine for processing huge numbers of events, its processing power is limited. We use the event abstraction concept to reduce the number of events, to speed up inference, and to detect very slow APTs (whose attack duration is several years instead of several months). By abstracting the old events, we keep them as a history in the detection process instead of disposing of them as the time window moves.
The rest of the paper is organized as follows. "Preliminaries" section describes the necessary preliminaries which are used in the paper. The characteristics of APTs, related works, and the formal definition of the problem are described in "Background and problem statement" section. The proposed approach is discussed in "Proposed

Preliminaries
In this section, we define the basic terms and concepts that are used in other sections of this paper. Most of the basic terms in this section are taken from our previous work [16], which proposes an approach to detect hybrid and low-level APTs. A summary of the symbols defined in this paper is presented in "List of the symbols used in the paper" section.
Since the proposed approach employs description logic [18] and the Web Ontology Language based on description logic (OWL-DL), we define the syntax and semantics of the relevant part of description logic in "Syntax and semantics of description logic" section for readers who are not familiar with these concepts.

Definition 1
An event occurs when a subject acts on an object in a specific time or period [16].
More formally, event $e_i \in Event^{I}$ is defined by a quadruple as follows:

$$\forall e_i \in Event^{I},\ e_i = \langle s_i, o_i, a_i, t_i \rangle,\quad s_i \in Subject^{I},\ o_i \in Object^{I},\ a_i \in Action^{I},\ t_i \in \mathbb{N} \tag{1}$$

where $s_i$ is a subject such as a user or a process, $o_i$ is an object such as a socket or file, $a_i$ is an action such as reading (R) or writing (W), and $t_i$ is the timestamp of the event occurrence.
Since in this paper an event is considered as a concept in the system ontology, and the languages provided for ontology specification allow only unary and binary predicates, we specify the properties of an event $e_i$ with four binary relations as follows [16]:

$$e_i = \langle s_i, o_i, a_i, t_i \rangle \iff \langle e_i, s_i \rangle,\ \langle e_i, o_i \rangle,\ \langle e_i, a_i \rangle,\ \langle e_i, t_i \rangle \tag{2}$$

In the rest of the paper, for the sake of simplicity, we define the event and its properties as a quadruple.
According to the ontology specified in the section "Semantic correlation", concept Subject includes four subject types: Thread, Process, User, and Host. Also, function $time : Event^{I} \rightarrow \mathbb{N}$ specifies the timestamp of an event and is defined as follows:

$$\forall e_i \in Event^{I},\ e_i = \langle s_i, o_i, a_i, t_i \rangle \rightarrow time(e_i) = t_i \tag{3}$$

Similarly, functions subject, object, and action specify the subject, the object, and the action of an event, respectively.
Definition 2 Frame $f : \mathcal{P}(Event^{I}) \times \mathbb{N} \rightarrow \mathbb{N}$ specifies the number of events in a specific event set which have a specific timestamp.
In other words, $f(E_i, t_i) = n_i$ means that the number of events $e_i \in E_i$ with $time(e_i) = t_i$ is equal to $n_i$.
For example, if $E_i = \{\langle s_1, o_1, a_1, t_1 \rangle, \langle s_2, o_2, a_2, t_2 \rangle, \langle s_3, o_3, a_3, t_1 \rangle\}$, then $f(E_i, t_1) = 2$ and $f(E_i, t_2) = 1$.
Definition 3 Two events $e_i$ and $e_j$ are related to each other, denoted by $e_i \sim e_j$, if there are specific relations between their properties and $t_i \le t_j$ [16]. This relation can be modeled by a directed acyclic graph, which is shown in Fig. 1.
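As an illustration of Definition 2, the frame function can be sketched in a few lines of Python. This is only a minimal sketch: the `Event` tuple and the function name `frame` are illustrative names chosen here, and integer timestamps stand in for $t_1$ and $t_2$.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Illustrative encoding of the quadruple <s, o, a, t> of Definition 1."""
    subject: str
    obj: str
    action: str
    time: int


def frame(events: Iterable[Event], t: int) -> int:
    """Return f(E, t): the number of events in E whose timestamp equals t."""
    return sum(1 for e in events if e.time == t)


# The example of Definition 2, with t1 = 1 and t2 = 2 (assumed values).
E = {
    Event("s1", "o1", "a1", 1),
    Event("s2", "o2", "a2", 2),
    Event("s3", "o3", "a3", 1),
}
print(frame(E, 1))  # 2
print(frame(E, 2))  # 1
```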
To detect malicious activities, it is necessary to define the system security policy. The security policy is defined as follows.
Definition 4 (Security policy [16]) Security policy (SP) is defined as $SP \subseteq Subject^{I} \times Object^{I} \times Action^{I}$, which determines the set of all unauthorized events in the system. Any policy rule $p_i = \langle s_i, o_i, a_i \rangle$ in SP shows that subject $s_i$ is not authorized to do action $a_i$ on object $o_i$ at any time.

Definition 5 (Explicit violation [16]) The occurrence of an event set ES ($ES \subseteq Event^{I}$) in a system causes the explicit violation of security with regard to security policy SP, if and only if,

$$\exists e_i \in ES,\ \exists p_j \in SP,\ subject(e_i) = subject(p_j) \land object(e_i) = object(p_j) \land action(e_i) = action(p_j).$$

Definition 6 (Implicit violation [16]) The occurrence of an event set ES ($ES \subseteq Event^{I}$) in a system causes the implicit violation of security with regard to security policy SP, if and only if,

$$\exists e_i \in I(ES),\ \exists p_j \in SP,\ subject(e_i) = subject(p_j) \land object(e_i) = object(p_j) \land action(e_i) = action(p_j),$$

where function $I : \mathcal{P}(Event^{I}) \rightarrow \mathcal{P}(Event^{I})$ specifies the set of all events that occur implicitly following the execution of another event set. An example is shown in Figure 2. As shown in this figure, the occurrence of a sequence of events (ES) causes object $o_1$ to be read by subject $s_2$ indirectly. In other words, event set ES causes event $e_k = \langle s_2, o_1, r, t_3 \rangle$ to occur implicitly. The process of indirect access detection is discussed in the proposed approach in "Expanding: knowledge-based inference" section.

Definition 7 (Attack vector) An attack vector $\nu_i$ is a set of events ($\nu_i \subseteq Event^{I}$) that has the following three characteristics:
• Malicious: An attack vector $\nu_i$ is malicious if it violates the security policies implicitly or explicitly.
• Minimal: An attack vector $\nu_i$ is minimal if the exclusion of any event $e_i$ from $\nu_i$ reduces the maliciousness of $\nu_i$ [16]. Suppose that function $\zeta : \mathcal{P}(Event^{I}) \rightarrow \mathbb{N}$ gives the maliciousness value of an event set; then $E_i \subseteq Event^{I}$ is more malicious than $E_j \subseteq Event^{I}$ if and only if $\zeta(E_i) > \zeta(E_j)$ [16]. In other words, minimality means:

$$\forall e_i \in \nu_i,\ \zeta(\nu_i \setminus \{e_i\}) < \zeta(\nu_i).$$

• Connected: An attack vector $\nu_i$ is connected if the relations between all the events of $\nu_i$ construct a connected directed acyclic graph.
Set $\nu$ is defined as the set of all attack vectors.
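The explicit-violation condition and the minimality property above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Event` tuple, the function names, and in particular the toy maliciousness function `count_reads` are hypothetical and only serve to exercise the definitions; the source does not specify how $\zeta$ is computed.

```python
from typing import Callable, FrozenSet, Iterable, NamedTuple, Set, Tuple


class Event(NamedTuple):
    subject: str
    obj: str
    action: str
    time: int


Policy = Tuple[str, str, str]  # <subject, object, action> that is NOT authorized


def explicit_violation(es: Iterable[Event], sp: Set[Policy]) -> bool:
    """True if some event in ES matches a policy rule on subject, object and action."""
    return any((e.subject, e.obj, e.action) in sp for e in es)


def is_minimal(vector: FrozenSet[Event],
               zeta: Callable[[FrozenSet[Event]], int]) -> bool:
    """True if removing any single event strictly lowers the maliciousness zeta."""
    return all(zeta(vector - {e}) < zeta(vector) for e in vector)


def count_reads(es: FrozenSet[Event]) -> int:
    """Hypothetical stand-in for zeta: counts read events only."""
    return sum(1 for e in es if e.action == "R")


e1 = Event("s1", "o1", "R", 1)
e2 = Event("s1", "o2", "W", 2)
e3 = Event("s2", "o2", "R", 3)
sp = {("s2", "o1", "R")}  # s2 must never read o1

print(explicit_violation([e1, e2, e3], sp))             # False: no event matches directly
print(explicit_violation([Event("s2", "o1", "R", 3)], sp))  # True
print(is_minimal(frozenset({e1, e3}), count_reads))     # True
print(is_minimal(frozenset({e1, e2, e3}), count_reads)) # False: dropping e2 changes nothing
```

The same `explicit_violation` check applied to the implied event set $I(ES)$ instead of $ES$ gives the implicit-violation test of Definition 6.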

Background and problem statement
As discussed in the introduction, there are various definitions to describe the APTs. In this section, the characteristics of the APTs and the problem to be solved in this paper are defined.

APT characteristics
According to our survey on the behavior and anatomy of nearly 70 real APTs, which are reported by Kaspersky Targeted Cyberattacks Logbook [15], the APTs can be defined by the following characteristics: Special-purpose: Since the intruders have sensitive information about the victim's infrastructure, the behaviors of APT attacks are somewhat intelligent. This characteristic means that an APT that is malicious in one infrastructure might be completely benign in another. For example, Stuxnet [19] is an instance of a special-purpose APT, which is malicious after satisfying certain conditions in the victim's infrastructure (e.g., detecting special patterns in centrifuge falls of victim's industrial infrastructure), but it is nearly benign in the system of a normal user.
Slow: Since existing security mechanisms use short time-windows (about a few minutes), some APTs (e.g., ProjectSauron APT [20]) abuse this weakness to bypass the detection methods. In this case, the intruders take advantage of some tricks such as using wake-up and sleep functions to distribute their attack vectors in several time-windows (about several months). Note that, in real conditions, the attack duration cannot last very long (e.g., several years); because the software migration in the victim's infrastructure can cause the attack to fail.
Low-level: In low-level APTs, the explicit violation of security policy is not probable and the attacker usually violates the security policy implicitly by some methods, including:
• Using trusted events and agents to perform malicious activities: this method takes advantage of techniques such as malicious code injection into trusted applications (e.g., Gauss APT [15]), using stolen digital certificates (e.g., Stuxnet APT [15]), using genuine recognized removable media to bypass the data loss prevention (DLP) system (e.g., ProjectSauron APT [20]), and exploiting human errors to infiltrate the victim's system.
• Performing the malicious actions gradually: some APTs (e.g., Carbanak APT [15]), especially malware that performs data exfiltration, steal the sensitive data gradually to hide from intrusion and anomaly detection systems. For example, to exfiltrate 1 GB of data from the victim's system, the malware breaks the data into many tiny parts (e.g., less than 1 MB each) and exfiltrates them slowly over several days.
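To make the gradual exfiltration example concrete, the following back-of-the-envelope calculation assumes 1 GB = 1024 MB, parts of at most 1 MB, and an exfiltration period of four days with evenly spaced transfers. The four-day figure and the even spacing are assumptions for illustration only.

$$N \ge \frac{1024\ \text{MB}}{1\ \text{MB}} = 1024 \text{ parts}, \qquad \Delta t \le \frac{4 \times 24 \times 60\ \text{min}}{1024} \approx 5.6\ \text{min between parts}.$$

Under these assumptions, a detector that correlates events within a window of a few minutes observes at most about one sub-megabyte transfer per window, which is hard to distinguish from benign traffic.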

Multi-step: In multi-step APTs (e.g., Flame APT [15]), the attack vector is divided into several steps, and activation of each step depends on the success of the previous steps. In these cases, the main challenge is detecting the relations of the steps and constructing the primary attack vector.
Distributed: In such threats, the intruders distribute the malware attack vector into several sub-vectors, and the sub-vectors are executed by different subjects (e.g., different processes, and in some cases different hosts). In such cases, communications between malware subjects are established through inter-process communication (IPC). Also, such malware tries to obfuscate the dependencies between the sub-vectors by inserting fake and unrelated events among the actual events. The main challenge is to identify the actual events semantically, remove the fake events, and summarize the behavior.
Hybrid: Since most intrusion detection and alert correlation systems do not correlate operating system events with network events, the intruders use a combination of both event types to bypass the detection mechanisms. For example, some APTs (e.g., Stuxnet, Hacking Team RCS, and ProjectSauron APTs [15]) use removable media for lateral movement in air-gapped networks, spreading the malware from the Internet to local networks.
It is important to note that most APTs have only some of the six mentioned characteristics (especially the low and slow features), and a few sophisticated APTs (e.g., ProjectSauron APT [20]) have all six characteristics.
Brogi et al. [39] proposed an APT detector called TerminAPTor, which tracks the information flows between the operating system processes. This approach intercepts events of a network system for two months and collects 3.2 billion events and 7.4 attacks per day. The main drawbacks of this approach are the lack of evaluation against well-known APTs and a high number of false positive alerts.
Ghafir et al. [40] proposed a machine learning-based system called MLAPT, which can detect APTs in real-time in three phases. In the first phase, the system analyzes the network traffic and generates alerts based on some malicious patterns. In the next phase, the generated alerts of the first phase are correlated, and in the third phase, a machine-learning-based model is used for APT prediction. Again, this approach has not been evaluated against well-known APTs and it cannot detect hybrid APTs.
In [16], a general approach is proposed to detect multi-step, hybrid, and low-level APTs. This approach is based on a knowledge-based system, i.e., the ontology of the operating system and network entities, low-level interception, and inference over the security policies and event relationships. The correlation between the operating system and network events in this approach is done based on the semantic relations of the entities, which are defined in the system ontology. In this approach, malicious behaviors and implicit violations of security policies are detected by deduction based on the existing knowledge of the occurred events and various relations between the entities of the machines and network. The main drawback of this approach is its weakness in detecting slow APTs. Since this approach is based on event correlation (instead of alert correlation) and uses an ontology and an inference engine, it suffers from high processing overhead.
However, we believe this approach is the best available solution for detecting low-level and hybrid APTs.
Mohamed et al. [38] proposed an approach based on the Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) matrix for detecting advanced persistent attacks. This approach focuses on detecting APT attacks in the first steps of their malicious activities, and the authors managed to reduce the detection time of the attack from several months to several minutes.

Problem statement
As discussed in the introduction, attack vectors of APTs have several characteristics such as special-purpose, low-level, hybrid, multi-step, slow, or distributed. Since these characteristics do not necessarily all hold in one APT, we focus on the two most important characteristics, which make detection more difficult: low-level and slow. Therefore, the problem is finding an approach $\phi : \mathcal{P}(Event^{I}) \rightarrow \mathcal{P}(\nu)$ to detect attack vectors such as $\nu_i$ from a set of intercepted events such as ES ⊆ E, in which the following conditions are held:

$$\cdots\ \land\ \nexists e_k,\ e_k \in \nu_i \land subject(e_k) = subject(e_i) \land object(e_k) = object(e_i) \land action(e_k) = action(e_i) \tag{9}$$

Proposed approach
As mentioned in previous sections, since the most sophisticated APT attacks are low-level and slow, and these two characteristics make detection difficult for intrusion detection and alert correlation systems, the purpose of this paper is to detect this type of APT attack. In our approach, we enhance our previous solution proposed in [16] to detect slow APT attacks in addition to the low-level ones. Our approach takes advantage of event correlation (instead of alert correlation) and uses the ontology of operating system and network entities, which is specified in this section ("Semantic correlation" section). Since the number of events and event relations increases significantly over time, using semantic correlation leads to massive processing overhead for detecting slow attacks. The other purpose of this paper is to solve this problem.
The architecture of the proposed approach and the process of detecting malicious attack vectors like $\nu_i$ are shown in Figs. 3 and 4, respectively. In our approach, on the client side, the operating system and network events are deeply intercepted, normalized, and sent to the server side. Afterward, in step 3, on the server side, we can detect the low-level attacks by using ABox, TBox, RBox, and an inference engine. Since the number of intercepted events is very large for slow APTs, we cannot use Protégé-OWL [42], as used in [16], for processing and inference. To overcome this problem, we use a scalable inference engine called SANSA [17], which can analyze a large ABox and TBox using Spark. Although SANSA is a good inference engine for processing large numbers of events, its processing power is limited. Therefore, in step 3, we use the Event Abstraction concept to reduce the number of events, speed up inference, and detect very slow APTs (whose attack duration is more than one year). In the Event Abstraction process, the old events are kept as an abstracted history instead of being completely disposed of. Finally, in step 4, we can detect the violation of security policy based on the data inferred in the previous steps and the high-level user-defined security policies.
The components and concepts that are used in the proposed architecture are explained in the rest of this section. Since our approach uses semantic correlation to detect low-level APTs, we first explain the concepts of semantic correlation and its limitation.

Semantic correlation
The main concepts of semantic correlation for detecting APT attacks, which are taken from [16], are as follows:
1 Knowledge Base or KB: In semantic correlation for detecting APT attacks, we employ a knowledge base consisting of the following three boxes:
• TBox: This box defines the system ontology and the relations between the system entities. For example, the ontology of the Windows operating system is shown in Fig. 5. As shown in this figure, class Object consists of three subclasses KernelObj, UserObj, and GDIObj. As another example, subject Thread is a part of subject Process.
Function $ma : Subject^{I} \rightarrow \mathcal{P}(Object^{I})$ determines the objects that are written explicitly or implicitly or deleted by a specific subject. For example, $o_i \in me(s_i)$ means subject $s_i$ has read object $o_i$, and $o_j \in ma(s_i)$ means object $o_j$ has been written by subject $s_i$. These two functions are defined for detecting the violation of confidentiality and integrity, respectively.
Security Policy (PStore): The security policy, which is defined in "Preliminaries", is stored in PStore. It is necessary to note that by using the ontology, we can define high-level and more abstract security policies and then infer the low-level security policies. The general format of security policy is shown in Algorithm 1.
Algorithm 1: General format of high-level security policy [16]
For example, in Fig. 2, the main security policy is $o_1 \notin me(s_2)$, which means subject $s_2$ should not read object $o_1$.
$LNO \sqsubseteq Object$ and $PNS \sqsubseteq Subject$. In this scenario, data can be exfiltrated using different approaches (e.g., through a network buffer, USB drive, or CD-ROM). Since we use the system ontology, it is not necessary to define several security policies, because all data transmission devices (e.g., network buffer, USB drive, or CD-ROM) are a type of PNO object. For more details about security policy, readers are referred to [16].
Event Relations: Two events can be related to each other by the relations between their subjects, objects, or actions. For example, relation e i ∼ e j , which is defined for two events e i and e j , is stored in this sub-box. This sub-box contains the events that are related to each other based on some relation rules. The relation rules are described in the rest of this section in RBox subsection.
• RBox: This box consists of two sub-boxes as follows:
Relation Rules: As mentioned before, two events can be related to each other based on their subjects, objects, or actions. All types of relation rules are described in Table 1. For example, as shown in this table, relation $e_i \overset{wr}{\sim} e_j$ means $a_i = W$ (Write) and $a_j = R$ (Read). According to this table, we can define approximately 500 relation rules for event correlation (precisely $(3 + 1) \times 4 \times (6 \times 6) = 576$ rules, some of which are meaningless). For example, relation rule $e_i \overset{tewrip}{\sim} e_j$ is equal to the three event relations $e_i \overset{te}{\sim} e_j$, $e_i \overset{wr}{\sim} e_j$, and $e_i \overset{ip}{\sim} e_j$.
Indirect Access Rules: Some event relations result in an indirect change to the value of me and ma of subjects (e.g., as shown in Fig. 2). The related rules, which are used to detect indirect changes to the value of me and ma, are defined as Indirect Access Rules. These rules, which can be used for detecting low-level APTs, are defined in "Expanding: knowledge-based inference" section.
2 Inference Engine: The inference engine is a component of semantic correlation that uses the information and rules in the knowledge base to infer the event relations and to calculate me and ma for each subject. The low performance of inference engines is a considerable limitation in this approach. In the approach proposed in [16], Protégé-OWL [42] is used as an inference engine to perform reasoning based on Description Logic. The processing power of Protégé-OWL is limited to a knowledge base with several million frames. Hence, Protégé-OWL is not a proper inference engine to detect slow APTs.
3 Policy Checker: In the final step, according to the me and ma functions and the security policy, which is stored in PStore and defined based on the system ontology, the system detects the violations of the security policy. More details of the Policy Checker are explained in "Step 4: policy checking" section.
Table 1 All types of event relations [16] (columns: Relation type, Meaning)
For a better understanding, the steps of semantic correlation for detecting the violation of security policy in the case of Fig. 2 are shown in Table 2. As shown in this table, there is a user-defined security policy for the system. Security policy $o_1 \notin me(s_2)$ means subject $s_2$ is not permitted to read object $o_1$. According to this table, the intercepted events are $e_1 = \langle s_1, o_1, R, t_1 \rangle$, $e_2 = \langle s_1, o_2, W, t_2 \rangle$, and $e_3 = \langle s_2, o_2, R, t_3 \rangle$. In the next step, sets me and ma can explicitly be calculated through the intercepted events as follows: $me(s_1) = \{o_1\}$, $ma(s_1) = \{o_2\}$, and $me(s_2) = \{o_2\}$. Then, according to relation rule $e_i \overset{tewr}{\sim} e_j$, the two events $e_2$ and $e_3$ are correlated. In the next step, since $e_2$ and $e_3$ are correlated to each other, one indirect access rule is fired and set me for subject $s_2$ is updated implicitly. Hence, $me(s_2) = \{o_2\} \cup \{o_1\}$, which means the security policy is violated implicitly and a low-level attack has occurred.
Table 2 Steps of policy violation detection for file removal example according to Figure 2 [16] (columns: Step, Component, Sample based on Fig. 2)
Step 1, Event Interception
Step 2, Event Normalization: Using OWL-DL to define the events and initiate the me and ma.
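The walk-through of Fig. 2 can be reproduced with a small Python sketch. It is a simplified stand-in and not the OWL-DL and rule-based implementation of [16]: the function names are illustrative, the relation test only captures the write-then-read part of the $tewr$ relation on a shared object with $t_i \le t_j$ (the exact meaning of each relation type is given in Table 1), and a single hand-written rule replaces the Indirect Access Rules.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    subject: str
    obj: str
    action: str  # "R" (read) or "W" (write)
    time: int


def init_sets(events: List[Event]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Initialise me (objects read) and ma (objects written) from explicit events."""
    me: Dict[str, Set[str]] = defaultdict(set)
    ma: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        (me if e.action == "R" else ma)[e.subject].add(e.obj)
    return me, ma


def write_then_read(ei: Event, ej: Event) -> bool:
    """Assumed simplification of the tewr relation: ei writes what ej later reads."""
    return (ei.action == "W" and ej.action == "R"
            and ei.obj == ej.obj and ei.time <= ej.time)


def apply_indirect_access(events: List[Event], me: Dict[str, Set[str]]) -> None:
    """One pass of a single illustrative rule: the reader implicitly reads
    everything the writer had read before writing the shared object."""
    for ei in events:
        for ej in events:
            if write_then_read(ei, ej):
                read_before = {e.obj for e in events
                               if e.subject == ei.subject
                               and e.action == "R" and e.time <= ei.time}
                me[ej.subject] |= read_before


events = [Event("s1", "o1", "R", 1), Event("s1", "o2", "W", 2), Event("s2", "o2", "R", 3)]
me, ma = init_sets(events)
explicit_me_s2 = set(me["s2"])       # {"o2"}
apply_indirect_access(events, me)    # me["s2"] becomes {"o1", "o2"}
violated = "o1" in me["s2"]          # policy: o1 must not be in me(s2)
print(explicit_me_s2, me["s2"], violated)
```

A full implementation would iterate the rules to a fixed point and would use all thirteen transition rules of [16]; the single pass here is sufficient only for this three-event example.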

Limitation of semantic correlation
Semantic correlation is an approach proposed in [16] for detecting multi-step, hybrid, and low-level APTs. This approach uses description logic and OWL for correlating the events based on their relations and inferring the violation of security policies to detect the APTs.
The main challenge in using semantic correlation is the size of the knowledge base, which affects the reasoning time. To the best of our knowledge, semantic correlation is a good way to detect low-level and slow APTs, provided that we overcome the processing overhead. In semantic correlation, the reasoning time depends on the size of the knowledge base, including the following components:
• RBox: A large RBox can strongly increase the reasoning time. Since in the field of APT detection the number of rules in RBox is limited (at most about 500 rules, as discussed in the Relation Rules subsection), RBox does not have a great impact on the reasoning time.
• TBox: Since we use TBox for storing the ontology of the Windows operating system and the size of the Windows ontology is not very large, TBox does not pose a challenge for reasoning time.
• ABox: Any instances of OWL classes and relations (e.g., events, subjects, objects, actions, and relations) are stored in ABox. Since the number of collected events (and consequently the number of event relations) grows rapidly over time, the size of ABox increases significantly through event interception and event correlation. This problem is more evident for slow APTs. In this paper, we propose an approach for overcoming this problem.
In the rest of this section, we explain four detection steps, which are shown in Figs. 3 and 4.

Step 1: event interception
Since APT attacks are low-level and violate the security policy implicitly, we should intercept all system events deeply. Hence, in our approach, we intercept both network and operating system events in different layers. The required layers of event interception for detecting low-level APTs are shown in Fig. 7. For intercepting network events, we use switch port mirroring (port spanning) to intercept the MAC address, source IP, destination IP, source port, destination port, and timestamp of each flow. For intercepting operating system events, we intercept system calls in user mode, in kernel mode, and in the hypervisor of the virtual machine. Interception in user mode is done by code injection, and interception in kernel mode is done by hooking the System Service Dispatch Table (SSDT) and Shadow SSDT, by IRP hooking, and by callback functions. Event interception in the virtual machine layer is done with Intel Virtualization Technology. Afterward, each intercepted system call is normalized to the quadruple format, which consists of the subject, object, action, and timestamp of the event. The first attribute is one of host_id, user_id, process_id, or thread_id. The second one is the handle of the object, which has a unique identifier. The third attribute is the action of the event, and the fourth one is the timestamp of the system call. Since in the proposed approach we track the information flow, we use network events just for determining which host is connected to another host (by the isConnected relation) on a specific port at a specific timestamp. Also, since we intercept operating system events at the host level, we determine which specific process is bound to a specific port for receiving/sending data from/to another host. That way, we can specify the information flow between two processes in two different hosts.
For example, if process $p_1$ in host $h_1$ sends a packet to process $p_2$ in host $h_2$, we can track the information flow through the corresponding operating system and network events.

Fig. 7 Event interception layers in the proposed approach [16]

Step 2: event normalization
After the interception of events, we should use a uniform format for describing and storing the event logs. The Intrusion Detection Message Exchange Format (IDMEF) [43] is a standard format for storing and representing event logs in intrusion detection and alert correlation systems; however, since we use semantic correlation, we should store event logs in a standard format that is understandable by the inference engine. Hence, we employ OWL-DL [44] for describing and storing event logs. Another role of Event Normalization is to specify the objects explicitly read or written by the subjects and to initiate the two sets me and ma. In other words, we use two rules in the Event Normalization step for initiating the two sets me and ma [16]: an explicit read adds the object to me of the subject, and an explicit write adds the object to ma of the subject. For example, in the implicit file read example, according to Table 2, the three intercepted events $e_1 = \langle s_1, o_1, R, t_1 \rangle$, $e_2 = \langle s_1, o_2, W, t_2 \rangle$, and $e_3 = \langle s_2, o_2, R, t_3 \rangle$ lead to initiating sets me and ma for each subject as $me(s_1) = \{o_1\}$, $ma(s_1) = \{o_2\}$, and $me(s_2) = \{o_2\}$.

Step 3: big event set processing
After the events are intercepted and normalized, it is time to process the event logs and detect attack vectors like $\nu_i$ on the server side. As mentioned in "Semantic correlation" section, since detecting slow APT attacks depends on analyzing a large number of events, the approach described in [16] is not applicable and adopting a scalable knowledge-based system is essential. To solve this problem, in this paper, we propose a scalable knowledge-based system, which is described in "Big event knowledge-based processing" section.
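The normalization of an intercepted system call into the quadruple format, as described in Steps 1 and 2, can be sketched as follows. This is only a sketch under stated assumptions: the raw record layout (a dictionary with `syscall`, `handle`, `timestamp`, and identifier keys), the `ACTION_MAP` table, and the choice of the most specific available identifier as the subject are all hypothetical; the approach itself stores normalized events in OWL-DL rather than in Python tuples.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    subject: str
    obj: str
    action: str
    time: int


# Hypothetical mapping from intercepted call names to the actions R and W.
# NtReadFile and NtWriteFile are Windows native API calls; the mapping is illustrative.
ACTION_MAP = {"NtReadFile": "R", "NtWriteFile": "W"}

# Assumed preference order: most specific subject identifier first.
SUBJECT_KEYS = ("thread_id", "process_id", "user_id", "host_id")


def normalize(record: dict) -> Optional[Event]:
    """Turn one raw interception record into <subject, object, action, timestamp>.

    Returns None when the call is not mapped to an action or no subject is known.
    """
    action = ACTION_MAP.get(record.get("syscall"))
    if action is None:
        return None
    for key in SUBJECT_KEYS:
        if record.get(key) is not None:
            subject = f"{key}:{record[key]}"
            break
    else:
        return None
    return Event(subject, f"handle:{record['handle']}", action, record["timestamp"])


raw = {"syscall": "NtReadFile", "process_id": 42, "handle": "0x1f4", "timestamp": 1000}
print(normalize(raw))
# Event(subject='process_id:42', obj='handle:0x1f4', action='R', time=1000)
```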

Step 4: policy checking
In our approach, event correlation and policy checking are performed every eight hours. After the correlation of the events, if the values of me or ma change, the policy checker is fired and checks for violations of the security policy. As discussed in "Preliminaries" section, the security policies are defined in set SP, and any action performed by the subjects affects the ma and me sets. Therefore, following the approach proposed in [16], the violation of security policy is detected by checking the me and ma sets of the subjects against the policies in SP. It is necessary to note that the proposed approach has no solution to detect APT attacks that directly violate the availability of the system, such as the power save denial of service (PS-DoS) [45] attack. When we detect a violation of security policy, the following steps can help to detect the origins of infection:
• Determining the events that cause the violation of the security policy as the malicious events.
• Tracing back the attack vectors that contain the malicious events.
• Determining the first events of the attack vectors.
• The subjects of these events are the origins of the APT attack.
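The four origin-tracing steps above can be sketched as a backward traversal over the directed acyclic graph of event relations (Definition 3). The sketch assumes that the relation pairs $e_i \sim e_j$ have already been inferred and are given as a list of edges; the function name `trace_origins` and the example edges are illustrative, not taken from [16].

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    subject: str
    obj: str
    action: str
    time: int


def trace_origins(events: Iterable[Event],
                  related: List[Tuple[Event, Event]],
                  malicious: Set[Event]) -> Set[str]:
    """Walk the relation graph backwards from the malicious events and return
    the subjects of the first events (events without predecessors) reached."""
    preds: Dict[Event, Set[Event]] = {e: set() for e in events}
    for ei, ej in related:          # edge ei -> ej means ei ~ ej with t_i <= t_j
        preds[ej].add(ei)

    seen = set(malicious)
    stack = list(malicious)
    while stack:                    # depth-first backward traversal
        e = stack.pop()
        for p in preds[e]:
            if p not in seen:
                seen.add(p)
                stack.append(p)

    first_events = {e for e in seen if not preds[e]}
    return {e.subject for e in first_events}


e1 = Event("s1", "o1", "R", 1)
e2 = Event("s1", "o2", "W", 2)
e3 = Event("s2", "o2", "R", 3)
# Assumed relations for the Fig. 2 example: e1 ~ e2 (same subject) and e2 ~ e3.
origins = trace_origins([e1, e2, e3], [(e1, e2), (e2, e3)], {e3})
print(origins)  # {'s1'}
```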

Big event knowledge-based processing
As described in previous sections, to overcome the complexity of detecting slow APTs, we propose a scalable knowledge-based system which uses the following three techniques: (1) a flexible sliding window called Vermiform window, (2) a scalable inference engine called SANSA, and (3) a data summarization process based on the system ontology called Event Abstraction.
Details of these techniques are described in the following sections.

Vermiform window
Since our approach deals with a very large number of events, it is necessary to use a sliding window to prevent data explosion. In similar circumstances, it is common to use a fixed-length (or fixed-size) sliding window. However, since a fixed-length sliding window is not flexible, it is not suitable for our approach. Hence, we use a variable-length sliding window, which we call the Vermiform window. This window moves like a worm whose length varies as it moves. The movement steps of this sliding window are shown in Fig. 8. As shown in this figure, the Vermiform window has two steps in movement: Expanding and Shrinking. We map the expanding and shrinking of the Vermiform window to the context of event correlation as follows:
• Expanding: In the expanding step, newly intercepted events are appended to the window immediately, the events of the window are correlated with each other every eight hours, and then the inference engine checks for violations of the security policy. The process of event correlation in the expanding movement is discussed in "Expanding: knowledge-based inference" section.
• Shrinking: In some situations, the number of events in the Vermiform window becomes very high and we cannot append any new event to the window. In this state, the common approach is to eliminate some of the old events to free space for the new events. Since eliminating the old events reduces the accuracy of the detection approach, we create a history or an abstraction from the old events instead of eliminating them. The process of event abstraction is discussed in "Shrinking: event abstraction" section. Therefore, the main purpose of shrinking is to reduce the number of events in the sliding window.
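The expanding and shrinking movements can be illustrated with the following Python sketch. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the class name, the `capacity` and `keep` parameters, and the abstraction used here (dropping timestamps and counting repeated subject, object, action triples) are hypothetical placeholders for the ontology-based Event Abstraction.

```python
from collections import Counter, deque
from typing import Deque, NamedTuple


class Event(NamedTuple):
    subject: str
    obj: str
    action: str
    time: int


class VermiformWindow:
    """Variable-length window: grows with new events, shrinks by abstracting old ones."""

    def __init__(self, capacity: int, keep: int) -> None:
        if not 0 <= keep < capacity:
            raise ValueError("keep must satisfy 0 <= keep < capacity")
        self.capacity = capacity            # assumed limit on raw events in the window
        self.keep = keep                    # raw events retained after a shrink
        self.events: Deque[Event] = deque()
        self.history: Counter = Counter()   # abstracted old events, kept as history

    def expand(self, event: Event) -> None:
        """Append a newly intercepted event, shrinking first if the window is full."""
        if len(self.events) >= self.capacity:
            self.shrink()
        self.events.append(event)

    def shrink(self) -> None:
        """Replace the oldest raw events by an abstraction instead of discarding them."""
        while len(self.events) > self.keep:
            old = self.events.popleft()
            self.history[(old.subject, old.obj, old.action)] += 1


window = VermiformWindow(capacity=4, keep=2)
for e in [Event("s1", "o1", "R", 1), Event("s1", "o1", "R", 2), Event("s1", "o2", "W", 3),
          Event("s2", "o2", "R", 4), Event("s2", "o3", "R", 5)]:
    window.expand(e)

print(len(window.events))                    # 3 raw events remain
print(window.history[("s1", "o1", "R")])     # 2 old reads summarized in the history
```

Correlation and policy checking would then run periodically over both `events` and `history`, so that old activity still contributes to detection after it has left the raw window.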

Expanding: knowledge-based inference
Since semantic-based correlation with Protégé-OWL [42] as the inference engine incurs a huge processing overhead and is therefore not applicable to detecting slow attacks, we need a scalable platform for processing OWL files and reasoning. In our proposed approach, we use SANSA [17] as the inference engine. The architecture of SANSA is shown in Fig. 9. As shown in this figure, SANSA is a scalable inference engine that takes advantage of big data processing frameworks such as Spark and Flink for querying, inference, and large-scale RDF (Resource Description Framework) data analysis. In our experience, SANSA can analyze more than one billion frames in an acceptable time. A drawback of SANSA is that it does not support the Semantic Web Rule Language (SWRL), which is used in our proposed approach. To solve this problem, we use the Jena [46] rule language, which is implemented by SANSA, and redefine the RBox rules (Relation rules and Indirect Access Rules) in it.
In the expanding process, an event is appended to the Vermiform window every time it is generated and intercepted. The inference engine correlates the events and, based on the ABox, TBox, and RBox, calculates the implicit actions that change the values of the me and ma sets of each subject. After that, the Policy Checker checks for violations of the security policy based on the high-level user-defined security policies and the values of the me and ma sets of the subjects. Policy checking is described in the "Step 3: big event set processing" section.
The implicit actions are calculated using the Indirect Access Rules, which are placed in the RBox of the Knowledge Base. As mentioned in [16], the Indirect Access Rules consist of two types of rules, defined as follows.
• Transition rules: These rules specify the circumstances (the occurrence of a set of events) in which some objects are read or written implicitly. In [16], thirteen rules of this type, which implicitly change the values of the me and ma sets, are defined. These rules are shown in Table 3.
• Untrusted subjects rules: These rules specify the circumstances in which the nature of a trusted subject changes to an untrusted one. In [16], five rules of this type are defined. These rules are shown in Table 4.

Fig. 9 Architecture of SANSA inference engine [17]
Since we use a sliding window, it is necessary to specify its length in the expanding step. The minimum and maximum lengths of the sliding window in the expanding step depend on the minimum duration of prevalent APT attacks and on the maximum processing power of the inference engine, respectively. Therefore, the restrictions on our sliding window in expanding are time and size dependent. These restrictions are shown in Fig. 10 and are described in the following.
• T min, the minimum time of the Vermiform window in expanding: Since the duration of regular APT attacks is about several months (at most one year), we should at least analyze and correlate the events intercepted in this interval. In other words:
$$T_{min} = t_r - t_h \simeq 12 \text{ months} \qquad (10)$$
• S max, the maximum size of the Vermiform window in expanding: As mentioned before, the window reaches its maximum length when it is fully expanded. This maximum length depends on the processing power of the inference engine. Since the processing power of our inference engine is limited (approximately one billion frames), the maximum number of frames that can be in the Vermiform window is approximately one billion. In other words:
$$S_{max} = s_{max} - s_0 \simeq 1 \text{ billion frames} \qquad (11)$$
• S min, the minimum size of the Vermiform window in expanding: To succeed in discovering APT attacks, the inference engine must satisfy a condition on S min, the number of frames generated from timestamp t h to t r. Since this value is time dependent, differs from one event set to another, and depends on the size of the victim's computer network, the exact value of S min is calculated for each event set ES using the frame function f (defined in "Preliminaries"), where ES is the set of all events collected in the expanding step of the Vermiform window.
• T max, the maximum time of the Vermiform window in expanding: To succeed in discovering APT attacks, the inference engine must likewise satisfy a condition on T max, the maximum time of the sliding window in expanding. Since this value differs from one event set to another and depends on the size of the victim's computer network, the exact value of T max is calculated for each event set ES, where ES is the set of all events collected in the expanding step of the Vermiform window.
Appending new events to the sliding window is paused when the length of the Vermiform window reaches S max, approximately one billion frames. At that point we start the shrinking process, which is explained in the next section.

Table 3 Patterns of transition rules [16]

Shrinking: event abstraction
As mentioned before, the number of collected events grows rapidly over time, and the correlation of all collected events is impossible in practice. In this situation, the simplest solution is to eliminate the old events. Since eliminating the old events leads to a reduction in the accuracy rate of the detection, we create a history from the old events (by abstracting the events) instead of eliminating them. The process of event abstraction occurs in the shrinking step.
Before describing our event abstraction approach, it is necessary to specify the length of the sliding window in the shrinking step, which has a direct impact on the event abstraction approach.
Since the events of the previous expanding step should be kept as a history (following the shrinking step) in the current expanding step, the length of the Vermiform window available for the history (containing the abstracted events) depends on the free size and time of the sliding window in the current expanding step. These two parameters are denoted by T h and S h in Fig. 10. Since the history length of the Vermiform window differs from one event set to another and depends on the size of the victim's computer network, the exact values of S h and T h are calculated for each event set ES, where ES is the set of all events collected in the current expanding step of the Vermiform window.
After determining the maximum length of the sliding window for the abstracted events obtained from the previous shrinking step (i.e., T h and S h ), all the events collected in the expanding step are summarized into a set of abstract events with a maximum length of T h and S h .
According to these limitations, we define an abstraction function on sets of events, P(E) → P(E), which maps a set of events WE to a set of abstract events AE such that the following conditions hold:
• There is no capacity to store more events than those appearing in WE in the current expanding step of the Vermiform window. This condition is stated over ES, the set of all events collected in the current expanding step of the Vermiform window.
• The size of AE, kept as history in the Vermiform window, is much less than the maximum size of the current expanding step.
The abstraction function takes two main facts into account:
(1) History size ( S h ): The smaller the size of the Vermiform window in the shrinking step ( S h ), the more abstraction the function has to perform.
(2) Event timestamps: Since the most recent events of the sliding window are more valuable for event correlation than the old ones, the function should apply less abstraction to recent events and more abstraction to old ones.
According to these two facts, we define several abstraction levels, which the abstraction function uses to apply different types of event abstraction based on the size of S h and the events' timestamps. The function employs the system ontology to abstract system entities such as objects, subjects, and actions. Using the ontology, we can replace lower-level (more concrete) entities with upper-level (more abstract) ones. The function performs the abstraction based on the following entities:
• Abstraction based on actions: To increase the expressive power and accuracy, all system actions were defined in terms of six basic actions (Read, Write, Execute, Delete, Access, and Create). However, some of these actions (i.e., Create, Delete, and Access) can be replaced by others (i.e., Read and Write). To this aim, we define a set of abstraction rules, shown in Table 5. According to these rules, the actions Create and Delete are replaced by Write, the action Access is replaced by Read, and the action Execute is eliminated, because it does not affect event correlation.
• Abstraction based on subjects: Considering the ontology of the Windows operating system (shown in Fig. 5), we can replace lower-level subjects with upper-level ones. For example, a Thread can be abstracted as a Process, and a Process can be abstracted as a User or a Host. Hence, we define four abstraction rules based on the subjects' relations (shown in Table 6). For example, rule R 1 S means that each event generated by a thread s i is considered to be generated by its related process s j .
• Abstraction based on objects: As with subjects, we can replace lower-level objects with upper-level ones. For example, a Socket can be abstracted as a File or a Device, and a Device can be abstracted as a KernelObj. Hence, we define an abstraction rule based on the objects' relations (shown in Table 7) to abstract the events.
Using these six rules, many events are abstracted and replaced by other events. The purpose of these six rules is to reduce the detail of each event and raise its level of abstraction, without considering the relations between events.
The rules in Tables 5, 6 and 7 can be combined to define different abstraction levels (e.g., abstraction level R 1 S ∧ R 1 O ∧ R 1 A ). However, some combinations are meaningless or have negative effects on both the accuracy of the detection approach and the rate of event reduction. We examined all the abstraction levels on a dataset (introduced and discussed in the "Evaluation" section) and narrowed them down to seven main abstraction levels, shown in Table 8. As shown in this table, the seven abstraction levels are built from the six basic abstraction rules. The first abstraction level ( L 0 ) uses no abstraction rules and therefore neither reduces the number of events nor increases the accuracy. Note that before a new abstraction level is applied, all redundant events resulting from the previous abstraction level have to be eliminated.
In Table 8, the column Average Event Reduction Rate gives the percentage of events removed by a specific abstraction level on our dataset. The column Average impact on detecting very slow attacks gives the average impact of a specific abstraction level on the accuracy of the proposed approach for attacks whose duration exceeds T max (very slow attacks). The symbols '+' and '-' indicate an increase or decrease in the accuracy of the detection approach. The column Average impact on detecting slow attacks gives the impact of a specific abstraction level on the accuracy of the detection approach for attacks whose duration is less than T max (slow attacks).
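The action rule and the thread-to-process style of subject rule described above, followed by the removal of redundant events, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: `Event`, `ACTION_MAP`, `abstract_events`, and the single `parent_of` map (used here for both subjects and objects) are hypothetical names, and treating events with the same subject, object, and action as redundant while keeping the latest timestamp is an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    subject: str
    obj: str
    action: str
    timestamp: float


# Action rule as described in the text: Create and Delete become Write,
# Access becomes Read, and Execute is dropped from correlation.
ACTION_MAP: Dict[str, Optional[str]] = {
    "Read": "Read", "Write": "Write",
    "Create": "Write", "Delete": "Write",
    "Access": "Read", "Execute": None,
}


def abstract_events(events: Iterable[Event],
                    parent_of: Dict[str, str]) -> List[Event]:
    """Apply action and entity abstraction, then drop redundant events.

    parent_of is a hypothetical ontology lookup (e.g. thread -> process).
    Assumption: events with equal (subject, object, action) are redundant,
    and the one with the latest timestamp is kept.
    """
    latest: Dict[Tuple[str, str, str], Event] = {}
    for e in events:
        action = ACTION_MAP[e.action]
        if action is None:
            continue  # Execute does not take part in correlation
        subject = parent_of.get(e.subject, e.subject)
        obj = parent_of.get(e.obj, e.obj)
        key = (subject, obj, action)
        if key not in latest or e.timestamp > latest[key].timestamp:
            latest[key] = Event(subject, obj, action, e.timestamp)
    return sorted(latest.values(), key=lambda e: e.timestamp)
```

For instance, two threads of the same process writing to the same target collapse into a single Write event attributed to the process.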
As mentioned before, since the most recent events of the sliding window are more important for event correlation than the old ones, our approach applies less abstraction to recent events and more abstraction to old events. Therefore, we divide the sliding window into seven partitions, as shown in Fig. 11. In the partitioning process, the number of events in each of the seven partitions should be approximately the same. As shown in Fig. 11, the event abstraction process consists of at most 49 steps. In other words, event abstraction starts at step S 1 and continues until the size of the abstracted events becomes less than or equal to S h ; in the worst case, the process finishes at step S 49 . For example, if the event abstraction process finishes at the 19th step ( S 19 ), the events with timestamps from t 0 to t 1 are abstracted at abstraction level L 4 , those from t 1 to t 2 at level L 3 , those from t 2 to t 3 at level L 2 , those from t 3 to t 4 at level L 2 , those from t 4 to t 5 at level L 1 , and those from t 5 to t max at level L 0 .
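The partitioning and the stepwise abstraction loop can be sketched as below. This is a hedged illustration: the function names are hypothetical, the order of the steps (S 1 to S 49 in Fig. 11) is not reproduced here and is instead passed in by the caller as `schedule`, and `apply_level` stands in for whatever abstraction a given level performs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_equal(events: Sequence[T], k: int = 7) -> List[List[T]]:
    """Split time-ordered events into k partitions of near-equal count."""
    base, extra = divmod(len(events), k)
    parts: List[List[T]] = []
    start = 0
    for i in range(k):
        # The first `extra` partitions receive one additional event.
        size = base + (1 if i < extra else 0)
        parts.append(list(events[start:start + size]))
        start += size
    return parts


def abstract_until_fits(
    parts: Sequence[Sequence[T]],
    schedule: Sequence[Tuple[int, int]],
    apply_level: Callable[[List[T], int], List[T]],
    size_limit: int,
) -> Tuple[List[List[T]], int]:
    """Apply (partition index, level) steps in order until the total
    number of events is at most size_limit (standing in for S_h).

    Returns the abstracted partitions and the number of steps applied.
    """
    current = [list(p) for p in parts]
    steps = 0
    for idx, level in schedule:
        if sum(len(p) for p in current) <= size_limit:
            break
        current[idx] = apply_level(current[idx], level)
        steps += 1
    return current, steps
```

Passing the schedule in as data keeps the sketch independent of the concrete step order, so the older-partitions-first policy of the paper can be expressed by listing steps for low partition indices earlier and at higher levels.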
The event abstraction process reduces the number of objects, subjects, events, and event relations, and consequently reduces the processing overhead. Although this abstraction reduces the detection accuracy in some situations, it is still a better solution for detecting slow and very slow attacks than solutions that eliminate the old events. For a better understanding, consider the implicit file removal example, which is shown in Fig. 12 and described in Table 9. As shown in the figure and table, an implicit file removal through code injection into process p 2 by process p 1 has occurred. The process of events abstraction for this example is shown in Table 10. As shown in this table, we use four levels of abstraction for this example, and abstraction levels L 4 to L 6 have no impact on reducing the number of events. Therefore, the seven events of this example can be abstracted into two events, i.e., e 4 = ⟨p 1 , p 2 , W , t 4 ⟩ and e 6 = ⟨p 2 , o 4 , W , t 6 ⟩.

Table 9 Implicit file removal through code injection into process p 2 by process p 1 [16] (fragment)
System call: WriteProcessMemory. Description: e 2 = ⟨s 1 , o 1 , a 2 , t 2 ⟩, a 2 = W, t 2 = t 1 + ε
System call: DeleteFile. Description: e 6 = ⟨o 2 , o 4 , a 6 , t 6 ⟩, a 6 = D, File(o 4 ), t 6 = t 5 + ε

Table 10 An example of events abstraction process

Other restrictions
If the rate of event generation in a computer network is low and the size of the abstracted events is small, the maximum time of the window can be increased.
In such a situation, we can append more new events to the window and consider more events than those of the last 12 months. These extra events are determined by the parameters S r and T r . Since S r and T r depend on S h (ES) and T h (ES), respectively, and, as mentioned before, these two values depend on the specific event set ES, S r and T r also depend on ES. This situation arises when S r or T r is greater than zero.

Dataset
• These datasets contain regular and simple attacks, whereas APT attacks have many complex behaviors.
• These datasets do not contain any hybrid, low-level, or slow attacks, which are prevalent in APT attacks.
• These datasets do not contain any host-based event logs and mostly contain network-based logs and attacks, which are not sufficient for APT detection.
• Attack durations in these datasets are limited to several weeks at most, which is not suitable for evaluating slow attacks.
• The volume of these datasets is limited to several gigabytes, which is not suitable for evaluating the scalability of detection approaches.
Due to the mentioned problems, we generated a new evaluation dataset, which has the following characteristics and is useful for evaluating our approach. This dataset is available online at [59].
• The architecture of the test bed used for creating this dataset is shown in Fig. 13. As shown in this figure, the test network contains four sub-networks. The first subnet is the Internet, which is the path through which attackers reach the organization's network. The second subnet is a CafeNet, which is connected to the Internet. The third is the Corporate network, which contains the local services of an organization and is connected to CafeNet. The fourth is the Critical network, which is an air-gapped network isolated from the other networks.
• This dataset contains nine APT scenarios, which are shown in Table 11. These scenarios are abstracted versions of some well-known APT samples reported in [15]. Some APT scenarios of this table were implemented based on the available source code, reversed code, and published vulnerabilities of these malware samples on the Internet. Then, all APTs were run in the testbed. Some were run concurrently, and some, with overlapping scenarios, were run asynchronously.
• The behaviors of the malware are intercepted in the operating system in two different ways. The kernel events are intercepted by a Windows driver for hooking and a mini-filter driver for call-back functions. The user events are intercepted by the EasyHook library [60] through code injection. Interception at the hypervisor level is implemented by customizing a version of Ether [61] on the Xen hypervisor. Also, the network events are collected by switch port mirroring.
• The volume of the dataset is approximately 2 Terabytes.
• The dataset contains low-level and slow attacks.
• The dataset contains both network and host event logs.
• The total number of intercepted events in the test network is about 1.646 billion.
• We used seven hosts (one of which belonged to the attacker) for the simulation and assumed one user per host.
• Different attacks with different durations were considered. We deployed one short attack with a one-day duration, two almost slow attacks with a one-month duration, four slow attacks with five, seven, and eleven months duration, and two very slow attacks with fourteen and fifteen months duration.
• The simulated network contained 110 benign processes and 9 attack vectors. Each attack vector contained several sub-processes. Our dataset contains the operating system and network event logs for all processes.
• The normal behaviors were generated by real users in the CafeNet and Corporate networks. Since the actions in Critical networks do not depend much on users, the normal behaviors of such networks were simulated by software running without user interaction.

Fig. 13 The architecture of our test network

Table 11 (fragment): Internet Surveillance; 9 months; Stolen digital certificates

Experimental results
After deploying the reconstructed attacks in our test network and creating the dataset, we evaluated our proposed approach using the generated dataset. To this aim:
• We used the ontology that was proposed in [16] and implemented using the WinDbg tool [70] and the Microsoft MSDN library [71].
• We used the OWL-DL language [44] to specify the system ontology and the Jena language [72] to specify the user-defined inference rules. We employed SANSA for loading and saving OWL files, querying, and reasoning based on Description Logic.
• The processing time of the proposed approach for detecting hybrid and low-level APTs in the test networks was 6.1 hours.
The experimental results of evaluating our proposed approach are shown in Table 12.
As shown in this table, there are 110 benign processes and 9 malicious processes (attack vectors). Based on the evaluation results, the accuracy rate is 89.07%. However, since the events generated by APT attacks are very rare, the two classes (APT and Benign) are imbalanced, and accuracy and precision are not reliable evaluation criteria. In such cases, criteria such as sensitivity and specificity are more informative. Specificity, or true negative rate, measures the ability of our approach to recognize benign samples: it is the fraction of truly benign samples that are classified as benign. Sensitivity, also called true positive rate or recall, measures the ability of our approach to detect APT samples: it is the fraction of true APT samples that are detected as malicious. From the values shown in Table 12, we can place our approach in the receiver operating characteristic (ROC) space. As shown in Fig. 14, with the false positive rate on the horizontal axis and the sensitivity on the vertical axis, our approach lies at the point (0.1091, 0.8888). Any other solution for detecting APT attacks that is evaluated on our dataset should try to increase the sensitivity while keeping the specificity the same or higher. In other words, any new solution should try to move closer to the point (0, 1) in the ROC space.
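The relation between these criteria can be checked with a short Python sketch. The confusion-matrix counts used here (8 true positives, 1 false negative, 98 true negatives, 12 false positives) are an assumption: they are the counts that reproduce the reported rates for 9 attack vectors and 110 benign processes, not values copied from Table 12. The function name `rates` is hypothetical.

```python
from typing import Dict


def rates(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard rates derived from a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate, recall
        "specificity": tn / (tn + fp),           # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
    }


# Assumed counts inferred from the reported rates (not taken from Table 12).
m = rates(tp=8, fn=1, tn=98, fp=12)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, precision comes out far lower than sensitivity and specificity, which illustrates why precision and accuracy are treated as unreliable on such an imbalanced dataset.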
The experimental results of our approach for each APT scenario described in Table 11 are shown in Table 13. As shown in this table, since our dataset is imbalanced and the number of malicious events is very small compared with the benign ones, the values of accuracy and precision are not meaningful.
For a more accurate evaluation, we reevaluated the approach proposed in [16] on our newly generated dataset, which contains slow attacks, and compared it with the approach proposed in this paper. As shown in Table 14, since the approach proposed in [16] can only detect low-level and hybrid attacks and cannot detect slow ones, its sensitivity on this dataset (44.44%) is much lower than that of our proposed approach (88.88%).

Discussion
APT detection is a new challenge in the field of computer security, and detecting low and slow APTs is a very new challenge in the field of malware analysis; hence, the number of publications on detecting APT attacks is small. Since we could not find any other approach focusing on low-level and slow APTs, and, to the best of our knowledge, our proposed approach is the first solution for detecting this type of APT, we cannot compare our approach quantitatively with other correlation approaches. However, since the related works use alert correlation methods for detecting APT attacks, we provide a qualitative comparison (shown in Table 15) between our approach and the other correlation approaches. As shown in this table, the main drawback of all previous works, including our previous work [16], is the inability to detect slow attacks. Our approach can detect multi-step, hybrid, low-level, and slow attacks.
As mentioned before, the average batch processing time in our proposed approach to detect the slow attacks was about 6.1 hours. This time depends on several parameters as follows, and can be improved by the following considerations.
• Correlation algorithms: Since the solution for detecting low-level APTs is based on reasoning, the processing time depends on the reasoning algorithm. We use OWL-DL (like the solution in [93]) for reasoning with SANSA. In this situation, the complexity class of reasoning is ExpTime-Complete. Since the SANSA inference engine uses OWL-Horst [94] forward chaining inference for reasoning and we do not use all features of the basic description logic (Attributive Language with Complements), we can use OWL-Horst instead of OWL-DL to reduce the reasoning time. In this situation, the complexity class of reasoning is Nondeterministic Polynomial Complete (NP-Complete), and in a nontrivial case it is Polynomial [94].
• SANSA framework: To detect slow APTs, we have to analyze a very large number of frames (several billion). Since we use SANSA as the inference engine, the processing time is highly dependent on the processing power of SANSA. Our experience shows that SANSA can analyze several million frames in a few seconds.
• Processing infrastructure: Since SANSA uses Spark to process large volumes of data, the infrastructure used by Spark plays a key role in the processing time. We deployed our approach on a cluster with 8 computing nodes, 80 cores, 1 Terabyte of RAM, and an NFS (Network File System) server node with a capacity of 10 Terabytes. The operating system was Rocks cluster 7.

Conclusion
Targeted cyberattacks, which are known as APTs, are characterized as low-level, slow, multi-step, distributed, and hybrid. Since the most complex APT attacks are low-level and slow, in this paper we focus on this type of APT attack. Our approach uses a scalable knowledge-based system and semantic correlation, enhancing the approach we proposed in [16], to detect low-level as well as slow APTs. In our approach, we use a sliding window called the Vermiform window. This window moves in two steps, or phases: expanding and shrinking. In expanding, we use a scalable inference engine called SANSA, built on big data frameworks such as Spark, to correlate a large number of events. When the APT attack lasts a very long time or the number of intercepted events is very large, we use the shrinking process. In shrinking, the events are abstracted and reduced; in effect, we create an abstract history of the old events. This solution differs from the regular approaches, which eliminate the old events completely.
To evaluate the proposed approach, we use a dataset that contains seven implemented low-level and slow APT attacks. The proposed approach shows 88.88% of sensitivity and 89.09% of specificity.
To the best of our knowledge, our approach is the first solution for detecting slow APTs whose duration is about several months. However, there are still many opportunities for innovation in the domain of APT attacks. We believe that the following directions could continue and complete our research on detecting APT attacks with more accuracy:
• One of the main challenges in the field of APTs is attack prediction. Our approach cannot predict APT attacks before the malware carries out its malicious activity. Machine learning algorithms could be used to predict malicious activities before an APT completes them.
• The proposed approach cannot detect APT attacks that violate the availability of the system. Proposing an approach that addresses this weakness can be considered as future work.
• The proposed approach has no means of verifying the generated alerts. Alert verification can help reduce the number of false alerts.
• Another future work is to improve and release our dataset so that other APT detection approaches can use it to obtain better and more precise evaluation results.
• Proposing a stream-based approach for detecting APT attacks that are not slow is another research topic that could be considered in the future.
It is worthwhile to note that the proposed approach can be applied on different network structures such as IT networks, smart grids, microgrids [95], and even IoT networks.

Abbreviations
The list of abbreviations and the symbols used in the paper are listed in Table 16 in "List of the symbols used in the paper" section.

List of the symbols used in the paper
The symbols, which are defined and used in this paper, are listed in Table 16.