Efficient parallel derivation of short distinguishing sequences for nondeterministic finite state machines using MapReduce

Distinguishing sequences are widely used in finite state machine-based conformance testing to solve the state identification problem. In this paper, we address the scalability issue encountered while deriving distinguishing sequences from complete observable nondeterministic finite state machines by introducing a massively parallel MapReduce version of the well-known Exact Algorithm. To the best of our knowledge, this is the first study to tackle this task using the MapReduce approach. First, we give a concise overview of the Exact Algorithm for deriving distinguishing sequences from nondeterministic finite state machines. Second, we propose a parallel algorithm for this problem using the MapReduce approach and analyze its communication cost using the model of Afrati et al. Finally, we conduct extensive comparative experiments on a wide range of finite state machine classes to demonstrate that the proposed solution is efficient and scalable.

Finite state machine and conformance testing
Due to their simplicity and ability to model complex systems, Finite State Machines (FSMs) are extensively used in several fields such as communication protocols [1], pattern matching [2], digital event reconstruction [3], smart contracts [4], distributed testing [5,6], genomics [7], and other reactive systems [8]. An FSM is a model that has a finite number of states, inputs, and outputs, and a finite number of transitions, each labeled by an input/output pair. In addition, FSMs are the underlying models for formal description techniques such as statecharts, the Specification and Description Language (SDL) [9], the Unified Modeling Language (UML) [10], programmable logic devices [11], and Ethereum smart contracts [4].
Testing FSMs is an indispensable part of system design and implementation: it helps guarantee the correct functioning of the modeled systems and uncover aspects of their behavior [12]. This is well studied in the FSM-based testing research area. However, some information about the black-box FSM Implementation Under Test (IUT) is generally missing and, as a consequence, must be recovered by running experiments on the IUT [13]. The purpose of these experiments is to check whether the implementation of a model behaves in accordance with its specification, by applying checking sequences (CSs) to the IUT, observing the corresponding output responses, and drawing a conclusion about the IUT [14]. In other words, one needs to recognize the state of the IUT and bring the IUT to a particular state. State recognition can be accomplished by using CSs such as distinguishing sequences [15], Unique Input Output sequences [16,17], Characterizing Sets (W-Set) [18], or synchronizing sequences (also known as reset sequences) [19], when such sequences exist. The motivation to study such sequences comes from different fields including robotics, bio-computing, propositional calculus, model-based testing, distributed testing, and many more [15,19–28]. The literature contains many techniques that automatically generate CSs [21,22,29–34]. Most approaches consist, in principle, of three parts: initialization, state identification, and transition verification.
In this paper, we focus on the scalability problem of generating distinguishing sequences (DSs) from an FSM to resolve the state identification problem, which consists in finding an input sequence that produces different outputs for each initial state of an FSM. This problem was initially described in the seminal paper by Moore [35] in 1956, and in 1964 Hennie [36] provided the first FSM-based test generation algorithm that can be automated. One motivation is that many FSM-based test sequence generation techniques use DSs (see, for example, [30,36–40]). It has been found that DSs, where they exist, lead to shorter tests [13]. There are two well-known types of DSs, adaptive (ADS) and preset (PDS) [41], for different FSM classes (deterministic [13], nondeterministic [42], complete [43], partial [44], observable [45]). A PDS is a single fixed input sequence that can be used to distinguish each state of the machine [46]. In an ADS, the next input depends on the output produced in response to the current input; an ADS is a rooted tree in which each root-to-leaf path represents an input sequence specific to the state represented by the leaf. Throughout the paper, DS refers to PDS. To derive a shortest DS for a pair of states, Spitsyna et al. designed the well-known Exact Algorithm (EA) [47]. It consists of two main steps: for an FSM M, construct a truncated successor tree from the intersection of the two initialized FSMs $M/s_1$ (i.e., FSM M with initial state $s_1$) and $M/s_2$ (i.e., FSM M with initial state $s_2$), and then derive a shortest DS from it. They proposed a method to analyze the separability relation between FSMs that can be used to derive a shortest DS, if one exists, of two given FSMs (or of two states of a given FSM), and showed that, for two states of an FSM, the upper bound on the length of such a DS is exponential.
When nondeterministic FSMs are considered, Alur et al. [48] have shown that the length of a DS for all states can reach the exponential bound. In addition, the complexity to decide if there exists a PDS is PSPACE-complete, and it is EXPTIME-complete to decide if there is an ADS.
To the best of our knowledge, the parallelization and scalability of deriving DSs from nondeterministic FSMs have not been thoroughly addressed using the MapReduce framework. In this work, we focus on the design of an optimized MapReduce version of the Exact Algorithm and on experiments that demonstrate its scalability.

Outline and contributions
Our study makes the following contributions:
• We present the first parallel MapReduce algorithm to efficiently derive a set of short distinguishing sequences for all pairs of states of a nondeterministic FSM.
• We provide a theoretical analysis of the communication cost of the proposed methods in the MapReduce model.
• We evaluate the performance of the proposed algorithms through extensive experiments using a variety of large-scale FSM datasets.
The remainder of the paper is structured as follows. Section 2 includes the necessary technical definitions and a brief introduction to the MapReduce computational model. Section 3 presents the related works as well as a survey of the Exact Algorithm for deriving DSs from nondeterministic FSMs. Section 4 presents and analyses the proposed MapReduce version of the Exact Algorithm to derive a set of shortest DSs for all pairs of states. Section 5 shows the efficiency and the scalability of the proposed methods by conducting extensive and comparative experiments on a variety of classes of nondeterministic FSMs, whereas section 6 covers the conclusion of the paper.

Preliminaries
In this section, we present the basic concepts used throughout this paper and briefly introduce the MapReduce framework. An FSM is called connected if for each s ∈ S there exists an input sequence that takes the FSM from an arbitrary state to state s. Given an FSM M = (S, I, O, E), a state s, and an input i, the i-successor of state s contains each state s′ for which there exists an output symbol o ∈ O such that (s, i, o, s′) ∈ E. Given a subset of states S′ ⊆ S and an input i, the i-successor of S′ is the union of the i-successors of all states of S′. We are interested in DSs: a DS is an input sequence that produces different output sequences when starting from different states of an FSM. As mentioned previously, there exist two types of DSs. An input sequence x is a preset distinguishing sequence (PDS) for FSM M if x is defined as an input sequence for all states of S and, for any pair of distinct states $(s_1, s_2)$ ∈ S × S, x is a distinguishing sequence for $s_1$ and $s_2$. On the other hand, an Adaptive Distinguishing Sequence (ADS) is a rooted tree T with exactly n leaves; the internal nodes are labeled with input symbols, the edges are labeled with output symbols, and the leaves are uniquely labeled with states of the FSM such that: a) edges descending from a common node have distinct output symbols, and b) for each leaf of T, if x and y are the input and output sequences, respectively, formed by the node and edge labels on the path from the root to the leaf labeled by some state $s_i$ of the FSM, then $(s_i, x, y, s')$ ∈ E. The input sequence x is then an adaptive distinguishing sequence of the state $s_i$. The length of such a sequence is the depth of the tree T [12].
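The i-successor definitions above can be illustrated with a minimal Python sketch. It assumes that E is stored as a set of 4-tuples (s, i, o, s′) and that, for a nondeterministic FSM, an input sequence distinguishes two states when their sets of possible output responses are disjoint; the function names and the small two-state FSM are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set, Tuple

# A transition (s, i, o, s') of E: source state, input, output, next state.
Transition = Tuple[str, str, str, str]


def i_successor(E: Set[Transition], s: str, i: str) -> Set[str]:
    """i-successor of state s: every s' with (s, i, o, s') in E for some o."""
    return {nxt for (src, inp, _out, nxt) in E if src == s and inp == i}


def i_successor_of_set(E: Set[Transition], states: Iterable[str], i: str) -> Set[str]:
    """i-successor of a set of states: union of the i-successors of its members."""
    result: Set[str] = set()
    for s in states:
        result |= i_successor(E, s, i)
    return result


def output_sequences(E: Set[Transition], s: str, x: Iterable[str]) -> Set[Tuple[str, ...]]:
    """All output sequences the FSM may produce from state s on input sequence x."""
    frontier = {(s, ())}
    for i in x:
        frontier = {(nxt, outs + (o,))
                    for (cur, outs) in frontier
                    for (src, inp, o, nxt) in E
                    if src == cur and inp == i}
    return {outs for (_state, outs) in frontier}


def distinguishes(E: Set[Transition], s1: str, s2: str, x: Iterable[str]) -> bool:
    """Assumed criterion: the two sets of output responses to x are disjoint."""
    x = tuple(x)
    return output_sequences(E, s1, x).isdisjoint(output_sequences(E, s2, x))


# Hypothetical complete observable FSM, nondeterministic on input "a".
E = {
    ("s0", "a", "0", "s1"), ("s0", "a", "1", "s0"), ("s1", "a", "0", "s0"),
    ("s0", "b", "0", "s0"), ("s1", "b", "1", "s1"),
}
print(i_successor(E, "s0", "a"))          # the set containing s0 and s1
print(distinguishes(E, "s0", "s1", "b"))  # True: the outputs "0" and "1" differ
print(distinguishes(E, "s0", "s1", "a"))  # False: both states can answer "0"
```

In this toy machine, the single input b is a distinguishing sequence for the pair (s0, s1), whereas a is not, because both states may respond to a with output 0.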

MapReduce model of computation
MapReduce-based approaches were recently introduced as efficient parallel solutions for computing the intersection [49] and the composition [50] of FSMs. In this work, we implement the first MapReduce version of the EA to derive a shortest DS for each pair of states, if one exists, from a large-scale complete observable nondeterministic FSM.
MapReduce [51] is considered one of the most prominent programming models for processing large-scale problems. Nowadays, Hadoop [52] is the most popular open-source framework, written in Java, for implementing MapReduce algorithms. Developed by the Apache Software Foundation, the Hadoop project includes modules enabling reliable and scalable distributed computing. It offers several advantages, such as scalability, flexibility, cost-effectiveness, organized architecture, and resilience to failure. Recently, several Hadoop-based platforms have been proposed as efficient and flexible solutions for computational storage with the strategy of processing data close to where they reside [53–55]. Among them, we cite the lineage-aware data management (LDM) that exploits data locality to decrease the network footprint [56,57]. A MapReduce algorithm consists of three major phases: Map, Shuffle, and Reduce. Each phase runs several tasks in a completely parallel manner. The Map phase is responsible for filtering and transforming input records into intermediate records. The Shuffle phase is performed automatically by Hadoop and manages the exchange of the intermediate data from the Map phase to the Reduce phase. The Reduce phase is in charge of summarizing the outputs of the previous phase.
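The three phases can be mimicked in a few lines of plain Python. The following sketch is a single-process, in-memory stand-in and not Hadoop code; the helper name `run_mapreduce` and the toy job (counting transitions per input symbol) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def run_mapreduce(records: Iterable, map_fn: Callable, reduce_fn: Callable) -> List:
    """Run one Map / Shuffle / Reduce round sequentially in memory."""
    # Map phase: every input record is turned into zero or more (key, value) pairs.
    intermediate: List[Tuple] = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Shuffle phase: values are grouped by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each key and its list of values is summarized independently.
    output: List = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


# Toy job: count how many transitions (s, i, o, s') carry each input symbol.
transitions = [("s0", "a", "0", "s1"), ("s1", "a", "1", "s0"), ("s0", "b", "0", "s0")]
counts = run_mapreduce(
    transitions,
    map_fn=lambda t: [(t[1], 1)],
    reduce_fn=lambda key, values: [(key, sum(values))],
)
print(counts)  # [('a', 2), ('b', 1)]
```

On a real cluster the map and reduce calls run in parallel on different nodes; only the grouping contract between the phases is reproduced here.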
When designing a parallel MapReduce algorithm, it is essential to find one that offers the best trade-off between parallelism and communication cost in a MapReduce computation. To do this, we analyze the communication and computation costs of the proposed methods using the theoretical model of Afrati et al. [58].

Parallel approaches for deriving distinguishing sequences
A variety of studies have recently focused on the use of parallel processing techniques to derive CSs in a large-scale context: UIO sequences [17], harmonized state identifiers and characterizing sets [18], and synchronizing sequences [59,60]. For DS generation, Hierons and Türker [61] and El-Fakih et al. [62] independently introduced parallel multithreading implementations of the EA on Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures [63]. They conducted extensive experiments on a large variety of FSM classes, using different CPU-GPU architectures and workloads. The obtained results show that their approaches are sufficiently efficient on large-scale data, and that the execution time for deriving DSs from nondeterministic FSMs increases exponentially w.r.t. parameters such as the degree of nondeterminism, the number of transitions, and the ratio of the input alphabet size to the output alphabet size.
In order to reduce the execution time of constructing the successor table of all pairs of states for a given nondeterministic FSM, El-Fakih et al. [62] proposed different multithreading parallel approaches based on multicore CPU and GPU architectures. They considered two options: the Thrust software platform and GPU implementations using the CUDA platform. They also proposed and evaluated a Network of Workstations (NoWs) solution based on Divisible Load Theory. They conducted their experiments on the class of nondeterministic FSMs with a large number of input and output symbols.
These experiments bring out the difference between the proposed algorithms in terms of speedup and execution time.
In [61], Hierons and Türker considered partial observable nondeterministic FSMs and studied the scalability issue while constructing preset and adaptive distinguishing sequences (PDS and ADS) for all states. They proposed an ADS generation algorithm that can process inputs up to 2048 times better than the existing ADS construction algorithm and a PDS generation algorithm that can process inputs up to 8 times better than the existing PDS generation algorithm. Their approach is based on the parallelism available in a GPU computing model, called the thin thread strategy. It utilizes global device memory and thus maximizes the number of threads in order to maximize parallelism. The results of their experiments indicate that the proposed algorithm can derive DSs from observable partial nondeterministic FSMs with 32000 states in an acceptable amount of time.

Overview of the exact algorithm
In this section, we present a concise overview of the well-known Exact Algorithm [47], which derives a shortest distinguishing sequence for a pair of states, if one exists, of a complete observable nondeterministic FSM.
From an FSM M, we consider two FSMs with different initial states, $M/s_1$ and $M/s_2$. The EA is applied to a single FSM M to derive a DS of two states $s_1$ and $s_2$; note that the order of the states in a pair does not matter, i.e., $(s_0, s_1) = (s_1, s_0)$. In order to derive a shortest distinguishing sequence (when it exists), the EA is implemented using the Breadth-First Search (BFS) method, which explores the successor tree level by level.
The states s_k and s_l are non-separable. */
10 if there exists an input i such that each pair of the set P has no i-successors
11    or there exists a node at a jth level, j < p, labeled with subset R of states such that P ⊇ R,
12    or for some pair (s, t) of the set P and some output o, the I/O sequence io takes the FSM from states s and t to the same state then
13       The current node is claimed as a leaf node
14 if none of the paths of the truncated tree derived is terminated then
15    return s_k and s_l are non separable.
16 else
17    a shortest sequence αi where α labels the path from the root of the tree to a leaf, is a shortest distinguishing sequence of s_k and s_l, return αi.
The EA is divided into two major steps: the intersection step and the derivation step. In the first step, we compute the intersection $M/s_1 \cap M/s_2$. If this intersection is a partial (non-complete) FSM, we derive a truncated successor tree Tree; otherwise, we return the message that the two states are non-distinguishable. In the second step, we derive from the truncated successor tree Tree a short DS of the state pair of the root node, if it exists, using the BFS method. The root of this tree, which is at the 0th level, is the initial state $(s_k, s_l)$ of the intersection; the nodes of the tree are labeled with subsets of states of the intersection. Given the already derived j tree levels, j ≥ 0, an internal node of the jth level labeled with a subset P of states of the intersection, and an input i, there is an outgoing edge from this internal node labeled with i to the node labeled with the subset of the i-successors of the pairs of states of P. A current node Current, at the pth level, p ≥ 0, labeled with the subset P of state pairs, is claimed as a leaf node if one of the conditions in lines 10, 11, or 12 of Algorithm 1 holds. If no leaf node satisfies the condition of line 10, then the states of the root pair are non-separable. Otherwise, there is a leaf node labeled with a subset P of states such that, for some input i, no pair of P has an i-successor, and a shortest DS is αi, where α labels the path from the root node of the tree to that leaf node.
The number of different possible subsets of pairs of states in an FSM having n states is $2^{n(n-1)/2}$. In the second step of the EA, the derivation of a DS, if it exists, from the previously derived truncated successor tree is performed using the classical Breadth-First Search method, as recalled in Algorithm 1. The worst-case time complexity of the EA is shown to be in $O(2^{n^2/2} \times \frac{n^2}{2} \times |I| \times |O|)$. When considering two FSMs having respectively n and m states, it is shown in [47] that the length of the shortest DS is at most $2^{mn-1}$ and that this upper bound can be reached. As a consequence, the upper bound for a single FSM becomes $2^{n^2-1}$.
However, according to the experiments conducted in [47], there exists a large class of FSMs with n and m states such that the length of the shortest DS is less than mn, and less than $n^2$ when considering one FSM. The experiments also show that the existence of a DS of two FSMs significantly depends on the degree of nondeterminism in the FSMs [47].
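The two steps described in this section can be condensed into a short sequential Python sketch. It is not the authors' Algorithm 1: it assumes a complete observable FSM stored as a set of 4-tuples, computes the successors of state pairs on the fly instead of materializing the intersection, and prunes a node when its label contains the label of any previously generated node (a slightly more aggressive variant of line 11). The names `pair_successors` and `shortest_ds` and the three-state FSM are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Optional, Sequence, Set, Tuple

Transition = Tuple[str, str, str, str]  # (state, input, output, next_state)
Pair = Tuple[str, str]                  # unordered pair, stored sorted


def pair_successors(E: Set[Transition], pair: Pair, i: str) -> Set[Pair]:
    """i-successors of a state pair in the intersection: both states must be
    able to produce the same output under input i."""
    s, t = pair
    return {tuple(sorted((ds, dt)))
            for (a, ia, oa, ds) in E if a == s and ia == i
            for (b, ib, ob, dt) in E if b == t and ib == i and ob == oa}


def shortest_ds(E: Set[Transition], inputs: Sequence[str],
                sk: str, sl: str) -> Optional[Tuple[str, ...]]:
    """Breadth-first search over the truncated successor tree of (sk, sl).
    Returns a shortest distinguishing sequence, or None if none is found."""
    root: FrozenSet[Pair] = frozenset({tuple(sorted((sk, sl)))})
    seen = [root]                  # labels of all nodes generated so far
    queue = deque([(root, ())])    # (node label P, path label alpha)
    while queue:
        P, alpha = queue.popleft()
        for i in inputs:
            succ: Set[Pair] = set()
            for pair in P:
                succ |= pair_successors(E, pair, i)
            if not succ:
                # No pair of P has an i-successor: alpha.i separates sk and sl.
                return alpha + (i,)
            if any(s == t for (s, t) in succ):
                continue           # two states merged: this branch cannot separate
            label = frozenset(succ)
            if any(label >= R for R in seen):
                continue           # label contains an earlier label: truncate
            seen.append(label)
            queue.append((label, alpha + (i,)))
    return None


# Hypothetical three-state FSM in which s0 and s1 need a sequence of length 2.
E = {
    ("s0", "a", "0", "s0"), ("s1", "a", "0", "s2"), ("s2", "a", "0", "s2"),
    ("s0", "b", "0", "s0"), ("s1", "b", "0", "s1"), ("s2", "b", "1", "s2"),
}
print(shortest_ds(E, ("a", "b"), "s0", "s1"))  # ('a', 'b')
```

Because the queue is processed level by level, the first sequence returned has minimal length among those the search can reach; the exponential worst case discussed above still applies, since the labels are subsets of state pairs.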

Example 2
Let us consider the FSM M from Example 1 and apply the EA to derive a DS for the states $s_0$ and $s_1$. Figure 2 shows the successor tree derived in step 1 of the EA. The nodes of this successor tree are labeled from $n_1$ to $n_7$.
As the intersection of $M/s_0$ and $M/s_1$ is not a complete FSM, the root node $n_1$ associated with the state pair $(s_0, s_1)$ is at level j = 0. The successors of $n_1$ are the nodes $n_2$ and $n_3$. Then, for level j = 1, we obtain the state pairs in nodes $n_4$ and $n_5$ as successors of $n_2$, and the successors of $n_3$ are the state pairs in nodes $n_6$ and $n_7$. The node $n_4$ contains a pair with a repeated state, $(s_3, s_3)$, so we do not consider $n_4$ for further exploration. Node $n_6$ is labeled with the empty set, i.e., there are no successors for $n_3$ under the corresponding input symbol. Therefore, the input sequence "ba", which starts from the root and leads to the node labeled with the empty set, is a short DS for the FSMs $M/s_0$ and $M/s_1$.

Efficient MapReduce algorithm for deriving short distinguishing sequences from FSMs
In this section, we present and analyze our parallel version of the EA, which uses the MapReduce framework to extract a set of short distinguishing sequences from complete observable nondeterministic FSMs. Our method improves on the previous parallel approaches in the sense that it efficiently provides a short distinguishing sequence for each pair of states of a large FSM.

Framework overview
The proposed solution consists of two MapReduce steps: the intersection step and the derivation of short distinguishing sequences step (the derivation step, for short). Figure 3 illustrates the workflow of our solution, which receives a large FSM as input and produces a short distinguishing sequence for each pair of states using two MapReduce algorithms. Initially, an input FSM M = (S, I, O, E) is preprocessed to produce a text file in which every line (value) represents a transition. This file is the input of the intersection step. The MapReduce framework is composed essentially of a map function, which performs filtering and sorting, and a reduce function, which performs a summary operation. The map function produces a set of ⟨key, value⟩ pairs. In our case, for a transition t, it generates a set of associated keys. The pairs produced by the map function are then grouped by key and sent to the reduce function, which receives the transitions having the same key and computes their intersection; i.e., for two transitions $t_i$ and $t_j$ having the same input/output, the result is an edge in the so-called truncated successor tree.
Fig. 2 The truncated successor tree of $M/s_0 \cap M/s_1$
In the second step, we developed an iterative MapReduce algorithm to derive a short distinguishing sequence, if one exists. The input of this step is the output of the intersection step. The map function of this step takes an edge $(n_s, l, n_d)$ of the truncated successor tree, where $n_s$ denotes the source node, l is the label, and $n_d$ is the destination node, and produces a set of associated keys. For a given edge, if its destination node is empty, the associated key is its source node; otherwise, each pair of states in its destination node becomes an associated key.
Next, the reduce function receives the set of edges having the same key and divides it into two subsets: the first contains all edges with an empty destination node, and the second contains the remaining edges. Then, the Cartesian product of these two subsets is computed; i.e., for two edges e and e′, if the source node of e′ is a subset of the set of pairs in the destination node of e, then the resulting edge is $(n_s(e),\ l(e)\,l(e'),\ n_d(e) \setminus n_s(e'))$. This process is equivalent to a BFS in the truncated successor tree of the EA, in which edge labels are concatenated to construct DSs, if they exist. Figure 4 shows the derivation step for the successor tree presented in Fig. 2. The first round of the derivation step processes the successor tree received from the intersection step and maps its edges to different reducers following the previously described mapping schema. For example, from . Then, this edge will be mapped, in the second round, to the reducer associated with the key $(s_0, s_1)$. The same process is repeated iteratively in the next MapReduce rounds of the derivation step until the stop condition is true. Finally, if no DS has been found and the number of iterations is less than the maximum bound, we repeat the derivation step, taking the output as the input of the next iteration; otherwise, the final output is the set of pairs with their short DSs, if they exist.
In the next section, we analyze the communication cost of this problem in a MapReduce framework using the model of Afrati et al. [58].

Communication cost analysis
The communication cost model introduced by Afrati et al. [58] provides a way to analyze problems and optimize performance in a distributed computing environment by explicitly studying the inherent trade-off between communication cost and degree of parallelism. By applying this model to a MapReduce framework, we can determine the best algorithm for a problem by analyzing the trade-off between reducer size and communication cost in a single round of MapReduce computation. Two parameters represent this trade-off. The first is the reducer size, denoted by q, which is the size of the largest list of values associated with a key that a reducer can receive. The second is the amount of communication between the map step and the reduce step, measured by the replication rate, denoted by r, which is defined as the average number of key-value pairs that the mappers create from each input.
Formally, suppose that we have p reducers and that $q_i \le q$ inputs are assigned to the $i$th reducer. Let |In| be the total number of different inputs; then the replication rate is given by the expression $r = \sum_{i=1}^{p} q_i / |In|$ [58]. From [58], we compute a lower bound on the replication rate for the intersection of FSMs as a function of q using the following expression:
$$r \ge \frac{q \times |Out|}{g(q) \times |In|}$$
where |In| denotes the input size, |Out| denotes the output size, and g(q) is the number of outputs that can be produced by a reducer of size q. Since we consider a complete observable nondeterministic FSM, we have $|In| = |E|$ and $|Out| = \frac{n(n-1)}{2} \times |I|$. Thus

Proposition 1 The lower bound on the replication rate is $r \ge \frac{q \times n(n-1) \times |I|}{2 \times g(q) \times |E|}$.
It is worth noting that limiting the reducer size enables more parallelism. A small reducer size forces us to redefine the notion of a key so as to obtain more, smaller reducers, and thus more parallelism on the available nodes.
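The two quantities used in this analysis can be evaluated numerically. The sketch below assumes the general lower-bound recipe of Afrati et al., r ≥ q × |Out| / (g(q) × |In|), together with |In| = |E| and |Out| = n(n−1)/2 × |I|; the function names and the sample figures (n = 4, |I| = 2, |E| = 24) are hypothetical.

```python
from typing import Sequence


def replication_rate(reducer_loads: Sequence[int], num_inputs: int) -> float:
    """r = (q_1 + ... + q_p) / |In|: average number of copies of each input."""
    return sum(reducer_loads) / num_inputs


def replication_lower_bound(q: int, g_q: int, num_inputs: int, num_outputs: int) -> float:
    """Assumed bound r >= q * |Out| / (g(q) * |In|) for reducers of size q."""
    return (q * num_outputs) / (g_q * num_inputs)


# Hypothetical FSM: n = 4 states, |I| = 2 input symbols, |E| = 24 transitions.
n, num_input_symbols, num_transitions = 4, 2, 24
num_outputs = n * (n - 1) // 2 * num_input_symbols  # |Out| = 12 tree edges

# Six reducers (one per pair of states), each receiving 12 transitions.
print(replication_rate([12] * 6, num_transitions))  # 3.0
print(replication_lower_bound(q=12, g_q=num_input_symbols,
                              num_inputs=num_transitions,
                              num_outputs=num_outputs))  # 3.0
```

In this made-up configuration the measured rate meets the bound, which is the situation in which a mapping schema wastes no communication for its reducer size.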

MapReduce algorithm for the intersection step
Let us present the MapReduce implementation of the intersection step using a modified version of the algorithms proposed in [49]. Notice that our approach produces a truncated successor tree, also called successor table, for all pairs of states of a complete observable nondeterministic FSM. The conducted experiments in [62] show that when deriving distinguishing sequences, the construction time of the successor tree takes 96% of the whole EA's time. That is why three methods will be presented later in this section for the construction of the truncated successor tree.
Algorithm 2 contains the definitions of the map and reduce functions of the intersection step. The map function produces a set of keys, based on a defined schema, from the input FSM transitions. The reduce function performs, inside the reducers, the intersection of the transitions received from the mapper tasks.
The three proposed mapping schemas emit a transition to a set of reducers w.r.t. a key defined by some hash function. Our mapping methods are based, respectively, on states, on input alphabet symbols, and on both states and input alphabet symbols.
Formally, let M = (S, I, O, E) be a complete observable nondeterministic FSM having n states and let t = (s, i, o, d) be a transition in E. A mapper produces a set of keys from the transition t based on some hash function h. This hash function is integrated in the definition of these keys as part of the sub-function getKeysFromTransition() in Line 3 of Algorithm 2. Let us explain the three mapping methods in more detail by designing the sub-function getKeysFromTransition().
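To make the reduce side concrete, here is a minimal Python sketch of what a reducer responsible for one pair of states might compute. It is an illustration under stated assumptions rather than the reduce function of Algorithm 2: transitions are 4-tuples (s, i, o, s′), a tree edge is a triple (source pair, input, set of destination pairs), and the name `reduce_intersection` is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple

Transition = Tuple[str, str, str, str]   # (state, input, output, next_state)
Pair = Tuple[str, str]
Edge = Tuple[Pair, str, FrozenSet[Pair]]  # (source node, label, destination node)


def reduce_intersection(pair: Pair, transitions: Iterable[Transition],
                        inputs: Sequence[str]) -> List[Edge]:
    """Intersect the transitions of the two states of `pair`: for every input,
    keep the pairs of next states reached under a common output."""
    s, t = pair
    received = list(transitions)
    edges: List[Edge] = []
    for i in inputs:
        from_s = [(o, d) for (src, inp, o, d) in received if src == s and inp == i]
        from_t = [(o, d) for (src, inp, o, d) in received if src == t and inp == i]
        destination = frozenset(tuple(sorted((ds, dt)))
                                for (o1, ds) in from_s
                                for (o2, dt) in from_t
                                if o1 == o2)
        edges.append((pair, i, destination))
    return edges


# Transitions a reducer for the pair (s0, s1) could receive (hypothetical FSM).
received = [
    ("s0", "a", "0", "s1"), ("s0", "a", "1", "s0"), ("s1", "a", "0", "s0"),
    ("s0", "b", "0", "s0"), ("s1", "b", "1", "s1"),
]
edges = reduce_intersection(("s0", "s1"), received, ["a", "b"])
for edge in edges:
    print(edge)
```

An edge whose destination is the empty set (here, the one labeled b) marks an input under which the two states share no output, which is exactly what the derivation step looks for.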

Mapping based on states
In the first mapping method, from a transition t ∈ E, the mappers produce a set of key-value pairs of the form ⟨key, t⟩, where key = ⟨$h_S(s[t])$, s⟩ for all s ∈ S such that s[t] ≠ s, s[t] denotes the source state of t, and $h_S$ is a hash function defined from S to {1, …, n}. In this case, we have $\frac{n(n-1)}{2}$ reducers. In this method, the function g(q), which is the number of outputs that can be produced by a reducer of size q, can be affected by the presence of transitions with different alphabet symbols inside the same reducer. Formally, since we consider a complete observable nondeterministic FSM, one has q ≤ 2 × (|I| + |I| × |O|) and g(q) = |I|. Thus, the following proposition gives the upper bound on the replication rate for this method.

Proposition 2
The replication rate r in the state-based mapping scheme is r ≤ (n − 1).
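A possible shape for getKeysFromTransition() under the state-based schema is sketched below in Python. The hash function $h_S$ is replaced by the state name itself and each key is stored as a sorted pair so that (s, s′) and (s′, s) address the same reducer; both choices, like the function name `keys_state_based`, are simplifying assumptions.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[str, str, str, str]  # (state, input, output, next_state)


def keys_state_based(t: Transition, states: Sequence[str]) -> List[Tuple[str, str]]:
    """One key per pair formed by the source state of t and another state."""
    source = t[0]
    return [tuple(sorted((source, s))) for s in states if s != source]


states = ["s0", "s1", "s2"]
print(keys_state_based(("s1", "a", "0", "s2"), states))
# [('s0', 's1'), ('s1', 's2')]
```

Each transition is emitted n − 1 times, once for every other state, which matches the bound of Proposition 2.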

Mapping based on input alphabets
In the second method, we have one reducer for each input symbol. Thus, the number of reducers is equal to the input alphabet size |I|. The mappers send each transition t to the reducer corresponding to its input symbol I[t]. More precisely, from a transition t ∈ E, the mappers produce a key-value pair of the form ⟨key, t⟩, where key = $h_{In}(I[t])$ and $h_{In}$ is a hash function defined from I to {1, …, |I|}. We now have $g(q) = \frac{n(n-1)}{2}$, where q ≤ n + n × |O|. Assuming that the alphabet symbols are uniformly distributed, we have the following proposition.

Proposition 3
The replication rate in the input-alphabet-based mapping scheme is optimal and equal to 1.
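The input-based schema can be sketched in the same spirit. In the Python fragment below, the hash function $h_{In}$ is replaced by the input symbol itself, and the names `key_input_based` and `shuffle_by_input` as well as the sample transitions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Transition = Tuple[str, str, str, str]  # (state, input, output, next_state)


def key_input_based(t: Transition) -> str:
    """A transition has a single key: its input symbol."""
    return t[1]


def shuffle_by_input(E: Sequence[Transition]) -> Dict[str, List[Transition]]:
    """Group the transitions as the shuffle phase would, one reducer per symbol."""
    reducers = defaultdict(list)
    for t in E:
        reducers[key_input_based(t)].append(t)
    return dict(reducers)


E = [
    ("s0", "a", "0", "s1"), ("s0", "a", "1", "s0"), ("s1", "a", "0", "s0"),
    ("s0", "b", "0", "s0"), ("s1", "b", "1", "s1"),
]
reducers = shuffle_by_input(E)
rate = sum(len(values) for values in reducers.values()) / len(E)
print(sorted(reducers), rate)  # ['a', 'b'] 1.0
```

Every transition is sent exactly once, so the replication rate is 1; the price is that only |I| reducers exist and each one must hold all transitions of its symbol.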

Mapping based on both states and input alphabets
In the last method, we propose a hybrid of the first and second mapping methods. In other words, keys are based on both the states and the input symbols at the same time. We consider keys of the form key = (s[t], s, I[t]), where s ∈ S is such that s ≠ s[t]. The number of reducers, in this case, is equal to $\frac{n(n-1)}{2} \times |I|$, the reducer size is q ≤ 2 × |O|, and each reducer produces no more than one edge of the truncated successor tree. Thus, we can deduce an upper bound on the replication rate in the following proposition.

Proposition 4
The replication rate in the hybrid mapping method is r ≤ (n − 1).

Proof
In the map function of Algorithm 2, getKeysFromTransition(t) returns the set of keys associated with the transition t w.r.t. a given mapping method. Then, it sends the transition t to all reducers indexed by these keys. In order to ensure that the algorithm correctly constructs the successor tree, it is necessary to have the property (*): all the transitions with the same input and output symbols are inside the same reducer. The reduce function then computes their pairwise intersection to extend the successor tree. Using the proposed mapping methods, we have:
• for the mapping based on states, a reducer ⟨$s_i$, $s_j$⟩ receives from the mappers all the transitions starting from the state $s_i$ or $s_j$; as a consequence, all outgoing transitions from state $s_i$ or state $s_j$ are inside this reducer, and the property (*) holds for this mapping method;
• for the mapping based on input alphabets, a reducer receives from the mappers the transitions having the same input symbol c; inside a reducer, we therefore also have all transitions having the same input and output symbols;
• for the hybrid mapping, a reducer ⟨$s_i$, $s_j$, c⟩ receives from the mappers the transitions having the same input symbol c and starting from the state $s_i$ or $s_j$; the property (*) then clearly holds.
The proof of the communication complexity follows from Proposition 1.
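The hybrid schema and the property (*) used in the proof can be checked on a toy instance. The Python sketch below builds keys of the form (pair of states, input symbol), with the pair stored sorted and the hash functions omitted; these choices, the names `keys_hybrid` and `shuffle_hybrid`, and the three-state FSM are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Transition = Tuple[str, str, str, str]  # (state, input, output, next_state)
Key = Tuple[str, str, str]              # (state, state, input symbol)


def keys_hybrid(t: Transition, states: Sequence[str]) -> List[Key]:
    """One key per (pair containing the source state of t, input symbol of t)."""
    source, symbol = t[0], t[1]
    return [tuple(sorted((source, s))) + (symbol,) for s in states if s != source]


def shuffle_hybrid(E: Sequence[Transition], states: Sequence[str]) -> Dict[Key, List[Transition]]:
    """Group the transitions by key, as the shuffle phase would."""
    reducers = defaultdict(list)
    for t in E:
        for key in keys_hybrid(t, states):
            reducers[key].append(t)
    return dict(reducers)


states = ["s0", "s1", "s2"]
E = [
    ("s0", "a", "0", "s1"), ("s0", "a", "1", "s0"), ("s1", "a", "0", "s0"),
    ("s2", "a", "1", "s2"), ("s0", "b", "0", "s0"), ("s1", "b", "1", "s1"),
    ("s2", "b", "0", "s1"),
]
reducers = shuffle_hybrid(E, states)
print(len(reducers))                 # 6 reducers: 3 pairs of states x 2 symbols
print(reducers[("s0", "s1", "a")])   # all "a"-transitions leaving s0 or s1
```

Each reducer holds only the transitions of two states under one symbol, so reducers stay small, while each transition is still replicated n − 1 times, in line with Proposition 4.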

MapReduce algorithm for the derivation step
In this step, multiple MapReduce rounds are used to derive a shortest distinguishing sequence for each pair of states of an observable nondeterministic FSM. In each round, the mappers run in parallel and produce a collection of ⟨key, edge⟩ pairs, while the reducers process this collection and derive short DSs, if they exist. The step receives $\frac{n(n-1)}{2} \times |I|$ truncated successor tree edges from the intersection step and produces $\frac{n(n-1)}{2}$ pairs of states, each with its DS if one exists and marked "not found" otherwise. To that end, we use a single mapping method based on states. Each map function takes as input an edge e from the truncated successor tree and produces a set of ⟨key, e⟩ pairs. The key is the pair of states in the source node $n_s(e)$ if the destination node $n_d(e)$ is empty; otherwise, each pair of states in the destination node of the edge e yields a key.
Let us compute the replication rate in each MapReduce round of the derivation step based on Algorithm 3. We have $\frac{n(n-1)}{2}$ available reducers, and each reducer receives at most $q \le \frac{n(n-1)}{2} \times |I|$ edges. The number of outputs that can be produced is g(q) = 1 in the last iteration; hence the following proposition holds.
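The key-generation rule of the derivation step can be written down directly. The following Python sketch covers only the map side and is not Algorithm 3; it assumes an edge is a triple (source pair, label, set of destination pairs), and the name `map_derivation` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]
Edge = Tuple[Pair, str, FrozenSet[Pair]]  # (source node, label, destination node)


def map_derivation(edge: Edge) -> List[Tuple[Pair, Edge]]:
    """Emit (key, edge) pairs: the source pair if the destination is empty,
    otherwise one key per pair of states in the destination node."""
    source, _label, destination = edge
    if not destination:
        return [(source, edge)]
    return [(pair, edge) for pair in sorted(destination)]


leaf_edge = (("s0", "s2"), "b", frozenset())
inner_edge = (("s0", "s1"), "a", frozenset({("s0", "s2")}))
print(map_derivation(leaf_edge))   # keyed by its own source pair (s0, s2)
print(map_derivation(inner_edge))  # keyed by the destination pair (s0, s2)
```

Both edges end up in the reducer of the pair (s0, s2), which is what allows that reducer to concatenate the labels a and b into a candidate sequence for the pair (s0, s1).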

Proposition 5
The replication rate r in each MapReduce round of the derivation step is r ≤ n(n−1)

Proof
It is easy to see that a DS is the label of a path from the root node to a leaf node indexed by the empty set {} in the successor tree. During each MapReduce round of Algorithm 3, the successor tree Tree is compacted level by level until the stop condition holds. Without loss of generality, let us consider a leaf node $ne_k$ indexed by the empty set and located at the $k$th level of the successor tree, and let prec(n) be the set of predecessor nodes of the node n. In each MapReduce round, the node $ne_k$ replaces all nodes located at level $k-1$ that belong to the set $prec(ne_k) = \{ne^1_{k-1}, \dots, ne^l_{k-1}\}$. As a consequence, the successor tree is compacted in the following way: the label x of an edge $(ne^i_{k-1}, x, ne_k)$ is concatenated with the labels of the set of edges $\{(n, y, ne^i_{k-1}) \mid n \in prec(ne^i_{k-1})\}$, for all $1 \le i \le l$, to produce the set of edges $\{(n, yx, ne_k) \mid n \in prec(ne^i_{k-1})\}$. The number of MapReduce rounds in Algorithm 3 is determined by the stop condition, which is true when the root node of the successor tree is reached and a set of DSs is derived, if they exist, in fewer than mn iterations.

Implementation and experimental results
This section reports extensive experiments with the methods described above to evaluate their efficiency and effectiveness in terms of communication cost and execution time.
The experiments are conducted on randomly generated complete observable nondeterministic FSMs that cover varying numbers of states, input and output alphabet sizes, degrees of nondeterminism, and ranges. We run five different experiments, then compute the average of the obtained results and plot it in the corresponding figures. Finally, we compare the proposed methods in terms of the communication cost and the execution time required in the MapReduce framework to derive the truncated successor tree and to extract a short distinguishing sequence for each pair of states, if one exists.

Cluster configuration
Our experiments were run on Hadoop on the French scientific testbed Grid'5000 [64] at the Lille site. We used a cluster of 15 nodes, with 30 CPUs and 300 cores in total. Each node is a machine equipped with two 10-core Intel Xeon E5-2630 v4 processors, 256 GB of main memory, and two 300 GB hard disk drives (HDDs). The machines are connected by a 10 Gbps Ethernet network and run 64-bit Debian 9. The Hadoop version installed on all machines is 2.7.

Data generation method
We randomly generated a large variety of FSM data sets in two phases, based on different combinations of the input alphabet size |I|, the output alphabet size |O|, the number of states |S|, the degree of nondeterminism D, and the range R. First, we randomly generated complete deterministic finite automata using the method described in the Abbadingo One competition (see http://abbadingo.cs.nuim.ie). Then, we randomly selected $D \times |S| \times |I|$ transitions, where D equals 20, 40, 60 or 80. Finally, we added the observability property to the resulting nondeterministic FSM by randomly generating $\frac{R \times |O|}{100}$ replications for each selected transition, where the observability range R is 20-30, 40-50, 60-70 or 80-90. Table 1 summarizes all the datasets used in our experiments along with their respective properties. To obtain accurate results, we generated five samples $S_i^1, \dots, S_i^5$ for each dataset $S_i$ in Table 1. The reported results are the means of the results obtained from these samples.

Elghadyry et al. Journal of Big Data (2021) 8:145

Communication cost analysis
The communication cost is equal to the total number of key-value pairs sent from the map phase to the reduce phase. It can be optimized by minimizing the replication rate, i.e. the number of input copies sent to the reducers. The following table summarizes the relationship between the FSM size and the communication cost in the MapReduce algorithm for different datasets. Since we have introduced three mapping methods in the intersection step of our approach, we present the communication cost for the considered datasets in Fig. 5.
The results in Fig. 5 show clearly that the mapping based on the input alphabet outperforms the other mapping methods in terms of communication cost. This is because the number of transition copies sent to the reducers with this method is lower than with the other ones, as proved formally in Proposition 4. In some particular cases, when the number of states is less than the input alphabet size, the state-based mapping has a lower communication cost. This coincides with the results of Propositions 2 and 3.
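To make the two measured quantities concrete, the following minimal Python sketch counts the key-value pairs emitted by a mapper and divides by the number of inputs. The transition tuple layout and both mappers are assumptions made for illustration: `symbol_mapper` assumes one copy per transition keyed by its input symbol, and `make_replicating_mapper` is a hypothetical stand-in for any mapping that sends several copies of each transition. Neither is claimed to reproduce the mapping schemes analyzed in Propositions 2 to 4.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical transition layout: (state, input, output, next_state).
Transition = Tuple[str, str, str, str]
KeyValue = Tuple[object, Transition]
Mapper = Callable[[Transition], Iterable[KeyValue]]


def symbol_mapper(t: Transition) -> Iterator[KeyValue]:
    """Assumed input-symbol mapping: one copy per transition, keyed by its input."""
    yield t[1], t


def make_replicating_mapper(k: int) -> Mapper:
    """Hypothetical mapper that sends k copies of every transition."""
    def mapper(t: Transition) -> Iterator[KeyValue]:
        for reducer_id in range(k):
            yield reducer_id, t
    return mapper


def communication_cost(transitions: List[Transition], mapper: Mapper) -> int:
    """Total number of key-value pairs sent from the map phase to the reduce phase."""
    return sum(1 for t in transitions for _ in mapper(t))


def replication_rate(transitions: List[Transition], mapper: Mapper) -> float:
    """Average number of copies of each input sent to the reducers."""
    return communication_cost(transitions, mapper) / len(transitions)


transitions = [("s1", "a", "0", "s2"), ("s1", "a", "1", "s3"), ("s2", "b", "0", "s1")]
```

For a fixed input, the communication cost grows linearly with the replication rate, which is why lowering the number of copies per transition directly lowers the amount of data shuffled between the two phases.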

Computation cost analysis
The computation cost is the time required to execute a MapReduce job. The graphs below present comparative results in terms of the execution time of the proposed methods when varying parameters such as the number of states |S|, the input alphabet size |I|, the output alphabet size |O|, the degree of nondeterminism D, the range R, and the total number of transitions. We note that we report real execution times without any inference computation rules. Figures 6, 7, 8, 9 and 10 show increasing curves of the execution time of the proposed methods for the intersection and derivation steps on different data sets when varying the input alphabet size (Fig. 6), the number of states (Fig. 7), the degree of nondeterminism (Fig. 8), the range (Fig. 9), and the number of transitions (Fig. 10). The input alphabet method is more efficient when the alphabet size is less than or equal to the number of available reducers. Changing the FSM parameters has little effect on the performance of the symbol-based method because each reducer receives only useful transitions, so the only influence is the growth of the input size (i.e. the number of transitions). However, when a large number of resources is available, this method wastes some of them, and each reducer can contain multiple transitions having the same input symbol from different states. The hybrid mapping method offers more parallelism than the other methods: each reducer processes only a few transitions, which reduces the overall running time. However, this method spends a lot of time in the mapping phase replicating the transitions. It may also require more reducers than the available resources can host, so some reducers have to wait, which increases the execution time. The state-based mapping method is the weakest one because it has a high replication rate and therefore requires a large number of reducers.
Besides, the intersection of transitions is performed inside the reducer determined by the associated key. In the derivation step, we propose only one mapping method, which is based on states. The results show that the execution time is nearly linear for all considered parameters. According to the experimental results, minimizing the replication rate decreases the time the mappers spend replicating each transition and avoids reading and writing large intermediate results. At the same time, it reduces the number of transitions assigned to a reducer. On the other hand, using an adequate number of reducers diminishes the time a reducer spends waiting for a CPU. Therefore, we get an optimal parallel MapReduce scheme to produce a set of short distinguishing sequences for a complete observable nondeterministic FSM.

Comparative study
We performed a set of experiments to evaluate the efficiency and scalability of the proposed methods in comparison with the state-of-the-art approach. We used the speedup metric, defined as how much faster the parallel method is than the sequential method, to compare the performance of our MapReduce methods with a multi-threaded approach [65]. We conducted the experiments on a single node of the cluster described above, i.e. a machine equipped with two Intel Xeon E5-2630 v4 processors with 10 cores per processor and 2 threads per core, 256 GB of main memory, and two 300 GB hard disk drives (HDDs). For each experiment, we ran the different methods five times on 40 threads to determine which one gives the best performance and to analyze the effect of the number of transitions and the range of nondeterminism of the FSM on the speedup. We compared the three MapReduce methods proposed for the intersection step of our solution with a multi-threaded parallel implementation, based on OpenMP, of the corresponding step of the sequential Exact Algorithm (EA) on a multicore CPU [66], using OMP4J [67].
OMP4J is an open-source implementation of OpenMP that is used as a preprocessor for Java. The experiments confirm the previous results. Figure 11 depicts the speedup for different datasets from Table 2. It can be seen that the mapping method based on input alphabets (Symbol) is the most effective for the different datasets. Figure 12 shows the speedup for the ranges of nondeterminism R of 50-60, 60-70, and 70-80: the alphabet-based method shows consistent performance gains, and its speedup is better than that of the three other methods under different circumstances. This performance is due to the low replication cost between map and reduce and to the fact that each reducer receives only useful transitions. Besides, the Symbol method is faster than the OpenMP-based method, and its speedup grows exponentially as the number of transitions and the range of nondeterminism of the FSM increase.
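As a small illustration of how such a speedup figure can be computed from repeated runs, the following Python sketch divides the mean running time of the baseline by the mean running time of the parallel method. The function name `speedup` and all timings are made up for illustration; they are not measurements from these experiments.

```python
from statistics import mean
from typing import Sequence


def speedup(baseline_times: Sequence[float], parallel_times: Sequence[float]) -> float:
    """Ratio of the mean baseline running time to the mean parallel running time."""
    return mean(baseline_times) / mean(parallel_times)


# Made-up timings in seconds for five runs of each method (not measured data).
baseline_runs = [100.0, 104.0, 98.0, 102.0, 96.0]
mapreduce_runs = [25.0, 26.0, 24.0, 25.0, 25.0]
ratio = speedup(baseline_runs, mapreduce_runs)
```

A ratio above 1 means the parallel method is faster than the baseline on average over the repeated runs.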

Fig. 5 Communication cost of the three proposed methods in the intersection step
We have obtained satisfactory results showing that the proposed approach for deriving short DSs from nondeterministic FSMs is well suited to a large-scale context. However, MapReduce is not designed for iterative processing: the data has to be written to disk after every iteration, which makes disk I/O a major bottleneck. Combining the results from different nodes after every iteration is also a significant challenge due to the complex network structure. To overcome this limitation in the derivation step, we can use the I2 MapReduce framework [68], which introduces a MapReduce Bipartite Graph model, containing a loop between mappers and reducers, to represent iterative and incremental computations.

Execution times for different ranges

In this work, we addressed the scalability issue encountered while deriving distinguishing sequences from complete observable nondeterministic finite state machines (FSMs) by introducing a massively parallel MapReduce version of the well-known Exact Algorithm, with experiments showing that it scales much better than a classical generation algorithm. Our approach is based on two MapReduce steps: the intersection step and the derivation of short distinguishing sequences. In the first step, we proposed three MapReduce methods: a mapping based on states, a mapping based on input alphabets, and a hybrid mapping based on both states and input alphabets. The introduction of three methods is justified by the fact that this step accounts for about 96% of the EA's total running time. In the second step, an iterative MapReduce algorithm is introduced to derive a set of short distinguishing sequences. For both steps, we analyzed the communication cost using the model of Afrati et al., which formalizes the inherent trade-off between communication cost and degree of parallelism in a distributed computing environment.
We performed experiments with randomly generated FSMs and assessed the implementations with respect to communication cost, execution time, and speedup. We compared the proposed algorithms with a multi-threaded algorithm in terms of speedup and found that the proposed approach is efficient and much more scalable: our MapReduce algorithm was able to process FSMs having 5 million transitions, which is up to 150 times larger than those handled by the existing PDS generation algorithms, and with a speedup 2 6 more than the OpenMP-based intersection algorithm.
One particular line of future work is to investigate our parallel MapReduce-based algorithm for deriving other checking sequences used in test generation, such as UIO sequences [17], characterizing sets, harmonized state identifiers [18], and synchronizing sequences [19]. Finally, there would also be value in additional experiments with FSMs from industry.