Graph reduction has mainly been applied in process modeling, where workflow graphs are used to model task processes, and several reduction approaches have been proposed, such as [17–19]. A workflow graph is a directed graph that starts at a single initial node and ends at a single final node: the initial node has no incoming edges and the final node has no outgoing edges. The difficulty with such directed graphs is that structural conflicts, such as deadlocks and lack of synchronization [20], may occur if the workflow graph is not designed correctly.
For large networks where a fast response is required, a sensible first step is to reduce the graph. Several methods for reducing large amounts of data are based on similarity graph reduction (SGR) [21, 22]. Their aim is to reduce a graph to a simpler structure that reveals information that was previously difficult or time-consuming to discover. We therefore reduce the graph on the basis of similarity: similarity values are computed between nodes, and the reduction algorithm is applied to the large graph using a chosen threshold [23, 24] to obtain the final reduced graph.
Shortest path algorithms are widely used today and appear in many everyday applications, such as navigation systems [1]. This motivated me to research the topic: improving the response time of the algorithm could lead to better decisions in a shorter time frame, make faster decisions more reliable and, in some cases, improve the response time of the tool in which the algorithm is used. This matters most when the graph contains a lot of data (nodes and edges) and many alternative paths, which makes finding the shortest path take much longer. Given how widely the algorithm is used and how much it benefits these applications, I was motivated to work on this topic.
In this section we describe the procedure and the algorithms used: the steps taken to reduce the graph, and then to find the shortest path once the reduction has been made.
Network reduction
In similarity-based graph reduction, the main aim is to obtain a less complex (similarity) graph from the given original graph. The author of [23] gives the following formal definition of SGR:
Definition 4
[Similarity graph reduction (SGR)] Given a similarity graph G = (V, E), the goal of SGR is to generate a graph Gʹ = (Vʹ, Eʹ) that satisfies |Vʹ| < |V| and |Eʹ| < |E|.
Following the definition of the author [22], if the similarity value of a pair of nodes is sufficiently high, we can assume that the distance between the two nodes is negligible. For example, two identical nodes have a similarity value of 1, so their distance is taken to be 0, meaning that the two nodes overlap completely. The author therefore introduces another definition that allows us to proceed without loss of generality.
Definition 5
[Similarity clique (SC)] Given a set of nodes N with |N| ≥ 2, N is said to form a similarity clique if and only if both of the following conditions hold: (1) N forms a complete similarity graph; (2) the similarity value on each edge of the complete graph is sufficiently large, such that \(S(u, v) \ge \theta\) for all \(u, v \in N\) with \(u \ne v\).
This part of the reduction aims to find similarity cliques [22] in cases involving more than two nodes. It does so by introducing an arbitrary threshold value θ; adjusting θ controls the level of similarity required within a similarity clique.
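Definition 5 can be checked directly. The sketch below is a minimal illustration, assuming the similarity graph is stored as a dictionary `sim` that maps each unordered node pair (a `frozenset`) to its similarity value S(u, v); this storage format and the function name `is_similarity_clique` are our own choices, not taken from [22].

```python
from itertools import combinations


def is_similarity_clique(nodes, sim, theta):
    """Check Definition 5 for the node set `nodes` and threshold `theta`.

    Assumption: `sim` maps frozenset({u, v}) -> S(u, v) for every edge of the
    similarity graph; a pair missing from `sim` means there is no edge.
    """
    if len(nodes) < 2:                 # the definition requires |N| >= 2
        return False
    for u, v in combinations(nodes, 2):
        s = sim.get(frozenset((u, v)))
        # (1) every pair must be joined by an edge (complete graph), and
        # (2) that edge's similarity must satisfy S(u, v) >= theta.
        if s is None or s < theta:
            return False
    return True
```

For instance, with S(a, b) = 0.9, S(a, c) = 0.95, S(b, c) = 0.85 and S(c, d) = 0.4, the set {a, b, c} is a similarity clique for θ = 0.8, but not for θ = 0.9, because S(b, c) falls below the threshold. Adding d fails for any θ, since a and d are not joined by an edge.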
A similarity clique groups nodes whose mutual distances are negligible. For the purpose of reduction we therefore need the maximal similarity clique (MSC), defined as follows:
Definition 6
[Maximal similarity clique (MSC)] A similarity clique is said to be a maximal similarity clique if adding any one more adjacent vertex would violate the necessary and sufficient conditions of being a similarity clique.
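One simple way to obtain a clique that satisfies Definition 6 is greedy growth. Start from a similarity clique and keep adding any vertex that is linked to every current member with similarity at least θ. The sketch below assumes the same `frozenset`-keyed `sim` dictionary as before, and the name `grow_to_maximal` is hypothetical. It produces a *maximal* clique, which need not be the largest one, and it is not claimed to be the procedure used in [22].

```python
def grow_to_maximal(seed, sim, theta):
    """Greedily extend a similarity clique `seed` until it is maximal.

    Assumption: `sim` maps frozenset({u, v}) -> S(u, v); missing pairs have
    no edge. The result is maximal in the sense of Definition 6, but not
    necessarily the largest similarity clique containing `seed`.
    """
    clique = set(seed)
    vertices = set().union(*sim)       # every vertex mentioned in `sim`
    # One pass suffices: a vertex rejected for a smaller clique is also
    # rejected after the clique grows, because the condition only gets stricter.
    for w in sorted(vertices - clique):    # sorted for a deterministic result
        if all(sim.get(frozenset((w, u)), float("-inf")) >= theta for u in clique):
            clique.add(w)
    return clique
```

The result depends on the seed and on the order in which candidates are tried. Different seeds can therefore yield different maximal cliques of the same graph.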
We then merge each maximal similarity clique [22] into a single node, which reduces both the number of nodes and the number of edges in the graph. As the author of [22] explains, this is the main reason for the reduction: each k-node maximal similarity clique is replaced by a single node, which removes at least \(\frac{k^2 - k}{2}\) edges and k − 1 nodes from the original graph.
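The contraction step can be sketched as follows. This sketch assumes an undirected, unweighted edge list and vertex-disjoint cliques. Each clique is represented by its smallest member, an arbitrary choice made here for illustration. The function name `contract_cliques` is hypothetical.

```python
def contract_cliques(nodes, edges, cliques):
    """Merge each clique into a single representative node.

    nodes: iterable of node ids; edges: iterable of (u, v) pairs (undirected);
    cliques: list of vertex-disjoint node sets, e.g. maximal similarity cliques.
    Returns (new_nodes, new_edges) with each edge stored as a frozenset.
    """
    rep = {v: v for v in nodes}        # every node starts as its own representative
    for clique in cliques:
        leader = min(clique)           # hypothetical choice of representative
        for v in clique:
            rep[v] = leader
    new_nodes = set(rep.values())
    new_edges = set()
    for u, v in edges:
        ru, rv = rep[u], rep[v]
        if ru != rv:                   # the k(k-1)/2 edges inside a clique vanish
            new_edges.add(frozenset((ru, rv)))   # parallel edges collapse to one
    return new_nodes, new_edges
```

Consider a graph with nodes a, b, c and d and edges ab, ac, bc, bc... more precisely ab, ac, bc, cd and bd. Contracting the 3-node clique {a, b, c} leaves 2 nodes (k − 1 = 2 removed) and a single edge. Four edges are removed, which is at least \((3^2 - 3)/2 = 3\), because the edges cd and bd merge into one. For a weighted graph that will later be searched with Dijkstra's algorithm, one reasonable extension would be to keep the minimum weight among merged parallel edges. That is our assumption, not something stated in [22].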
Shortest path
Finding shortest paths on large graphs is not an easy task. Such graphs contain many nodes and alternative routes, so finding the shortest path takes considerable time and computational effort [1, 6]. Using Dijkstra's algorithm, the shortest path between two nodes in a graph can be computed with an asymptotic runtime complexity of O(m + n log n), where n is the number of nodes and m is the number of edges. The graph given as input to Dijkstra's method is the graph reduced by the SGR method described above. Because the reduced graph is smaller, this improves the performance of Dijkstra's method.
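To make the expected gain concrete, assume that r vertex-disjoint maximal similarity cliques of sizes \(k_1, \dots, k_r\) are contracted. Disjointness is an assumption we make here for simplicity. Summing the per-clique counts from [22] then bounds the size of the reduced graph and the resulting Dijkstra cost:

\[
n' = n - \sum_{i=1}^{r} (k_i - 1), \qquad
m' \le m - \sum_{i=1}^{r} \frac{k_i^2 - k_i}{2}, \qquad
T_{\text{Dijkstra}} = O\!\left(m' + n' \log n'\right).
\]

For example, a single maximal similarity clique with k = 5 removes 4 nodes and at least \((25 - 5)/2 = 10\) edges.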
Principle of Dijkstra’s shortest path
The algorithm works as follows:

1. Select the source node.

2. Define a set N of nodes for which the shortest path from the source has been found, and initialize it to the empty set (all other nodes are initially unlabeled, i.e. at infinite distance). As the algorithm proceeds, N is updated with each node whose shortest path has been found.

3. Label the source node with 0 and insert it into N.

4. Consider each node not in N that is connected by an edge to the newly inserted node. Label it with the label of the newly inserted node plus the length of the edge. (If the node already has a label, its new label is the minimum of the old and the new value.)

5. Select the node not in N with the smallest label and add it to N.

6. Repeat from step 4 until the required destination node is added to N or no labeled node remains outside N.
If the required destination node has been labeled, its label is the distance from the source to the destination. If it has not been labeled, there is no path from the source to the destination.
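The steps above can be sketched in Python. This minimal version uses the standard `heapq` binary heap to pick the node outside N with the smallest label, and it assumes the graph is given as an adjacency list with non-negative edge lengths. With a binary heap the running time is O((m + n) log n). The O(m + n log n) bound quoted earlier requires a Fibonacci heap. The function name and input format are our own choices.

```python
import heapq


def dijkstra(graph, source, target=None):
    """Dijkstra's algorithm following the steps above (non-negative weights).

    graph: dict node -> list of (neighbor, edge_length) pairs.
    Returns {node: shortest distance} for every node added to N. If `target`
    is given, the search stops once the target enters N. A node missing from
    the result was never finalized (with no target: it is unreachable).
    """
    dist = {source: 0}                 # step 3: the source is labeled 0
    done = set()                       # the set N of finalized nodes
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)     # step 5: least label outside N
        if u in done:
            continue                   # outdated entry for a node already in N
        done.add(u)
        if u == target:
            break                      # step 6: the required node is in N
        for v, w in graph.get(u, []):  # step 4: relabel neighbours outside N
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w        # keep the minimum label
                heapq.heappush(heap, (d + w, v))
    return {v: dist[v] for v in done}
```

Instead of decreasing a key in place, the sketch pushes a new heap entry and skips outdated ones when they are popped. This is the usual idiom with `heapq`, which has no decrease-key operation.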
Fig. 1 shows a weighted undirected graph that we use to illustrate the procedure. From this sample graph we create both an adjacency matrix and a distance matrix, each a square matrix \((d_{ij})\) of size N × N.
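Both matrices can be built from a weighted edge list in a single pass. The sketch below assumes an undirected graph given as `(u, v, weight)` triples, with node labels fixing the row and column order. It initializes missing entries of the distance matrix to infinity, a common convention before shortest-path updates. The paper's figures may display these entries differently. The function name is hypothetical.

```python
INF = float("inf")


def build_matrices(labels, weighted_edges):
    """Return (adjacency, distance) N x N matrices for an undirected graph.

    labels: ordered node names defining the row/column order.
    weighted_edges: iterable of (u, v, weight).
    adjacency[i][j] is 1 if an edge joins nodes i and j, else 0.
    distance[i][j] is the edge weight, 0 on the diagonal, and INF when no
    direct edge exists.
    """
    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)
    adjacency = [[0] * n for _ in range(n)]
    distance = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        i, j = index[u], index[v]
        adjacency[i][j] = adjacency[j][i] = 1    # undirected: keep it symmetric
        distance[i][j] = distance[j][i] = w
    return adjacency, distance
```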
The matrices are created from the input data, and the graph is visualized as shown in Fig. 1. These matrices also support the analysis of the graph. The graph might be a social network graph representing people and their communication with each other, friend connections on a social network, or flight records, such as all flights from one airport to another. In each case the matrices give a better understanding of the network and a better basis for structural analysis and visualization techniques, such as the work presented in [11, 25, 26]. Complex large-scale networks are massive in both the amount of data and the information they contain. Analysis and interactivity are therefore of vital importance, particularly for networks that deal with large-scale “Big Data”. Because networks evolve constantly, the tools used must be scalable and must maintain good visualization and interactivity. Visualization plays an important role in understanding the big picture of a network and can reveal hidden factors that were not previously apparent, so these methods are a necessity.
Fig. 2a and Fig. 2b show the matrices created for the graph in Fig. 1. The matrix in Fig. 2a contains 0’s and 1’s, where 0 indicates that there is no edge between two nodes and 1 indicates that a weighted edge exists between them. The matrix in Fig. 2b is the distance matrix used to calculate the path. It is updated in each loop of the algorithm, as explained in the algorithm section of this paper. Once the distance matrix has been updated, the distances (weights) in the graph are important indicators for statistical analysis, because they quantify the dissimilarity between data samples for numerical computation. Such distance methods represent the data as a matrix of pairwise distances.
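A matrix of pairwise shortest distances can be filled one row per source by running the array (O(n²)) form of Dijkstra's algorithm directly on the weight matrix. Each loop then updates the distance row of the current source. This is our illustrative reading of "updates the matrix with each loop", and the paper's own update scheme may differ. The sketch assumes `INF` marks missing edges, as in a weight matrix with 0 on the diagonal. The function names are hypothetical.

```python
INF = float("inf")


def dijkstra_row(weights, s):
    """O(n^2) Dijkstra over a weight matrix (INF = no edge); returns row s."""
    n = len(weights)
    dist = [INF] * n
    dist[s] = 0
    finalized = [False] * n
    for _ in range(n):
        # Pick the unfinalized node with the smallest tentative distance.
        u = min((i for i in range(n) if not finalized[i]), key=lambda i: dist[i])
        if dist[u] == INF:
            break                      # the remaining nodes are unreachable
        finalized[u] = True
        for v in range(n):             # update the row through u
            if not finalized[v] and weights[u][v] != INF:
                dist[v] = min(dist[v], dist[u] + weights[u][v])
    return dist


def distance_matrix(weights):
    """All-pairs shortest distances by running Dijkstra from every source."""
    return [dijkstra_row(weights, s) for s in range(len(weights))]
```

On dense matrices, the array form avoids heap overhead and scans each row once per step. Repeating it for all n sources costs O(n³) overall.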
Inserting the weights into the matrix in Fig. 2a yields the distance matrix, which makes it easier to calculate many important analysis measures such as the clustering coefficient and the degree distribution. The distance matrix created in Fig. 3 is an appropriate way to represent the dataset and makes it much simpler to analyze and visualize. Mapping the dataset onto an adjacency matrix helps to display the graph and gathers all the information in the dataset, so that it can be represented more efficiently.
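Both measures mentioned above can be read off a 0/1 adjacency matrix such as the one in Fig. 2a. The sketch below assumes an undirected graph with a zero diagonal. It gives nodes with fewer than two neighbours a clustering coefficient of 0, which is a common convention but a choice on our part. The function names are hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    return Counter(sum(row) for row in adj)   # assumes 0/1 entries, zero diagonal


def clustering_coefficient(adj, i):
    """Local clustering coefficient of node i in an undirected 0/1 matrix.

    It is the fraction of pairs of i's neighbours that are themselves
    connected; nodes with fewer than two neighbours get 0.0 here.
    """
    nbrs = [j for j, a in enumerate(adj[i]) if a and j != i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(adj[u][v] for x, u in enumerate(nbrs) for v in nbrs[x + 1:])
    return 2 * links / (k * (k - 1))
```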
In Fig. 4 we use the same network graph as in Fig. 1 to show the steps taken during the analysis of the graph to calculate the path between nodes. Node A is the start node and node F is the final destination. In the second step we calculate the lowest cost to reach the next node from node A, and so on. As illustrated in Figs. 1–4, the algorithm proceeds from the start node A to node C, then to node D, and finally to the selected destination F, traversing from the start node to the destination along the shortest possible path.
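To report the route itself (A, C, D, F) and not only its length, Dijkstra's algorithm can record each node's predecessor and walk back from the destination. The sketch below does this for an undirected weighted edge list. Its example graph uses hypothetical weights chosen so that the shortest route from A to F passes through C and D. These are not the weights of Fig. 1. The function name is our own.

```python
import heapq


def shortest_path(edges, source, target):
    """Return (cost, path) from source to target, or (inf, []) if unreachable.

    edges: iterable of (u, v, weight) for an undirected graph.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist = {source: 0}
    prev = {}                          # predecessor on the best known path
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in done:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])    # walk predecessors back to the source
    return dist[target], path[::-1]


# Hypothetical weights (not those of Fig. 1).
edges = [("A", "B", 4), ("A", "C", 2), ("C", "B", 1), ("B", "D", 5),
         ("C", "D", 3), ("C", "E", 10), ("D", "F", 2), ("E", "F", 3),
         ("D", "E", 4)]
print(shortest_path(edges, "A", "F"))  # (7, ['A', 'C', 'D', 'F'])
```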