The MapReduce-based approach to improve the shortest path computation in large-scale road networks: the case of A* algorithm
 Wilfried Yves Hamilton Adoni^{1},
 Tarik Nahhal^{1},
 Brahim Aghezzaf^{1} (corresponding author) and
 Abdeltif Elbyed^{1}
 Received: 9 February 2018
 Accepted: 20 April 2018
 Published: 3 May 2018
Abstract
This paper presents an efficient parallel and distributed framework, based on the MapReduce concept, for intensive computation with the A* algorithm. A* is one of the most popular graph traversal algorithms used in route guidance. On large-scale networks, it requires exponential computation time and very costly hardware to compute the shortest path. It is therefore necessary to reduce the time complexity while exploiting low-cost commodity hardware. To cope with this situation, we propose a novel approach that decomposes the A* algorithm into a set of Map and Reduce tasks so that the path computation runs on the Hadoop MapReduce framework. An application on real road networks illustrates the feasibility and reliability of the proposed framework. Experiments performed on a 6-node Hadoop cluster show that the proposed approach outperforms the direct A* algorithm and achieves a significant gain in computation time.
Keywords
 Pathfinding
 Large-scale network
 A^{*} algorithm
 Big Data
 Hadoop
 MapReduce
 HDFS
 Parallel and distributed computing
Introduction
With the increasing size of road networks (4.4 billion vertices and 6 billion uploaded GPS points, according to the OpenStreetMap data statistics for 2018 [1]), there has been vast improvement in hardware architecture for intelligent transportation systems. Traditional GPS systems embedded in vehicles are designed only to find shortest paths in small or medium road networks. As road networks grow, implementing efficient GPS programs has become challenging, mainly because of the impractical time taken to compute the optimal path.
The basic algorithms used for the Single Source Shortest Path Problem (SSSPP) are not suited to intensive computation in large-scale networks because of their long latency. This is one of the crucial problems of route-guidance systems for highway vehicles, including the Vehicle Routing Problem (VRP), the Traveling Salesman Problem (TSP) and the Pickup and Delivery Problem (PDP).
Currently, there are many approaches to the SSSPP, such as label setting, dynamic programming, heuristic and/or bidirectional heuristic methods. However, they are inefficient when applied to NP-complete problems because of the large graph size, the hardware requirements and the time complexity. One of the problems emanating from the SSSPP is pathfinding in large road networks, particularly with the A* algorithm [2].
A* is mostly used in computer games and artificial intelligence. It is based on a heuristic approach and is of great interest in logistics/transportation, bioinformatics and social networks. The main problem is that A* is not suited to intensive computation on large networks consisting of millions of vertices and edges. It needs more hardware resources, and the computational time increases significantly as the graph grows. For example, drivers who travel long distances will find it very difficult to obtain a high-quality solution in time to make the right decision in cases where a quick response is necessary.
In this context, several research studies have been carried out to improve the efficiency of graph traversal algorithms using Big Data technology. This technology has attracted the attention of business and academic communities (e.g. vehicle controls on big traffic events [3]) because of its ability to meet the 5V (Volume, Velocity, Variety, Veracity and Value) challenges related to shortest path queries in large graphs. Most of the efficient approaches [4–9] dedicated to these routing problems are based on the concept of parallel and distributed computing provided by Hadoop MapReduce [10].

The main contributions of this paper are as follows.

Firstly, we propose a MapReduce framework that promotes parallel and distributed computation of the shortest path in large-scale graphs with the A* algorithm.

Secondly, our experimental analysis shows that the MapReduce version of A* outperforms the direct resolution approach of A* and significantly reduces the computation time while giving high-quality results. In addition, our framework is reliable on real road networks and scales well with the network size.

Finally, we show by comparison that the proposed MapReduce framework of the A* algorithm is more effective than the MapReduce framework of Dijkstra's algorithm presented in [6, 7].
Related work
Sequential shortest paths computing
The shortest path problem has attracted growing interest, especially in transportation engineering and artificial intelligence. Several pathfinding algorithms have emerged, and there is a rich literature on the state of the art [11–15]. The SSSPP consists of finding the best path from an origin to a destination in a graph.
Dijkstra [16] presented an algorithm for finding the shortest path from an origin vertex to a destination vertex in directed graphs with unbounded non-negative weights. Dijkstra’s algorithm works like a breadth-first search [17]: it maintains a set of candidate vertices in a temporary queue and tends to expand the search space in all directions. It is much faster than the Bellman–Ford algorithm [15] and runs in \(O(n^2)\) [18–20], where n is the number of vertices, but is limited to smaller-scale graphs.
In a related work, Xu et al. [21] introduced a Fibonacci heap [22] to improve Dijkstra’s algorithm. This approach reduces the computational time to \(O(m+n\log (n))\), where m is the number of edges, and is very practical for finding the shortest path in graphs containing large numbers of vertices. Orlin et al. [23] followed this work by integrating binary heaps to speed up the search for the edges that minimize the path length; their contribution gave a time complexity of \(O(m\log (m))\).
In an early contribution, Pohl [24] proposed a bidirectional search method that partitions the global search domain into two and computes the path simultaneously on the two subdomains. This approach is inspiring but is limited by hardware requirements when the search domain becomes very large. Other classes of pathfinding algorithms use heuristics to reduce the search space and avoid unpromising vertices. Currently, the best-known heuristic shortest path approach is the A* algorithm [2].
Distributed shortest paths computing
When the data is too large, sequential algorithms become inefficient. Many works have therefore sought to improve the speed of existing pathfinding algorithms [4–9]. At present, the most promising strategy for intensive path computation in large-scale graphs is the parallel and distributed model, whose use stems from the failure of traditional techniques to handle big graphs [25]. There have been many studies on parallelizing shortest path algorithms. The work of Djidjev et al. [4] aims at a twofold improvement of path computation in large graphs with the Floyd–Warshall algorithm; the authors proposed a parallel model of Floyd–Warshall based on Graphics Processing Units (GPUs).
Cohen [26] showed that distributed computing with a MapReduce-based approach can be applied successfully to large-scale graph problems such as graph mining [5] and the shortest path problem [6, 7].
The MapReduce paradigm has attracted growing interest in the era of parallel processing and provides an innovative approach to intensive computation on scale-free networks [5]. In this scope, Aridhi et al. [5] proposed a parallel and distributed solution for large-scale graph mining that partitions the graph into subgraphs. Their experiments revealed that the approach significantly reduces the execution time and scales well with an increasing number of cluster nodes (computers).
In recent papers [6, 7], the authors presented a MapReduce-based approach to the shortest path problem in large-scale networks. The proposed approach works in four stages, including the map and reduce stages. Before the map stage, the graph is partitioned into subgraphs, which are mapped to the nodes. Next, Dijkstra’s program [16] runs on each machine to generate a set of intermediate paths. Finally, in the reduce stage, all intermediate paths are aggregated to obtain the final shortest path. The authors’ contributions enabled a significant gain in computation time. In another work, Zhang and Xiong [8] followed the same approach to search for dynamic paths in large road networks based on cloud computing. In addition, Moon et al. [27] proposed a parallel version of the Girvan–Newman algorithm based on Hadoop MapReduce to improve the computational time in large-scale networks.
Background
Hadoop and MapReduce
According to the Hadoop documentation [10], Hadoop is an Apache open-source framework inspired by the Google File System [28]. It allows parallel processing of distributed data sets across a cluster of multiple nodes connected in a master-slave architecture. Hadoop consists of two main components: HDFS [28] and MapReduce [29, 30].
The first component is the Hadoop Distributed File System (HDFS). HDFS is designed to support very large data files. It is distributed, scalable and fault-tolerant. A Big Data file uploaded into HDFS is split into blocks of a specific size defined by the client, and the blocks are replicated across the cluster nodes. The master node (NameNode) manages the distributed file system, namespace and metadata, while the slave nodes (DataNodes) manage the storage of the blocks and periodically report their status to the NameNode.
The second component is MapReduce, which processes the data in two stages:
1. In the Map stage, the mappers (map tasks) are assigned to the slave nodes that host the data blocks. Each mapper reads the records of its input line by line and transforms them into <key, value> pairs. Next, the user-defined map function is called to produce further intermediate <key, value> pairs. The intermediate results are sorted locally by key and sent to the reduce stage when all map tasks are completed.
2. In the Reduce stage, the reducers (reduce tasks) read the map stage outputs and group all values that share the same key, producing for each key an iterable of values <key, iterable[value]>. Next, the user-defined reduce function is applied to the sorted intermediate data to produce a smaller set of <key, value> pairs, and the result is finally written into HDFS.
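The two stages can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code: the names `run_mapreduce`, `map_fn` and `reduce_fn`, the input record format and the example job (keeping the shortest edge leaving each vertex) are assumptions chosen only for illustration.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate the map, shuffle and reduce stages in one process."""
    # Map stage: each input record yields intermediate (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: one call per key over the list of its values.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}

def map_fn(line):
    # Hypothetical record format: "source target length_km".
    source, target, length = line.split()
    yield source, (float(length), target)

def reduce_fn(key, values):
    # Keep the shortest outgoing edge of each vertex.
    return min(values)

edges = ["A B 4.0", "A C 2.5", "B C 1.0", "C D 7.0", "B D 3.0"]
shortest_edge = run_mapreduce(edges, map_fn, reduce_fn)
```

On a real cluster the map calls run in parallel on the nodes that host the blocks, and the grouping is done by the framework rather than by an in-memory dictionary.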
A* algorithm
A*, or one of its extended versions (HPA*, SMA*, MA* and IDA*) [14, 31, 32], is one of the most widely used algorithms for the SSSPP and was originally presented by Hart et al. [2] in 1968. It can be viewed as an extension of Dijkstra’s algorithm [16] that adds a heuristic function h to guide the search [11].

V is the finite set of vertices of the graph G.

E is the set of edges, such that if \((v, u) \in E\), then there is an edge between the vertices v and u.

We define the length function \(l: V \times V \to R^+ \cup \{\infty \}\), which associates with each pair (v, u) the length l(v, u) if there is an edge between v and u, and \(\infty \) otherwise.

For each vertex \(v \in V\), we define a distance \(d: V \to R^+ \cup \{\infty \}\) such that \(d(v)= \infty \) if we cannot reach the goal vertex from v.

Step 1 initialization
set \(O = \emptyset \) and \(S = \emptyset \);
begin by setting \(g(v) = \infty \) for each vertex \(v \in V\);
next set \(g(s) = 0\) and \(d(s) = h(s)\) for the source vertex s;
finally set the current vertex \(c = s\) and let \(S = \{s\}\);

Step 2 vertex expanding
for each vertex \(v \in V\) where edge \((c ,v) \in E\): if \(g(v) > g(c) + l(c, v)\) then update \(g(v) = g(c) + l(c, v)\), set \(d(v) = g(v) + h(v)\) and, when \(v \notin O\), let \(O = O \cup \{v\}\);

Step 3 selection of promising vertex \(v^*\)
identify vertex \(v^*\in O\) where \(d(v^*) \le d(v)\) for all \(v \in O\); set \(O = O \setminus \{v^*\}\) and \(S = S \cup \{v^*\}\);
set \(c = v^*\);

Step 4 stopping criteria
if \(c = e\), where e is the target vertex, then the path has been found;
else if \(O = \emptyset \) then failure;
otherwise go to step 2.
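The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `a_star`, the adjacency-list format, the use of a binary heap for the open list O and the parent pointers used to rebuild the path are assumptions, and the heuristic h is assumed to be consistent (as the straight-line distance is when every edge is at least as long as the straight line between its endpoints).

```python
import heapq
import math

def a_star(edges, h, s, e):
    """Return (path, length) from s to e, or (None, inf) on failure.

    edges: dict mapping a vertex to a list of (neighbour, length) pairs.
    h:     heuristic, h(v) estimates the remaining distance from v to e.
    """
    g = {s: 0.0}                  # Step 1: g(s) = 0, other g(v) are infinite
    parent = {s: None}
    open_heap = [(h(s), s)]       # open list O ordered by d(v) = g(v) + h(v)
    closed = set()                # closed list S

    while open_heap:              # Step 4: an empty open list means failure
        _, c = heapq.heappop(open_heap)    # Step 3: most promising vertex
        if c in closed:
            continue              # outdated heap entry, already expanded
        closed.add(c)
        if c == e:                # Step 4: target reached, rebuild the path
            path = []
            while c is not None:
                path.append(c)
                c = parent[c]
            return path[::-1], g[e]
        for v, length in edges.get(c, []):  # Step 2: expand c
            candidate = g[c] + length
            if candidate < g.get(v, math.inf):
                g[v] = candidate
                parent[v] = c
                heapq.heappush(open_heap, (candidate + h(v), v))
    return None, math.inf

# Hypothetical toy graph with planar coordinates for the heuristic.
coords = {"s": (0, 0), "a": (1, 0), "b": (1, 1), "e": (2, 1)}
edges = {
    "s": [("a", 1.0), ("b", 1.5)],
    "a": [("e", 3.0), ("b", 1.0)],
    "b": [("e", 1.0)],
}

def h(v):
    return math.dist(coords[v], coords["e"])

path, length = a_star(edges, h, "s", "e")
```

The heap replaces the linear scan of step 3; a vertex may appear several times in the heap, which is why entries for already expanded vertices are skipped.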
1. Reduce the graph size by deleting unnecessary vertices and edges of the graph;
2. Use sophisticated computers equipped with a lot of RAM for data persistence, or run A* with a multitask approach such as HPA* [32];
3. Use the MapReduce approach to run the A* program in a distributed environment.
Proposed MapReduce version of A*
1. Input stage: partition of the initial graph
2. Map stage: computation of intermediate paths
3. Reduce stage: concatenation of intermediate paths
4. Output stage: storage of full path
Input stage: partition of the initial graph

l: subgraph length in km;

A: source vertex;

E: target vertex.
Algorithm 2 describes the Euclidean_Distance function for computing the distance between two points of the graph. It takes as input the starting and ending vertices (points A and E) and applies the Euclidean formula [2] to determine the distance L.

d: diagonal length in km;

i: ith subgraph;

(A.lon, A.lat): GPS coordinates of the source vertex (point A);

(E.lon, E.lat): GPS coordinates of the target vertex (point E).
The first step (lines 11–12) converts the GPS coordinates of the input points into Cartesian coordinates. This step is necessary because it ensures high accuracy on large road networks. In the second step (lines 14–16), we use the converted coordinates to calculate the normal vector (a, b, c) of the plane OAB (see Fig. 5a), where O is the center of the earth with Cartesian coordinates (0, 0, 0). In the third step (lines 18–21), the obtained vector is used to determine the ending position of the ith subgraph. As shown in Fig. 5b, the point B is the ending point of the 1st subgraph; it partitions the projected arc \(\overset{\frown }{AE}\) into \(\overrightarrow{AB}\) and \(\overrightarrow{BE}\). The length of \(\overrightarrow{AB}\) is equal to \(d \times i\). Moreover, the length of \(\overrightarrow{AE}\) is equal to \(r \times \alpha \), where r is the earth radius and \(\alpha = \widehat{(\overrightarrow{AO}, \overrightarrow{OE})}\) is the angle between \(\overrightarrow{AO}\) and \(\overrightarrow{OE}\) (see Fig. 5a). Since an arc length is the radius times the subtended angle, the angle \(\beta = \widehat{(\overrightarrow{AO}, \overrightarrow{OB})}\) between \(\overrightarrow{AO}\) and \(\overrightarrow{OB}\) satisfies \(r \times \beta = d \times i\), that is, \(\beta = (d \times i) / r\) [6]. The obtained value of \(\beta \) is used to compute the Cartesian coordinates (B.x, B.y, B.z) of the point B, taking the unit normal vector (a, b, c) as the rotation axis. The last step (line 23) converts the Cartesian coordinates of the point B back into GPS coordinates (B.lon, B.lat) for cartographic projection.
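The geometric construction described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's Algorithm 3: the function names, the spherical earth model with a mean radius of 6371 km and the use of Rodrigues' rotation formula are choices made here, and the angle is taken as the arc length divided by the radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius r

def to_cartesian(lon, lat, r=EARTH_RADIUS_KM):
    """Convert GPS coordinates in degrees to Cartesian coordinates in km."""
    lon, lat = math.radians(lon), math.radians(lat)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def to_gps(x, y, z):
    """Convert Cartesian coordinates back to (lon, lat) in degrees."""
    return (math.degrees(math.atan2(y, x)),
            math.degrees(math.atan2(z, math.hypot(x, y))))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def subgraph_end(a, e, d, i, r=EARTH_RADIUS_KM):
    """Return (lon, lat) of B, the end of the i-th subgraph of length d km.

    a, e: (lon, lat) in degrees of the source A and the target E.
    A and E must not be identical or antipodal (the plane would be undefined).
    """
    A, E = to_cartesian(*a, r), to_cartesian(*e, r)
    n = cross(A, E)                           # normal of the plane through O, A, E
    norm = math.sqrt(sum(c * c for c in n))
    k = tuple(c / norm for c in n)            # unit rotation axis
    beta = d * i / r                          # arc length = r * beta
    k_cross_a = cross(k, A)
    # Rodrigues' rotation of A about k; the k (k . A) term vanishes because
    # k is perpendicular to A.
    B = tuple(A[j] * math.cos(beta) + k_cross_a[j] * math.sin(beta)
              for j in range(3))
    return to_gps(*B)

tangier = (-5.818, 35.759)   # (lon, lat) taken from the subgraph boundary table
dakhla = (-15.02, 23.03)
b_lon, b_lat = subgraph_end(tangier, dakhla, d=100.0, i=2)
```

Rotating A about the normal of the plane keeps B on the great circle through A and E, so the arc from A to B has length d × i by construction.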
The Create_Subgraph function, described in Algorithm 4, is called after the position of the subgraph has been found. It takes as input the GPS coordinates of the position and the length in km of the subgraph. Next, it builds from the original graph G(E, V) a new subgraph \(G'(E', V')\) whose vertices lie within the boundary M delimited by the starting position (point C) and the ending position (point D) of the subgraph.
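A minimal sketch of such a filtering step is given below. The shape of the boundary M is not specified here, so the sketch assumes a rectangular longitude/latitude box spanned by the points C and D; the function name `create_subgraph`, the data layout and the sample vertices `p1` and `p2` with their edge lengths are hypothetical.

```python
def create_subgraph(vertices, edges, start, end):
    """Keep the vertices inside the box spanned by start and end.

    vertices: dict mapping a vertex id to its (lon, lat) position.
    edges:    list of (u, v, length_km) tuples.
    start, end: (lon, lat) of the points C and D.
    """
    lon_min, lon_max = sorted((start[0], end[0]))
    lat_min, lat_max = sorted((start[1], end[1]))
    kept = {
        v: (lon, lat)
        for v, (lon, lat) in vertices.items()
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
    }
    # An edge survives only if both of its endpoints are kept.
    kept_edges = [(u, v, w) for u, v, w in edges if u in kept and v in kept]
    return kept, kept_edges

vertices = {
    "tangier": (-5.818, 35.759),
    "el_gara": (-7.15, 33.24),
    "p1": (-6.5, 34.5),    # hypothetical vertex inside the box
    "p2": (-9.0, 30.0),    # hypothetical vertex outside the box
}
edges = [("tangier", "p1", 150.0), ("p1", "el_gara", 160.0),
         ("el_gara", "p2", 400.0)]
sub_vertices, sub_edges = create_subgraph(
    vertices, edges, start=vertices["tangier"], end=vertices["el_gara"])
```

A strict box can drop roads that leave the box and re-enter it, so a practical implementation would probably enlarge the boundary by some margin.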
Map stage: computation of intermediate paths
Algorithm 6 describes the Expand_Vertex function. It takes as input parameters the open list O, the subgraph \(G'\) and the current vertex c to expand. Next, it explores in depth the neighborhood of the current vertex. For each expanded vertex, it evaluates the cost and checks the triangle inequality before adding it to the open list.
Algorithm 7 describes the Select_Vertex function. It takes as input parameters the open list O and the closed list S, and returns the most promising vertex \(v^*\), where \(d(v^*) \le d(v)\) for each vertex v in the open list O.
Algorithm 8 describes the Generate_Path function. It takes as input the closed list S and concatenates the edges between the vertices contained in S in order to build an intermediate path.
Reduce stage: concatenation of intermediate paths
Output stage: storage of full path
In the last stage, the master node uploads the full path into the HDFS. Each path is written into a separate file. The full path is obtained by merging the content of the reduce files. To ensure fault tolerance, the cluster copies each file onto a separate peer node according to the replication factor.
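The concatenation of intermediate paths and the final merge can be sketched as below. The record layout (subgraph index, vertex list, length), the function name `concatenate_paths` and the lengths in the example are hypothetical; only the place names come from the subgraph boundary table.

```python
def concatenate_paths(intermediate):
    """Merge intermediate paths, ordered by subgraph index, into a full path.

    intermediate: iterable of (subgraph_index, vertices, length_km).
    """
    full_path, total = [], 0.0
    for _, vertices, length in sorted(intermediate, key=lambda item: item[0]):
        if full_path and vertices and full_path[-1] == vertices[0]:
            vertices = vertices[1:]    # drop the duplicated junction vertex
        full_path.extend(vertices)
        total += length
    return full_path, total

# Intermediate paths may arrive in any order from the mappers.
intermediate = [
    (2, ["El Gara", "Ait M'Hamed"], 250.0),
    (1, ["Tangier", "El Gara"], 340.0),
    (3, ["Ait M'Hamed", "Ikiafene"], 420.0),
]
full_path, total_km = concatenate_paths(intermediate)
```

Sorting by subgraph index is what restores the travel order, since each subgraph ends where the next one starts.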
Experimental results
Experimental parameter set
| Parameter | Designation | Value |
| --- | --- | --- |
| \(N_{node}\) | No. of cluster nodes | 6 |
| n | No. of graph vertices | [8000, 100,000] |
| \(G_{size}\) | Graph size in Gbit | [0.2, 16.6] |
| \(B_{size}\) | Block size in Mbit | {64, 128, 256} |
| \(G'_{size}\) | Subgraph size in Gbit | [0.3, 2] |
| l | Subgraph length in km | [20, 400] |
| \(N^{map}_{core}\) | No. of map cores | {1, 2, 3, 4} |
| \(N^{red}_{core}\) | No. of reduce cores | {1, 2, 3, 4} |
Data set
The real-world road data used are benchmark data gathered from the OpenStreetMap (OSM) spatial database [35]. The data are stored in the .osm.pbf data format, an alternative to XML-based formats (KML and GML). The XML file contains points, ways, relations and nested tags in each of these objects. The graph data cover all types of road networks, including local roads, and contain weighted edges to estimate the travel distances/times. We used QGIS Desktop 2.18.3 and the JOSM (Java OpenStreetMap Editor) tool to extract information. The filtering criterion is based on ’osm_tab’: we extracted all objects whose tag keys correspond to ’highway’.
Application: road trip from northern to southern Morocco
Subgraph boundaries
| Subgraph | Starting point | Start lat | Start lon | Ending point | End lat | End lon |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Tangier | 35.759 | −5.818 | El Gara | 33.24 | −7.15 |
| 2 | El Gara | 33.24 | −7.15 | Ait M’Hamed | 31.857 | −6.498 |
| 3 | Ait M’Hamed | 31.857 | −6.498 | Ikiafene | 29.663 | −9.638 |
| 4 | Ikiafene | 29.663 | −9.638 | Boukraa | 26.341 | −12.841 |
| 5 | Boukraa | 26.341 | −12.841 | Dakhla | 23.03 | −15.02 |
Ratio of MapReduce-A* efficiency versus direct resolution
Comparison of computational time in seconds between \(A^*\) and \(MRA^*\) on a 1-node cluster
| n | No. data | \(G_{size}\) | \(T_{A^*}\) | \(t_{map}\) | \(t_{red}\) | \(T_{MRA^*}\) | Ratio \(\frac{T_{A^*}}{T_{MRA^*}}\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8000 | 64 × 10^{6} | 0.25 | 15,752 | 157 | 11 | 168 | 94 |
| 15,000 | 22.5 × 10^{7} | 0.5 | 35,331 | 183 | 18 | 201 | 159 |
| 20,000 | 40 × 10^{7} | 0.8 | 51,469 | 233 | 36 | 269 | 191 |
| 25,000 | 62.5 × 10^{7} | 1.2 | 77,378 | 330 | 53 | 366 | 211 |
| 30,000 | 90 × 10^{7} | 2 | 101,909 | 405 | 76 | 481 | 212 |
| 40,000 | 16 × 10^{8} | 3 | 129,343 | 698 | 91 | 789 | 164 |
| 50,000 | 25 × 10^{8} | 5 | 152,863 | 1035 | 103 | 1137 | 134 |
| 60,000 | 36 × 10^{8} | 7 | 177,948 | 1397 | 203 | 1600 | 111 |
| 80,000 | 64 × 10^{8} | 11.5 | 269,102 | 1790 | 408 | 2198 | 122 |
| 90,000 | 81 × 10^{8} | 14 | 314,425 | 2100 | 568 | 2668 | 118 |
| 100,000 | 10 × 10^{9} | 16.6 | 359,747 | 2257 | 682 | 2939 | 122 |

Average ratio of time improvement: 149
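As a check on how the ratio column is obtained, the first row and the final average can be recomputed from the tabulated values (rounding to the nearest integer is an assumption about how the table was produced):

\[
\frac{T_{A^*}}{T_{MRA^*}} = \frac{15752}{168} \approx 94,
\qquad
\frac{94 + 159 + 191 + 211 + 212 + 164 + 134 + 111 + 122 + 118 + 122}{11} = \frac{1638}{11} \approx 149 .
\]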
Influence of number of core processors on the computational time
Influence of number of Hadoop nodes on the computational time
Influence of number of subgraphs on the computational time and the result quality
Influence of block size and subgraph length on the computational time
Impact of block size and subgraph length on the total time \(T_{MRA^*}\) in seconds
| l | \(G'_{size}\) | \(B_{size} = 64\) | \(B_{size} = 128\) | \(B_{size} = 256\) |
| --- | --- | --- | --- | --- |
| 60 | 0.3 | 371 | 376 | 368 |
| 78 | 0.4 | 412 | 409 | 409 |
| 100 | 0.5 | 463 | 464 | 464 |
| 109 | 0.6 | 460 | 517 | 532 |
| 125 | 0.7 | 458 | 596 | 605 |
| 156 | 0.8 | 459 | 753 | 754 |
| 200 | 1 | 461 | 1003 | 1065 |
| 265 | 1.3 | 460 | 1008 | 1149 |
| 400 | 2 | 461 | 1004 | 1164 |
| Average time | | 445 | 681 | 726 |
MapReduce-A* versus MapReduce-Dijkstra
Discussion
In this section, we discuss two topics related to the experimental results. The first topic concerns the optimal usage of the cluster. A cluster is used optimally if all nodes within the cluster participate in the full path computation. Whether this is the case depends on the cluster configuration and the number of generated subgraphs. For the moment there is no formula for determining the optimal number of nodes for a given graph. However, a solution emanating from our experimental study is, first, to split the graph into \(N_{node} \times N^{map}_{core}\) subgraphs in the input stage. Second, in the map stage, we set the number of mappers equal to the number of subgraphs (\(N_{map} = N_{graph}\)). Third, in the reduce stage, we fix the number of reducers at \(N_{node} \times N^{red}_{core}\). This solution allows an optimal usage of the cluster but does not guarantee the optimal solution.
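The sizing rule above amounts to two multiplications, written out here as a small helper. The function name `cluster_plan` and the example core counts are hypothetical; only the 6-node cluster size comes from the experimental parameter set.

```python
def cluster_plan(n_node, n_map_core, n_red_core):
    """Number of subgraphs, mappers and reducers for a given cluster."""
    n_graph = n_node * n_map_core        # one subgraph per map core
    return {
        "subgraphs": n_graph,
        "mappers": n_graph,              # N_map = N_graph
        "reducers": n_node * n_red_core,
    }

# 6 nodes, with an assumed 4 map cores and 2 reduce cores per node.
plan = cluster_plan(n_node=6, n_map_core=4, n_red_core=2)
```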
The second topic is the optimality of the obtained solution. As shown in the experimental analysis, the quality of the results depends on two parameters: the subgraph length in km and the block size. We observed that when the subgraph length l is set so that the subgraph sizes \(G'_{size}\) are on average equal to the block size \(B_{size}\), we get a result without optimality error (\(\epsilon = 0\)). In conclusion, the optimal solution is obtained when \(G'_{size} \approx B_{size}\).
Conclusion and further works

Adapting the traveling salesman problem to such framework.

Proposing a novel framework based on Big Data graph analysis tools such as Neo4j or Apache Shindig to better explore the road network graph.
Declarations
Authors’ contributions
All mentioned authors contributed to the elaboration of the article. All authors read and approved the final manuscript.
Authors’ information
Wilfried Yves Hamilton Adoni received the B.S. degree in Computer Science from Hassan II University of Casablanca, Morocco in 2012. He received the M.E. degree in Operational Research and System Optimization from Hassan II University of Casablanca, Morocco in 2014. He is currently in the final stage of his Ph.D. in Computer Science, specializing in Big Data technology and smart transportation, at the Faculty of Science, Hassan II University of Casablanca, Morocco. He is currently a part-time Temporary Teaching and Research Assistant at the Central School of Casablanca, Morocco. His research interests include Big Data, graph databases, smart transportation, pathfinding algorithms and traffic flow analysis. He joined the elite group of young Big Data Specialists with IBM BigInsights V2.1 in 2015. Wilfried Adoni can be contacted at: adoniwilfried@gmail.com/wilfried.adoni09@etude.univcasa.ma.
Tarik Nahhal is an Associate Professor of Computer Science in the Faculty of Science at Hassan II University of Casablanca. He holds a PhD in Hybrid Systems and Artificial Intelligence from the Joseph Fourier University, Grenoble I, France. His research interests include hybrid systems, cloud computing, Big Data, NoSQL databases, IoT and artificial intelligence. He has led several Big Data conferences and has published several research articles in peer-reviewed international journals and conferences. He is a Big Data specialist with IBM Big Data certification. His current research focuses on difficult problems related to Big Data complexity. Dr Tarik Nahhal can be contacted at: t.nahhal@fsac.ac.ma.
Brahim Aghezzaf is a Full Professor of Operational Research at Hassan II University of Casablanca. His research interests include optimization, logistics, transportation modeling, multiobjective optimization, heuristics and combinatorial optimization. Dr Brahim Aghezzaf has published several research articles in peer-reviewed international journals and conferences. He has served as program chair and program committee member for many international conferences. He has made several contributions to optimization problems. He holds a Big Data certification from IBM and numerous scientific prizes in the field of logistics/transportation optimization. Dr Brahim Aghezzaf is the corresponding author and can be contacted at: b.aghezzaf@fsac.ac.ma.
Abdeltif Elbyed is an Associate Professor of Computer Science at Hassan II University of Casablanca. His research interests include e-learning, interoperability, ontologies, the semantic web, multi-agent systems and urban transportation. He has published several research papers. He is also a Business Intelligence Specialist. His current research interests are reverse logistics and urban mobility. Dr Abdeltif Elbyed can be contacted at: a.elbyed@fsac.ac.ma.
Acknowledgements
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
All supporting data files are open source. The map data used are available and can be accessed directly from OpenStreetMap at http://download.geofabrik.de and information about the subgraph format of the extracted road networks is available at http://fc.isima.fr/˜lacomme/OR hadoop/.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
1. OpenStreetMap. OpenStreetMap statistics. https://www.openstreetmap.org/stats/data_stats.html. Accessed 19 Mar 2018.
2. Hart PE, Nilsson NJ, Raphael B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern. 1968;4(2):100–7.
3. Adoni WYH, Nahhal T, Aghezzaf B, Elbyed A. The MapReduce-based approach to improve vehicle controls on big traffic events. In: 2017 International colloquium on logistics and supply chain management (LOGISTIQUA). Rabat: IEEE; 2017. p. 1–6. https://doi.org/10.1109/LOGISTIQUA.2017.7962864.
4. Djidjev H, Chapuis G, Andonov R, Thulasidasan S, Lavenier D. All-pairs shortest path algorithms for planar graph for GPU-accelerated clusters. J Parallel Distrib Comput. 2015;85(C):91–103.
5. Aridhi S, d’Orazio L, Maddouri M, Mephu NE. Density-based data partitioning strategy to approximate large-scale subgraph mining. Inf Syst. 2015;48(Supplement C):213–23.
6. Aridhi S, Lacomme P, Benjamin V. A MapReduce-based approach for shortest path problem in large-scale networks. Eng Appl Artif Intell. 2015;41(C):151–65.
7. Aridhi S, Benjamin V, Lacomme P, Ren L. Shortest path resolution using Hadoop. In: MOSIM 2014, 10ème Conférence Francophone de Modélisation, Optimisation et Simulation, Nancy, France; 2014.
8. Zhang D, Xiong L. The research of dynamic shortest path based on cloud computing. In: 2016 12th international conference on computational intelligence and security (CIS). Wuxi: IEEE; 2016. p. 452–455.
9. Plimpton SJ, Devine KD. MapReduce in MPI for large-scale graph algorithms. Parallel Comput. 2011;37(9):610–32.
10. Apache Hadoop. Welcome to Apache Hadoop. http://hadoop.apache.org/. Accessed 10 Mar 2017.
11. Cherkassky BV, Goldberg AV, Radzik T. Shortest paths algorithms: theory and experimental evaluation. Math Program. 1993;73:129–74.
12. Fu L, Sun D, Rilett LR. Heuristic shortest path algorithms for transportation applications: state of the art. Comput Oper Res. 2006;33(11):3324–43.
13. Schrijver A. On the history of the shortest path problem. Documenta Math ismp. 2012;17:155–67.
14. Goldberg AV, Harrelson C. Computing the shortest path: A* search meets graph theory. In: Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms. Philadelphia: Society for Industrial and Applied Mathematics; 2005. p. 156–165.
15. Bellman R. On a routing problem. Quart Appl Math. 1958;16(1):87–90.
16. Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1(1):269–71.
17. Zhou R, Hansen EA. Breadth-first heuristic search. Artif Intell. 2006;170(4):385–408.
18. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms. 2nd ed. Cambridge: MIT Press; 2001.
19. Skiena SS. The algorithm design manual. 2nd ed. London: Springer; 2008.
20. Even S. Graph algorithms. 2nd ed. New York: Cambridge University Press; 2011.
21. Xu MH, Liu YQ, Huang QL, Zhang YX, Luan GF. An improved Dijkstra’s shortest path algorithm for sparse network. Appl Math Comput. 2007;185(1):247–54.
22. Fredman ML, Tarjan RE. Fibonacci heaps and their uses in improved network optimization algorithms. J ACM. 1987;34(3):596–615.
23. Orlin JB, Madduri K, Subramani K, Williamson M. A faster algorithm for the single source shortest path problem with few distinct positive lengths. J Discrete Algorithms. 2010;8(2):189–98.
24. Pohl I. Bidirectional search. Mach Intell. 1971;6:127–40.
25. Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In: European conference on principles of data mining and knowledge discovery. Prague: Springer; 2000. p. 13–23.
26. Cohen J. Graph twiddling in a MapReduce world. Comput Sci Eng. 2009;11(4):29–41.
27. Moon S, Lee JG, Kang M, Choy M, Lee JW. Parallel community detection on large graphs with MapReduce and GraphChi. Data Knowl Eng. 2016;104(Supplement C):17–31.
28. Ghemawat S, Gobioff H, Leung ST. The Google file system. In: ACM SIGOPS operating systems review, vol. 37. New York: ACM; 2003. p. 29–43. https://doi.org/10.1145/945445.945450.
29. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–13.
30. Vavilapalli VK, Seth S, Saha B, Curino C, O’Malley O, Radia S, Reed B, Baldeschwieler E, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H. Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. Santa Clara: ACM Press; 2013. p. 1–16.
31. Botea A, Müller M, Schaeffer J. Near optimal hierarchical pathfinding. J Game Dev. 2004;1(1):7–28.
32. Russell S. Efficient memory-bounded search methods. In: Proceedings of the 10th European conference on artificial intelligence. New York: John Wiley & Sons; 1992. p. 1–5.
33. Zeng W, Church RL. Finding shortest paths on real road networks: the case for A*. Int J Geogr Inf Sci. 2009;23(4):531–43.
34. Tarjan R. Depth-first search and linear graph algorithms. SIAM J Comput. 1972;1(2):146–60.
35. Geofabrik, OpenStreetMap. OpenStreetMap data extract. http://download.geofabrik.de. Accessed 10 Mar 2017.