Table 1 Feature comparison of Spark-based \(k\)NN spatial extensions

From: Efficient spatial data partitioning for distributed \(k\)NN joins

| Feature | Magellan | GeoSpark | LocationSpark | STARK | Simba | SpPart_kNN (Proposed) |
| --- | --- | --- | --- | --- | --- | --- |
| Base code | Scala | Java | Scala | Scala | Scala | Scala |
| Modifies Spark's core | No | No | Yes | Yes | Yes | No |
| Data partitioning | No | Grid, R-tree, QuadTree, KDB-Tree | Grid, QuadTree | Grid, Cost-Based Binary Space | Grid, R-Tree, Kd-Tree | Grid |
| \(k\)NN type | None | \(k\)NN Point | \(k\)NN join | \(k\)NN join | \(k\)NN join | \(k\)NN join |
| Data pruning | No | No | No | Yes | Yes | Yes |
| Carry non-spatial data | No | Yes | Yes | Yes | Yes | Yes |
| Accounts for non-spatial data overhead | No | No | No | No | Yes | Yes |
| Skew handling level | None | Partition | Query | Partition | Partition | Partition |
| Indexing options | Z-Order Curves | None, R-Tree, QuadTree | None, Grid, R-tree, QuadTree, IR-tree | None, R-Tree | HashMaps, TreeMaps, R-Tree | R-Tree and QuadTree |
| Index persistence | No | Yes | Yes | Yes | Yes | Yes |
| Spatial objects | Point, LineString, Polygon, MultiPoint, MultiPolygon | Circle, LineString, Point, Polygon, Rectangle | Box and Point | Inherited from JTS | Point, MBR | Point |
| Geometry library | Built-in | JTSPlus | JTS | JTS | Built-in | Built-in |
| Spatial operation object accuracy | Point only; MBR for others | Point, Polygon, and Circle only; partial for LineString and MBR for Rectangles | Point only; MBR for others | Point only; MBR for others | Point only; MBR for others | Built-in with spatial operation |