Table 1 Feature comparison of Spark-based \(k\)NN spatial extensions

From: Efficient spatial data partitioning for distributed \(k\)NN joins

| Feature | Magellan | GeoSpark | LocationSpark | STARK | Simba | SpPart_kNN (Proposed) |
| --- | --- | --- | --- | --- | --- | --- |
| Base code | Scala | Java | Scala | Scala | Scala | Scala |
| Modifies Spark's core | No | No | Yes | Yes | Yes | No |
| Data partitioning | No | Grid, R-tree, QuadTree, KDB-Tree | Grid, QuadTree | Grid, Cost-Based Binary Space | Grid, R-Tree, Kd-Tree | Grid |
| \(k\)NN type | None | \(k\)NN Point | \(k\)NN join | \(k\)NN join | \(k\)NN join | \(k\)NN join |
| Data pruning | No | No | No | Yes | Yes | Yes |
| Carry non-spatial data | No | Yes | Yes | Yes | Yes | Yes |
| Accounts for non-spatial data overhead | No | No | No | No | Yes | Yes |
| Skew handling level | None | Partition | Query | Partition | Partition | Partition |
| Indexing options | Z-Order Curves | None, R-Tree, QuadTree | None, Grid, R-tree, QuadTree, IR-tree | None, R-Tree | HashMaps, TreeMaps, R-Tree | R-Tree and QuadTree |
| Index persistence | No | Yes | Yes | Yes | Yes | Yes |
| Spatial objects | Point, LineString, Polygon, MultiPoint, MultiPolygon | Circle, LineString, Point, Polygon, Rectangle | Box and Point | Inherited from JTS | Point, MBR | Point |
| Geometry library | Built-in | JTSPlus | JTS | JTS | Built-in | Built-in |
| Spatial operation object accuracy | Point only; MBR for others | Point, Polygon, and Circle only; partial for LineString and MBR for Rectangles | Point only; MBR for others | Point only; MBR for others | Point only; MBR for others | Built-in with spatial operation |