From: Efficient spatial data partitioning for distributed \(k\)NN joins
Dataset Name | Short Name | Size and Cardinality | Observations |
---|---|---|---|
OSM POI [63] | POI | • ~38.814 MB • ~119,319 points | • OpenStreetMap (OSM) points of interest • New York City (NYC) only • GPS locations of buildings, restaurants, shops, ... |
NYC Bus Trip Records [64] | BUS | • ~22.147 GB • ~221.715 million points | • Same format as the TAXI dataset but denser (buses run over fewer city streets) • Non-uniform distribution (Fig. 1a) • Good for testing behavior when some locations are significantly more overloaded than others |
NYC Taxi Trip Records [65] | TAXI | • ~27.738 GB • ~165.114 million points | • Non-uniform distribution (Fig. 1b) • Ideal for testing techniques that cannot handle the LARGE dataset |
TLC TPEP and LPEP [65] | TLC | • ~141.99 GB • ~3.78 billion points | • Non-uniform distribution (Fig. 1c) • ~10.9 million duplicate records • ~158.9 million unmatchable records |
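The non-uniform distributions noted for BUS, TAXI and TLC matter because a naive uniform grid hands very different loads to different partitions. As a minimal sketch of one way to quantify that skew (a hypothetical helper `grid_skew`, not the partitioning method used with these datasets), the code below bins points into a `cells_per_side × cells_per_side` grid over an assumed bounding box and reports the busiest cell's load divided by the mean load per cell. A value of 1.0 means perfectly even; larger values indicate hotspots.

```python
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def grid_skew(points: Iterable[Point], bbox: BBox, cells_per_side: int) -> float:
    """Hypothetical helper: max cell load / mean cell load on a uniform grid.

    Points outside bbox are ignored (e.g. GPS noise outside the study area).
    """
    min_x, min_y, max_x, max_y = bbox
    if cells_per_side < 1 or max_x <= min_x or max_y <= min_y:
        raise ValueError("invalid grid definition")

    counts: Counter = Counter()
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Map the point to a cell index; clamp so points on the max edge
        # fall into the last cell instead of an out-of-range one.
        cx = min(int((x - min_x) / (max_x - min_x) * cells_per_side), cells_per_side - 1)
        cy = min(int((y - min_y) / (max_y - min_y) * cells_per_side), cells_per_side - 1)
        counts[(cx, cy)] += 1

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no points inside bbox")
    # Mean over all cells, including empty ones, so a uniform spread gives 1.0.
    mean_load = total / (cells_per_side * cells_per_side)
    return max(counts.values()) / mean_load


if __name__ == "__main__":
    unit_square = (0.0, 0.0, 1.0, 1.0)
    uniform = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
    hotspot = [(0.1, 0.1)] * 4
    print(grid_skew(uniform, unit_square, 2))  # 1.0
    print(grid_skew(hotspot, unit_square, 2))  # 4.0
```

Averaging over all cells (rather than only non-empty ones) is a deliberate choice: it penalizes both dense hotspots and large empty regions, which are the two symptoms of the skew visible in Fig. 1.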