From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage |
---|---|---|---|---|
Multiple partitioning | SS-P, DT-P | “Od_Year”, “S_Region” | Yes | Yes |
Multiple partitioning | SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA |
Bucketing | SS-B, DT-B | “Orderkey” | No | No |
Bucketing | DT-B | “Od_Year”, “P_Brand” | Yes | Yes |
Bucketing | SS-B | “Suppkey” | Yes (Hive); No (Presto) | No (SF = 100); Yes (Hive, SF = 300) |
Bucketing | SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No |
Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA |
Partitioning and bucketing | SS-PB | “S_Region”, “Suppkey” | Yes | NA |
Partitioning and bucketing | SS-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes |
Partitioning and bucketing | DT-PB | “Od_Year”, “P_Brand” | Yes | NA |
Partitioning and bucketing | DT-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes |