Table 14 Data organization strategies and their impact on processing time and CPU usage (NA: Not Available)

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage
Multiple partitioning | SS-P, DT-P | “Od_Year”, “S_Region” | Yes | Yes
Multiple partitioning | SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA
Bucketing | SS-B, DT-B | “Orderkey” | No | No
Bucketing | DT-B | “Od_Year”, “P_Brand” | Yes | Yes
Bucketing | SS-B | “Suppkey” | Yes (Hive); No (Presto) | No (SF = 100); Yes (Hive, SF = 300)
Bucketing | SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No
Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA
Partitioning and bucketing | SS-PB | “S_Region”, “Suppkey” | Yes | NA
Partitioning and bucketing | SS-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes
Partitioning and bucketing | DT-PB | “Od_Year”, “P_Brand” | Yes | NA
Partitioning and bucketing | DT-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes
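
To make the combined strategy in the last rows concrete, the following is a minimal Python sketch of how rows might be laid out when a table is partitioned by “Od_Year” and “S_Region” and bucketed by “Suppkey”. It is a simplified model, not the configuration used in the study: the bucket count (4), the sample rows, the `revenue` column, and the helper names `partition_path`, `bucket_id`, and `organize` are all assumptions made for illustration, and bucket assignment is reduced to a plain modulo on a non-negative integer key.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count, chosen only for illustration


def partition_path(row, partition_cols):
    """Build a Hive-style partition directory such as 'od_year=1995/s_region=ASIA'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Simplified bucket assignment for a non-negative integer key: key mod bucket count."""
    return key % num_buckets


def organize(rows, partition_cols, bucket_col):
    """Group rows by (partition directory, bucket number)."""
    layout = defaultdict(list)
    for row in rows:
        location = (partition_path(row, partition_cols), bucket_id(row[bucket_col]))
        layout[location].append(row)
    return dict(layout)


# Hypothetical sample rows; the values are not taken from the study.
rows = [
    {"od_year": 1995, "s_region": "ASIA", "suppkey": 7, "revenue": 100},
    {"od_year": 1995, "s_region": "ASIA", "suppkey": 11, "revenue": 250},
    {"od_year": 1995, "s_region": "EUROPE", "suppkey": 7, "revenue": 75},
    {"od_year": 1996, "s_region": "ASIA", "suppkey": 8, "revenue": 40},
]

layout = organize(rows, ["od_year", "s_region"], "suppkey")
for (path, bucket), members in sorted(layout.items()):
    print(path, "bucket", bucket, "->", [r["suppkey"] for r in members])
```

In this model, a query that filters on “Od_Year” and “S_Region” only has to read the matching partition directories, and a filter or join on “Suppkey” can be narrowed to a single bucket inside each of them.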