Table 21 Role of the attributes in the data organization strategies and their impact on processing time and CPU usage

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

| Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage | Role of the attributes |
| --- | --- | --- | --- | --- | --- |
| Multiple partitioning | SS-P, DT-P | “Od_Year”, “S_Region” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Multiple partitioning | SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Bucketing | SS-B, DT-B | “Orderkey” | No | No | Attribute not used in the “where” conditions nor used for “group by” or “order by” |
| Bucketing | DT-B | “Od_Year”, “P_Brand” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Bucketing | SS-B | “Suppkey” | Yes (Hive); No (Presto) | No (SF = 100); Yes (Hive, SF = 300) | Attribute not used in the “where” conditions nor used for “group by” or “order by”. Attribute used for joining tables |
| Bucketing | SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No | Attributes not used in the “where” conditions nor used for “group by” or “order by”. Attributes used for joining tables |
| Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA | Only “Od_Year” is used in the “where” conditions, and in the “group by” and “order by” clauses |
| Partitioning and bucketing | SS-PB | “S_Region”, “Suppkey” | Yes | NA | Only “S_Region” is used in the “where” conditions. “Suppkey” is used for joining tables |
| Partitioning and bucketing | SS-PB, DT-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes | “Od_Year” and “S_Region” are used in the “where” conditions, and “Od_Year” is also used in the “group by” and “order by” clauses. “Suppkey” is used for joining tables in the SS-PB scenario |
| Partitioning and bucketing | DT-PB | “Od_Year”, “P_Brand” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
NA: not available
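
To make the combined strategy in the table more concrete, the following is a minimal sketch of how a table partitioned by “Od_Year” and “S_Region” and bucketed by “Suppkey” might be declared in Hive. It is not taken from the article: the helper `hive_ddl`, the table name `lineorder_ss_pb`, the column types, the bucket count of 32, and the ORC storage format are all assumptions chosen for illustration. Only the `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS` clauses are standard Hive DDL.

```python
def hive_ddl(table, columns, partition_by=(), bucket_by=(), num_buckets=0):
    """Build an illustrative HiveQL CREATE TABLE statement (hypothetical helper)."""
    # Hive declares partition columns only in PARTITIONED BY,
    # so they are left out of the regular column list.
    partition_names = {name for name, _ in partition_by}
    regular = [f"{name} {typ}" for name, typ in columns if name not in partition_names]

    lines = [f"CREATE TABLE {table} ({', '.join(regular)})"]
    if partition_by:
        cols = ", ".join(f"{name} {typ}" for name, typ in partition_by)
        lines.append(f"PARTITIONED BY ({cols})")
    if bucket_by:
        if num_buckets <= 0:
            raise ValueError("bucketing requires a positive num_buckets")
        lines.append(f"CLUSTERED BY ({', '.join(bucket_by)}) INTO {num_buckets} BUCKETS")
    lines.append("STORED AS ORC")  # storage format is an assumption
    return "\n".join(lines)


# Assumed schema: column types and the bucket count are illustrative only.
ddl = hive_ddl(
    "lineorder_ss_pb",
    columns=[("orderkey", "BIGINT"), ("suppkey", "INT"), ("revenue", "BIGINT")],
    partition_by=[("od_year", "INT"), ("s_region", "STRING")],
    bucket_by=["suppkey"],
    num_buckets=32,
)
print(ddl)
```

In this sketch the filter attributes become partition columns and the join attribute becomes the bucketing column, which mirrors the roles the table reports for the SS-PB scenario that showed a decrease in processing time.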