Table 21 Role of the attributes in the data organization strategies and their impact on processing time and CPU usage

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

| Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage | Role of the attributes |
| --- | --- | --- | --- | --- | --- |
| Multiple partitioning | SS-P, DT-P | “Od_Year”, “S_Region” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Multiple partitioning | SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Bucketing | SS-B, DT-B | “Orderkey” | No | No | Attribute not used in the “where” conditions nor used for “group by” or “order by” |
| Bucketing | DT-B | “Od_Year”, “P_Brand” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
| Bucketing | SS-B | “Suppkey” | Yes (Hive); No (Presto) | No (SF = 100); Yes (Hive, SF = 300) | Attribute not used in the “where” conditions nor used for “group by” or “order by”. Attribute used for joining tables |
| Bucketing | SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No | Attributes not used in the “where” conditions nor used for “group by” or “order by”. Attributes used for joining tables |
| Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA | Only “Od_Year” is used in the “where” conditions, and in the “group by” and “order by” clauses |
| Partitioning and bucketing | SS-PB | “S_Region”, “Suppkey” | Yes | NA | Only “S_Region” is used in the “where” conditions. “Suppkey” is used for joining tables |
| Partitioning and bucketing | SS-PB, DT-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes | “Od_Year” and “S_Region” are used in the “where” conditions, and “Od_Year” is also used in the “group by” and “order by” clauses. “Suppkey” is used for joining tables in the SS-PB scenario |
| Partitioning and bucketing | DT-PB | “Od_Year”, “P_Brand” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |

  1. NA, not available
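
The pattern in the table (gains mostly appear when the organizing attribute is also a query filter) can be illustrated with a toy sketch of partition pruning. This is not the paper's benchmark or Hive's actual implementation: the rows, the `partition_by`, `scan`, and `revenue_for_year` helpers, and all values are hypothetical, chosen only to show why a filter on the partitioning attribute lets a scan skip data, while organizing by an attribute the query does not filter on (here an order key) does not.

```python
from collections import defaultdict

# Toy fact rows: (od_year, s_region, orderkey, revenue). All values are made up.
ROWS = [
    (1993, "ASIA", 1, 100),
    (1993, "EUROPE", 2, 150),
    (1994, "ASIA", 3, 200),
    (1994, "AMERICA", 4, 250),
    (1995, "ASIA", 5, 300),
    (1995, "EUROPE", 6, 350),
]


def partition_by(rows, key_index):
    """Group rows into partitions keyed by the value at key_index."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return partitions


def scan(partitions, partition_value=None):
    """Return (rows_read, rows). Reads one partition if a value is given, else all."""
    if partition_value is not None:
        selected = partitions.get(partition_value, [])
    else:
        selected = [row for part in partitions.values() for row in part]
    return len(selected), selected


def revenue_for_year(partitions, year, partitioned_on_year):
    """Sum revenue for one year, pruning only if the layout is keyed on the year."""
    rows_read, rows = scan(partitions, year if partitioned_on_year else None)
    total = sum(row[3] for row in rows if row[0] == year)
    return rows_read, total


by_year = partition_by(ROWS, 0)      # organized by the filter attribute
by_orderkey = partition_by(ROWS, 2)  # organized by an attribute the query ignores

pruned = revenue_for_year(by_year, 1994, partitioned_on_year=True)
full = revenue_for_year(by_orderkey, 1994, partitioned_on_year=False)
print(pruned, full)
```

Both calls return the same total, but the layout keyed on the filter attribute reads only the matching partition, whereas the other layout has to read every row. Real engines add further effects (join strategies on bucketed keys, file counts, scale factor) that this sketch does not model.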