Table 14 Data organization strategies and their impact on processing time and CPU usage (NA: Not Available)

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

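| Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage |
| --- | --- | --- | --- | --- |
| Multiple partitioning | SS-P, DT-P | “Od_Year”, “S_Region” | Yes | Yes |
| Multiple partitioning | SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA |
| Bucketing | SS-B, DT-B | “Orderkey” | No | No |
| Bucketing | DT-B | “Od_Year”, “P_Brand” | Yes | Yes |
| Bucketing | SS-B | “Suppkey” | Yes (Hive); No (Presto) | No (SF = 100); Yes (Hive, SF = 300) |
| Bucketing | SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No |
| Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA |
| Partitioning and bucketing | SS-PB | “S_Region”, “Suppkey” | Yes | NA |
| Partitioning and bucketing | SS-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes |
| Partitioning and bucketing | DT-PB | “Od_Year”, “P_Brand” | Yes | NA |
| Partitioning and bucketing | DT-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes |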