Table 8 Definition of the number of buckets (Partitioning and Bucketing)

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

| Data model | Scenario | SF | Partition size | HDFS block (128 MB) | At least 1 GB |
|---|---|---|---|---|---|
| SS | P = Od_Year; B = Orderkey | 30 | 828 MB | *\( \frac{828\ \text{MB}}{128\ \text{MB}} \cong 6 \) buckets* | |
| SS | P = Od_Year; B = Orderkey | 100 | 2844 MB | \( \frac{2844\ \text{MB}}{128\ \text{MB}} \cong 22 \) buckets | *\( \frac{2844\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| SS | P = Od_Year; B = Orderkey | 300 | 8670 MB | \( \frac{8670\ \text{MB}}{128\ \text{MB}} \cong 68 \) buckets | *\( \frac{8670\ \text{MB}}{1024\ \text{MB}} \cong 9 \) buckets* |
| SS | P = S_Region; B = Suppkey | 30 | 879 MB | *\( \frac{879\ \text{MB}}{128\ \text{MB}} \cong 7 \) buckets* | |
| SS | P = S_Region; B = Suppkey | 100 | 3306 MB | \( \frac{3306\ \text{MB}}{128\ \text{MB}} \cong 26 \) buckets | *\( \frac{3306\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| SS | P = S_Region; B = Suppkey | 300 | 9830 MB | \( \frac{9830\ \text{MB}}{128\ \text{MB}} \cong 77 \) buckets | *\( \frac{9830\ \text{MB}}{1024\ \text{MB}} \cong 9 \) buckets* |
| DT | P = Od_Year; B = P_Brand | 30 | 828 MB | *\( \frac{828\ \text{MB}}{128\ \text{MB}} \cong 6 \) buckets* | |
| DT | P = Od_Year; B = P_Brand | 100 | 2844 MB | \( \frac{2844\ \text{MB}}{128\ \text{MB}} \cong 22 \) buckets | *\( \frac{2844\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| DT | P = Od_Year, S_Region; B = P_Brand | 30 | 240 MB | *\( \frac{240\ \text{MB}}{128\ \text{MB}} \cong 2 \) buckets* | |
| DT | P = Od_Year, S_Region; B = P_Brand | 100 | 1815 MB | \( \frac{1815\ \text{MB}}{128\ \text{MB}} \cong 14 \) buckets | *\( \frac{1815\ \text{MB}}{1024\ \text{MB}} \cong 2 \) buckets* |

  1. Italic values highlight the approach used for the definition of the number of buckets
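
The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function name `estimate_buckets` and the nearest-integer rounding rule are assumptions. Nearest-integer rounding reproduces most entries, but not all of them: the table reports 9 buckets for both the 8670 MB and the 9830 MB partitions at the 1 GB target, where this rule would give 8 and 10.

```python
HDFS_BLOCK_MB = 128   # target bucket size: one HDFS block
ONE_GB_MB = 1024      # target bucket size: at least 1 GB


def estimate_buckets(partition_mb: float, target_mb: float) -> int:
    """Hypothetical helper: partition size divided by the target bucket size.

    Rounding to the nearest integer is an assumption made for this sketch;
    the result is clamped to at least one bucket.
    """
    return max(1, round(partition_mb / target_mb))


if __name__ == "__main__":
    # Partition sizes (MB) taken from the table above.
    for partition_mb in (828, 2844, 879, 3306, 240, 1815):
        by_block = estimate_buckets(partition_mb, HDFS_BLOCK_MB)
        by_gb = estimate_buckets(partition_mb, ONE_GB_MB)
        print(f"{partition_mb} MB: {by_block} buckets (128 MB), {by_gb} buckets (1 GB)")
```

For the 828 MB partition, for instance, the 128 MB target gives 6 buckets, matching the first row of the table.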