Table 8 Definition of the number of buckets (Partitioning and Bucketing)

From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

| Data model | Scenario | SF | Partition size | HDFS block (128 MB) | At least 1 GB |
|---|---|---|---|---|---|
| SS | P = Od_Year, B = Orderkey | 30 | 828 MB | *\( \frac{828\ \text{MB}}{128\ \text{MB}} \cong 6 \) buckets* | |
| SS | P = Od_Year, B = Orderkey | 100 | 2844 MB | \( \frac{2844\ \text{MB}}{128\ \text{MB}} \cong 22 \) buckets | *\( \frac{2844\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| SS | P = Od_Year, B = Orderkey | 300 | 8670 MB | \( \frac{8670\ \text{MB}}{128\ \text{MB}} \cong 68 \) buckets | *\( \frac{8670\ \text{MB}}{1024\ \text{MB}} \cong 9 \) buckets* |
| SS | P = S_Region, B = Suppkey | 30 | 879 MB | *\( \frac{879\ \text{MB}}{128\ \text{MB}} \cong 7 \) buckets* | |
| SS | P = S_Region, B = Suppkey | 100 | 3306 MB | \( \frac{3306\ \text{MB}}{128\ \text{MB}} \cong 26 \) buckets | *\( \frac{3306\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| SS | P = S_Region, B = Suppkey | 300 | 9830 MB | \( \frac{9830\ \text{MB}}{128\ \text{MB}} \cong 77 \) buckets | *\( \frac{9830\ \text{MB}}{1024\ \text{MB}} \cong 9 \) buckets* |
| DT | P = Od_Year, B = P_Brand | 30 | 828 MB | *\( \frac{828\ \text{MB}}{128\ \text{MB}} \cong 6 \) buckets* | |
| DT | P = Od_Year, B = P_Brand | 100 | 2844 MB | \( \frac{2844\ \text{MB}}{128\ \text{MB}} \cong 22 \) buckets | *\( \frac{2844\ \text{MB}}{1024\ \text{MB}} \cong 3 \) buckets* |
| DT | P = Od_Year, S_Region, B = P_Brand | 30 | 240 MB | *\( \frac{240\ \text{MB}}{128\ \text{MB}} \cong 2 \) buckets* | |
| DT | P = Od_Year, S_Region, B = P_Brand | 100 | 1815 MB | \( \frac{1815\ \text{MB}}{128\ \text{MB}} \cong 14 \) buckets | *\( \frac{1815\ \text{MB}}{1024\ \text{MB}} \cong 2 \) buckets* |
Italic values highlight the approach used to define the number of buckets.
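
The arithmetic behind the table can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `bucket_count` and `choose_buckets` are hypothetical, round-to-nearest is an assumption (the source does not state its rounding rule), and the selection rule (use the 1 GB target once a partition reaches 1 GB, otherwise the 128 MB HDFS block) is inferred from which cells the table fills in. Under round-to-nearest, the sketch matches every entry except the two SF = 300 cells in the "At least 1 GB" column: it yields 8 and 10 for the 8670 MB and 9830 MB partitions, where the table reports 9 for both.

```python
HDFS_BLOCK_MB = 128   # HDFS block size used in the table
ONE_GB_MB = 1024      # "at least 1 GB" target per bucket


def bucket_count(partition_mb: float, target_mb: float) -> int:
    """Estimate the number of buckets so each holds about target_mb.

    Assumption: round half up to the nearest integer, never below 1.
    """
    return max(1, int(partition_mb / target_mb + 0.5))


def choose_buckets(partition_mb: float) -> int:
    """Pick the target size the way the table appears to (an inference).

    Partitions of at least 1 GB use the 1 GB target; smaller
    partitions fall back to the HDFS block size.
    """
    if partition_mb >= ONE_GB_MB:
        return bucket_count(partition_mb, ONE_GB_MB)
    return bucket_count(partition_mb, HDFS_BLOCK_MB)


if __name__ == "__main__":
    # Partition sizes (MB) taken from the table rows.
    for size_mb in (828, 2844, 8670, 879, 3306, 9830, 240, 1815):
        print(
            size_mb,
            bucket_count(size_mb, HDFS_BLOCK_MB),
            bucket_count(size_mb, ONE_GB_MB),
            choose_buckets(size_mb),
        )
```

The resulting count would typically be the value supplied to the `INTO n BUCKETS` clause of a Hive `CLUSTERED BY` table definition.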