From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
Data model | Scenario | SF | Partition size | Bucket size = HDFS block (128 MB) | Bucket size of at least 1 GB |
---|---|---|---|---|---|
SS | P = Od_Year; B = Orderkey | 30 | 828 MB | \( \frac{828\,\text{MB}}{128\,\text{MB}} \cong 6 \) buckets | – |
SS | P = Od_Year; B = Orderkey | 100 | 2844 MB | \( \frac{2844\,\text{MB}}{128\,\text{MB}} \cong 22 \) buckets | \( \frac{2844\,\text{MB}}{1024\,\text{MB}} \cong 3 \) buckets |
SS | P = Od_Year; B = Orderkey | 300 | 8670 MB | \( \frac{8670\,\text{MB}}{128\,\text{MB}} \cong 68 \) buckets | \( \frac{8670\,\text{MB}}{1024\,\text{MB}} \cong 9 \) buckets |
SS | P = S_Region; B = Suppkey | 30 | 879 MB | \( \frac{879\,\text{MB}}{128\,\text{MB}} \cong 7 \) buckets | – |
SS | P = S_Region; B = Suppkey | 100 | 3306 MB | \( \frac{3306\,\text{MB}}{128\,\text{MB}} \cong 26 \) buckets | \( \frac{3306\,\text{MB}}{1024\,\text{MB}} \cong 3 \) buckets |
SS | P = S_Region; B = Suppkey | 300 | 9830 MB | \( \frac{9830\,\text{MB}}{128\,\text{MB}} \cong 77 \) buckets | \( \frac{9830\,\text{MB}}{1024\,\text{MB}} \cong 9 \) buckets |
DT | P = Od_Year; B = P_Brand | 30 | 828 MB | \( \frac{828\,\text{MB}}{128\,\text{MB}} \cong 6 \) buckets | – |
DT | P = Od_Year; B = P_Brand | 100 | 2844 MB | \( \frac{2844\,\text{MB}}{128\,\text{MB}} \cong 22 \) buckets | \( \frac{2844\,\text{MB}}{1024\,\text{MB}} \cong 3 \) buckets |
DT | P = Od_Year, S_Region; B = P_Brand | 30 | 240 MB | \( \frac{240\,\text{MB}}{128\,\text{MB}} \cong 2 \) buckets | – |
DT | P = Od_Year, S_Region; B = P_Brand | 100 | 1815 MB | \( \frac{1815\,\text{MB}}{128\,\text{MB}} \cong 14 \) buckets | \( \frac{1815\,\text{MB}}{1024\,\text{MB}} \cong 2 \) buckets |
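
The bucket counts in the table follow from dividing the partition size by the target bucket size. A minimal Python sketch of that arithmetic is given here; the function name `bucket_count` and the nearest-integer rounding are assumptions made for illustration, since the table does not state its rounding rule.

```python
HDFS_BLOCK_MB = 128   # target bucket size equal to one HDFS block
ONE_GB_MB = 1024      # target bucket size of at least 1 GB


def bucket_count(partition_mb: float, target_bucket_mb: float) -> int:
    """Estimate buckets per partition as partition size / target bucket size.

    Hypothetical helper: nearest-integer rounding is an assumption,
    and the result is clamped to at least one bucket.
    """
    if partition_mb <= 0 or target_bucket_mb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, round(partition_mb / target_bucket_mb))


# SS, P = Od_Year, SF = 100: partition size of 2844 MB (taken from the table)
print(bucket_count(2844, HDFS_BLOCK_MB))  # 22
print(bucket_count(2844, ONE_GB_MB))      # 3
```

Nearest-integer rounding reproduces most entries of the table, but not the two SF = 300 values in the 1 GB column (8670 MB and 9830 MB, both listed as 9 buckets), so the table values should be taken as the reference and the sketch only as an approximation.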