From: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
Data model | SF | Table size (MB) | Buckets at HDFS block size (128 MB) | Buckets of at least 1 GB |
---|---|---|---|---|
SS | 30 | 5,088 | \( \frac{5088\ \text{MB}}{128\ \text{MB}} \cong 40\ \text{buckets} \) | \( \frac{5088\ \text{MB}}{1024\ \text{MB}} \cong 5\ \text{buckets} \) |
SS | 100 | 16,533 | \( \frac{16533\ \text{MB}}{128\ \text{MB}} \cong 129\ \text{buckets} \) | \( \frac{16533\ \text{MB}}{1024\ \text{MB}} \cong 16\ \text{buckets} \) |
SS | 300 | 49,700 | \( \frac{49700\ \text{MB}}{128\ \text{MB}} \cong 388\ \text{buckets} \) | \( \frac{49700\ \text{MB}}{1024\ \text{MB}} \cong 49\ \text{buckets} \) |
DT | 30 | 14,650 | \( \frac{14650\ \text{MB}}{128\ \text{MB}} \cong 114\ \text{buckets} \) | \( \frac{14650\ \text{MB}}{1024\ \text{MB}} \cong 14\ \text{buckets} \) |
DT | 100 | 46,800 | \( \frac{46800\ \text{MB}}{128\ \text{MB}} \cong 366\ \text{buckets} \) | \( \frac{46800\ \text{MB}}{1024\ \text{MB}} \cong 46\ \text{buckets} \) |
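
The bucket counts in the table follow from dividing each table size by the target bucket size. The sketch below reproduces that arithmetic in Python. The names `TABLE_SIZES_MB` and `bucket_count` are illustrative and not from the paper. Rounding to the nearest integer is an assumption, chosen because it matches every value in the table; the paper does not state its rounding rule.

```python
# Table sizes in MB, copied from the table: (data model, scale factor) -> size.
TABLE_SIZES_MB = {
    ("SS", 30): 5088,
    ("SS", 100): 16533,
    ("SS", 300): 49700,
    ("DT", 30): 14650,
    ("DT", 100): 46800,
}

HDFS_BLOCK_MB = 128   # target bucket size equal to one HDFS block
ONE_GB_MB = 1024      # target bucket size of about 1 GB


def bucket_count(table_size_mb, bucket_size_mb):
    """Return the number of buckets so that each holds roughly bucket_size_mb.

    Assumption: round to the nearest integer (this matches the table),
    and never return fewer than one bucket.
    """
    return max(1, round(table_size_mb / bucket_size_mb))


if __name__ == "__main__":
    for (model, sf), size_mb in TABLE_SIZES_MB.items():
        by_block = bucket_count(size_mb, HDFS_BLOCK_MB)
        by_gb = bucket_count(size_mb, ONE_GB_MB)
        print(f"{model} SF={sf}: {by_block} buckets (128 MB), {by_gb} buckets (1 GB)")
```

Because the counts are rounded to the nearest integer, a bucket in the 1 GB column can be slightly smaller than 1 GB: 49,700 MB split into 49 buckets gives about 1,014 MB per bucket.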