From: Evaluation of high-level query languages based on MapReduce in Big Data
Characteristic | MR | JAQL | Big SQL | Hive | Pig |
---|---|---|---|---|---|
Description | A programming model for processing and generating large data sets in parallel | A data-flow processing and query language | A high-level query language (HLQL) providing native SQL access to Hadoop | A data warehouse infrastructure for Hadoop | A high-level data-flow interface for Hadoop |
Language name | MapReduce | Jaql | ANSI SQL | HiveQL | Pig Latin |
Developed by | Google | IBM | IBM | Facebook | Yahoo |
Type of language | Data processing paradigm | Data flow | SQL | SQL-like (declarative) | Data flow |
Evaluation | At runtime | At runtime | At runtime | During compilation | During compilation |
Supported data | Structured and unstructured data (for structured data, MR may be less efficient than Big SQL, Hive and Pig) | JSON and other semi-structured data | Mostly structured | Mostly structured | Complex, nested data (tuples, bags, maps) |
Process category | Batch processing | Dataflow for JSON/batch | Dataflow system OLAP/batch | Data warehouse OLAP/batch | Dataflow/batch |
User defined functions | Extendable | Extendable | Extendable | Extendable | Extendable |
Schema optional? | Yes (no schema) | Yes | No, mandatory | No, mandatory | Yes |
Relational complete? | No | No | Yes | Yes | Yes |
Turing complete? | Yes | Yes | Yes, when extended with UDFs | Yes, when extended with UDFs | Yes, when extended with UDFs |
Source lines of code (mean ratio relative to MR in Java) | – | 7.1% | 4.9% | 25.4% | 21.1% |
Join operation | Difficult (joining two data sets is hard; joining several data sources is very hard) | Simple | Simple | Simple | Simple |
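
To make the "Join operation" row concrete, here is a minimal sketch of a reduce-side (repartition) join written in map/shuffle/reduce style. It is plain Python, not Hadoop's Java API: the framework's shuffle is simulated with a dictionary, and the data sets and function names (`map_phase`, `shuffle`, `reduce_phase`, `reduce_side_join`) are hypothetical. In HiveQL or Big SQL the same inner join is a single `JOIN ... ON` clause, and in Pig Latin a single `JOIN` statement; in raw MapReduce the programmer writes the tagging, grouping and pairing logic by hand.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input data sets: (user_id, name) and (user_id, order_total).
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, 30.0), (1, 12.5), (3, 7.0), (4, 99.0)]


def map_phase(records, tag):
    """Emit (join_key, (source_tag, value)) pairs, as a mapper would.

    Tagging each record with its source is what lets the reducer
    tell the two data sets apart after the shuffle.
    """
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group mapper output by key (a stand-in for the framework's shuffle/sort)."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Inner join: for each key, pair every left value with every right value."""
    for key, tagged_values in groups.items():
        left = [v for tag, v in tagged_values if tag == "L"]
        right = [v for tag, v in tagged_values if tag == "R"]
        # Keys present in only one input yield an empty product (inner join).
        for left_value, right_value in product(left, right):
            yield key, left_value, right_value


def reduce_side_join(left_records, right_records):
    """Run the simulated map, shuffle and reduce phases over two inputs."""
    mapped = list(map_phase(left_records, "L")) + list(map_phase(right_records, "R"))
    return list(reduce_phase(shuffle(mapped)))


if __name__ == "__main__":
    for row in reduce_side_join(users, orders):
        print(row)
```

Joining a third source in this style means adding another tag, another mapper input and a three-way product in the reducer (or chaining a second MapReduce job), which is the extra effort the table summarizes as "very hard with multiple data sources".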