Evaluation of high-level query languages based on MapReduce in Big Data

Journal of Big Data

Table 1 Comparative study between Big SQL, Pig, Hive and JAQL

Characteristic	MR	JAQL	Big SQL	Hive	Pig
Description	A programming model for processing and generating large data sets in parallel	A data-flow processing and querying language	An HLQL designed to provide native SQL access for Hadoop	A data warehouse infrastructure for Hadoop	A high-level data flow interface for Hadoop
Language name	MapReduce	Jaql	ANSI SQL	HiveQL	Pig Latin
Developed by	Google	IBM	IBM	Facebook	Yahoo
Type of language	Data processing paradigm	Data flow	SQL	SQL-like (presenting a declarative language)	Data flow
Evaluation	At runtime	At runtime	At runtime	During compilation	During compilation
Supported data	Structured and unstructured data (for structured data, MR may not be as efficient as Big SQL, Hive and Pig)	JSON and semi-structured	Mostly structured	Mostly structured	Complex
Process category	Batch processing	Dataflow for JSON/batch	Dataflow system OLAP/batch	Data warehouse OLAP/batch	Dataflow/batch
User defined functions	Extendable	Extendable	Extendable	Extendable	Extendable
Schema optional?	Without schema	Yes	No, mandatory	No, mandatory	Yes
Relational complete?	No	No	Yes	Yes	Yes
Turing complete?	Yes	Yes	Yes, when extended with UDFs	Yes, when extended with UDFs	Yes, when extended with UDFs
Source lines of code (mean ratio with MR Java)	–	7.1%	4.9%	25.4%	21.1%
Join operation	Difficult (it is quite hard to perform a join operation between data sets, and very hard with multiple data sources)	Simple	Simple	Simple	Simple
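
The "Join operation" row can be illustrated with a small sketch of a reduce-side join, one common way to express a join in the MapReduce model. This is plain Python that only imitates the map, shuffle and reduce phases in memory; it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `reduce_side_join`) and the sample `users` and `orders` records are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, source_tag, key_field):
    """Emit (join key, (source tag, record)) pairs, as a mapper would."""
    for record in records:
        yield record[key_field], (source_tag, record)


def shuffle(pairs):
    """Group mapper output by key, imitating the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, left_tag, right_tag):
    """For each key, pair every left record with every right record (inner join)."""
    for key in sorted(groups):
        values = groups[key]
        left = [rec for tag, rec in values if tag == left_tag]
        right = [rec for tag, rec in values if tag == right_tag]
        for left_rec, right_rec in product(left, right):
            yield {**left_rec, **right_rec}


def reduce_side_join(left, right, key_field):
    """Tag both inputs, group them by key, and join them in the reducer."""
    pairs = list(map_phase(left, "L", key_field))
    pairs += list(map_phase(right, "R", key_field))
    return list(reduce_phase(shuffle(pairs), "L", "R"))


# Hypothetical sample data: two small data sets sharing a "user_id" field.
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Linus"}]
orders = [
    {"user_id": 1, "item": "disk"},
    {"user_id": 1, "item": "cable"},
    {"user_id": 3, "item": "fan"},
]

joined = reduce_side_join(users, orders, "user_id")
for row in joined:
    print(row)
```

The developer has to tag each record with its source, group by key, and build the pairs by hand. In an SQL-style HLQL the same inner join is a single declarative statement of the form `SELECT ... FROM users u JOIN orders o ON u.user_id = o.user_id`, which is consistent with the "Simple" entries in the table.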