Table 1 Comparative study between Big SQL, Pig, Hive and JAQL

From: Evaluation of high-level query languages based on MapReduce in Big Data

| Characteristic | MR | JAQL | Big SQL | Hive | Pig |
| --- | --- | --- | --- | --- | --- |
| Description | A programming model for processing and generating large data sets in parallel | A data-flow processing and querying language | An HLQL designed to provide native SQL access for Hadoop | A data warehouse infrastructure for Hadoop | A high-level data flow interface for Hadoop |
| Language name | MapReduce | Jaql | ANSI SQL | HiveQL | Pig Latin |
| Developed by | Google | IBM | IBM | Facebook | Yahoo |
| Type of language | Data processing paradigm | Data flow | SQL | SQL-like (presenting a declarative language) | Data flow |
| Evaluation | At runtime | At runtime | At runtime | During compilation | During compilation |
| Supported data | Structured and unstructured data (for structured data, MR may not be as efficient as Big SQL, Hive and Pig) | JSON and semi-structured | Mostly structured | Mostly structured | Complex |
| Process category | Batch processing | Dataflow for JSON/batch | Dataflow system OLAP/batch | Data warehouse OLAP/batch | Dataflow/batch |
| User defined functions | Extendable | Extendable | Extendable | Extendable | Extendable |
| Schema optional? | Without schema | Yes | No, mandatory | No, mandatory | Yes |
| Relational complete? | No | No | Yes | Yes | Yes |
| Turing complete? | Yes | Yes | Yes, when extended with UDFs | Yes, when extended with UDFs | Yes, when extended with UDFs |
| Source lines of code (mean ratio with MR Java) | | 7.1% | 4.9% | 25.4% | 21.1% |
| Join operation | Difficult (it is quite hard to perform a join operation between data sets, and very hard with multiple data sources) | Simple | Simple | Simple | Simple |
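
To illustrate why the table rates the join operation as difficult in MR, the following is a minimal sketch of a reduce-side inner join, simulated in plain Python rather than written against the Hadoop API. The data sets (`users`, `orders`), the tags `"U"` and `"O"`, and all function names are hypothetical and chosen only for illustration. The point is that the programmer must tag records by source, group them by key, and pair them up by hand, whereas the higher-level languages in the table express the same thing with a built-in join operator.

```python
from collections import defaultdict


def map_users(record):
    """Map phase for the (hypothetical) users data set: emit (join key, tagged value)."""
    user_id, name = record
    yield user_id, ("U", name)


def map_orders(record):
    """Map phase for the (hypothetical) orders data set: emit (join key, tagged value)."""
    order_id, user_id, amount = record
    yield user_id, ("O", (order_id, amount))


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce phase: separate values by source tag, then pair them (inner join)."""
    names = [v for tag, v in values if tag == "U"]
    orders = [v for tag, v in values if tag == "O"]
    for name in names:
        for order_id, amount in orders:
            yield key, name, order_id, amount


def reduce_side_join(users, orders):
    """Run map, shuffle and reduce in sequence over both inputs."""
    pairs = []
    for record in users:
        pairs.extend(map_users(record))
    for record in orders:
        pairs.extend(map_orders(record))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined


users = [(1, "ana"), (2, "bo")]
orders = [(10, 1, 5.0), (11, 1, 7.5), (12, 3, 1.0)]
result = reduce_side_join(users, orders)
print(result)
```

Only user 1 has matching orders in this sample data, so users and orders without a partner are dropped, as expected for an inner join. Joining a third data source in this style would require another tag and more pairing logic in the reducer, which is consistent with the table's remark about multiple data sources.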