Table 2 Column-oriented databases

From: Modeling temporal aspects of sensor data for MongoDB NoSQL database

| Name | Data model | Scalability | Description | Who uses it |
| --- | --- | --- | --- | --- |
| Cassandra from Facebook, Apache [57] | Multi-dimensional column family is a set of rows; partition key (spanning one or more rows): identifies a partition; row key: identifies a row in a column family | Partitioning, replication, availability | Multi-master with no single point of failure; in-memory with disk persistence; masterless; query methods: CQL and Thrift; MapReduce; secondary indexing; eventual consistency | CERN, Comcast, GitHub, GoDaddy, Hulu, eBay, Netflix |
| HBase from Apache [72] | Tables have rows and columns; row: a row key and one or more columns; column: belongs to a column family | Auto sharding; asynchronous replication; availability | BigTable based; Hadoop Distributed File System (HDFS); MapReduce; consistent reads/writes; failover support; Thrift and RESTful; ZooKeeper | Adobe, Kakao, Facebook, Flurry, LinkedIn, Netflix, Sears |
| Druid from http://druid.io | Columns are one of three types: a timestamp, a dimension, or a measure; nested dimensions | Low latency; replication; sharding | Highly optimized for scans and aggregates; MapReduce; fault-tolerant; ZooKeeper; index structures; not ACID | Alibaba, Cisco, eBay, Netflix, PayPal, Yahoo |
| Accumulo from Apache [73] | Keys and values are both byte arrays, timestamp is a long; adds a new key element, column visibility; sorts keys by element; secondary indexes | Sharding, replication, persistence, fault tolerance | BigTable-based Java technology on top of Hadoop, ZooKeeper and Thrift; MapReduce; ZooKeeper (multi-master) locks for consistency; cell-level access | US National Security Agency (NSA) |
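
The Accumulo entry of the table (byte-array keys, a long timestamp, a column-visibility element, keys sorted by element, cell-level access) can be illustrated with a minimal Python sketch. This is not Accumulo's API: the `Key` class, the `scan` function, and the sensor names are hypothetical, and visibility is simplified to a single label, whereas Accumulo evaluates a boolean expression over labels. The sketch also assumes that the timestamp sorts in descending order so the newest version of a cell comes first.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a multi-element key (not the real Accumulo class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes  # simplified: one label; empty bytes means "public"
    timestamp: int     # stored as a long in the real system

    def sort_key(self) -> tuple:
        # Sort element by element; the timestamp is negated so that
        # newer versions of the same cell come first (assumption).
        return (self.row, self.family, self.qualifier,
                self.visibility, -self.timestamp)


def scan(table: Dict[Key, bytes],
         authorizations: Set[bytes]) -> List[Tuple[Key, bytes]]:
    """Return the cells the caller may see, in sorted key order."""
    visible = [
        (key, value) for key, value in table.items()
        if key.visibility == b"" or key.visibility in authorizations
    ]
    return sorted(visible, key=lambda kv: kv[0].sort_key())


# Illustrative sensor readings; all names and values are made up.
table = {
    Key(b"sensor2", b"reading", b"temp", b"", 100): b"21.5",
    Key(b"sensor1", b"reading", b"temp", b"", 100): b"19.0",
    Key(b"sensor1", b"reading", b"temp", b"", 200): b"19.4",
    Key(b"sensor1", b"meta", b"location", b"secret", 100): b"lab-3",
}

public_view = scan(table, set())          # no authorizations
cleared_view = scan(table, {b"secret"})   # may read "secret" cells
```

A caller without authorizations sees only the three public temperature cells, while a caller holding the `secret` label also sees the location cell; this is the sense in which access control is applied per cell rather than per table.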