Table 5 Summary of recent approaches used in big data privacy

From: Big data privacy: a technological perspective and review

S.No

Research paper

Publication and year

Focus

Limitations

1

“Toward Efficient and Privacy Preserving Computing in Big Data Era” [38]

IEEE Network July/Aug 2014

Introduced an efficient and privacy-preserving cosine similarity computing protocol

Significant research effort is still needed to address the unique privacy issues of some specific big data analytics

2

“Hiding a needle in a Haystack: privacy preserving Apriori algorithm in map reduce framework” [46]

ACM Nov 7, 2014

Proposed a privacy-preserving data mining technique for Hadoop that addresses privacy violations without degrading utility

Execution time of the proposed technique is affected by the noise size

3

“Making big data, privacy, and anonymization work together in the enterprise: experiences and issues” [41]

IEEE International Congress 2014

Discusses experiences and issues encountered when successfully combining anonymization, privacy protection, and big data techniques to analyse usage data while protecting the identities of users

Uses the k-anonymity technique, which is vulnerable to correlation attacks

4

“Microsoft Differential Privacy for Everyone” [40]

Microsoft Research 2015

Discussed how an existing approach, “differential privacy”, is suitable for big data

This method depends entirely on the curator's calculation of the amount of noise, so if the curator is compromised, the whole system fails

5

“A scalable two-phase top-down specialization approach for data anonymization using MapReduce on cloud” [69]

IEEE Transactions on Parallel and Distributed Systems 2014

Proposed a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud

Uses an anonymization technique that is vulnerable to correlation attacks

6

“HireSome-II: towards privacy-aware cross-cloud service composition for big data applications” [74]

IEEE Transactions on Parallel and Distributed Systems 2014

Proposed a privacy-aware cross-cloud service composition method, named HireSome-II (History record-based Service optimization method), based on its earlier basic version, HireSome-I

 

7

“Protection of big data privacy” [7]

IEEE Transactions 2016

Discussed various privacy issues arising in big data applications

Customer segmentation and profiling can easily lead to discrimination based on age, gender, ethnic background, health condition, social background, and so on

8

“Fast anonymization of big data streams” [55]

ACM August, 2014

Proposed an anonymization algorithm (FAST) to speed up anonymization of big data streams

Further research is required to design and implement FAST in a distributed cloud-based framework in order to gain cloud computation power and achieve high scalability

9

“Privacy preserving Ciphertext multi-sharing control for big data storage” [75]

IEEE Transactions on Information Forensics and Security 2015

Proposed a privacy-preserving ciphertext multi-sharing mechanism

The proxy can create delegation rights between two parties that have never agreed upon the delegation process

10

“Privacy-preserving machine learning algorithms for big data systems” [76]

IEEE International Conference on Distributed Computing Systems 2015

Proposed a novel framework to achieve privacy-preserving machine learning where the training data are distributed and each shared data portion is of large volume

Not able to achieve distributed feature selection

11

“Privacy-preserving big data publishing” [50]

ACM June–July 2015

Proposed an approach to privacy-preserving data mining of very large data sets using MapReduce

Generalization cannot handle high-dimensional data and reduces data utility. Perturbation also reduces data utility

12

“Proximity-aware local-recoding anonymization with map reduce for scalable big data privacy preservation in cloud” [70]

IEEE Transactions on Computers August 2015

Modelled the problem of big data local recoding against proximity privacy breaches as a proximity-aware clustering problem, and proposed a scalable two-phase clustering approach accordingly

Further research is needed to integrate the approach with Apache Mahout to achieve highly scalable privacy-preserving big data mining or analytics

13

“Deduplication on encrypted big data in cloud” [77]

IEEE Transactions on Big Data 2016

Proposed a practical scheme to manage the encrypted big data in cloud with deduplication based on ownership challenge and Proxy Re-Encryption (PRE)

Convergent encryption (CE) is subject to an inherent security limitation, namely susceptibility to offline brute-force dictionary attacks

14

“Security and privacy for storage and computation in cloud computing” [22]

International Journal of Science and Research (IJSR) ISSN (Online): 2319–7064

The proposed methodology provides data confidentiality, secure data sharing without re-encryption, access control for malicious insiders, and forward and backward access control

Limiting the trust level in the cryptographic server (CS)

  1. Provides a list of papers with emphasis on their focus and limitations
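
Several rows in the table refer to k-anonymity-style anonymization. As a rough illustration only, the following Python sketch checks whether a small table is k-anonymous, meaning every combination of quasi-identifier values is shared by at least k records. The function name `is_k_anonymous`, the column names, and the sample records are hypothetical and are not taken from any of the cited papers.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    records: list of dicts (hypothetical, already generalized rows)
    quasi_identifiers: column names treated as quasi-identifiers (assumed)
    k: minimum size of each equivalence class
    """
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Illustrative, made-up data: ages and ZIP codes are already generalized.
records = [
    {"age": "30-39", "zip": "476**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "476**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)
```

In this sample, each quasi-identifier group holds exactly two records, so the table passes the check for k = 2 and fails it for k = 3. The check only counts group sizes; it says nothing about how diverse the sensitive values inside a group are, which is one reason k-anonymity on its own is usually treated as a weak guarantee.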