
Table 1 Summary of existing traditional heuristic-based data hiding techniques

From: Scalable two-phase co-occurring sensitive pattern hiding using MapReduce


1. Approach [11]
Technique used: constructs a lattice-like graph of the dataset; greedy iterative traversal to the immediate subsets; selects the victim item with maximum support.
Achieved: good privacy level; simple and fast.
Issues: does not consider the extent of support loss for large itemsets, so data quality is affected; scalability to handle large-scale data.
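
The victim-selection step in approach 1 is easy to make concrete. Below is a minimal Python sketch, assuming transactions are mutable sets of items; the function names and toy data are illustrative, and the lattice construction and greedy traversal of [11] are elided.

```python
# Illustrative sketch, not the implementation of [11]: choose the
# max-support member of a sensitive itemset as the victim and delete
# it from every transaction that fully supports the itemset.

def support(item, transactions):
    """Count the transactions that contain the item."""
    return sum(1 for t in transactions if item in t)

def hide_by_max_support_victim(transactions, sensitive_itemset):
    """Break the sensitive itemset in all of its supporting transactions."""
    victim = max(sensitive_itemset, key=lambda i: support(i, transactions))
    for t in transactions:
        if sensitive_itemset <= t:   # t fully supports the sensitive itemset
            t.discard(victim)        # removing the victim breaks the itemset here
    return transactions

# Toy data: afterwards no transaction contains {"a", "b"} intact.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
hide_by_max_support_victim(transactions, frozenset({"a", "b"}))
```

Deleting the highest-support member removes the itemset from many transactions at once, which is why the heuristic is simple and fast; the same greed is what ignores the support loss suffered by large non-sensitive itemsets.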

2. Approach [14]
Technique used: increases the support of the antecedent in \((A) \to B\); decreases the support of the consequent in \(A \to (B)\); hybrid decrement until the confidence or the support falls below its threshold.
Achieved: decreases the support or the confidence, but not both.
Issues: rests on the strong assumption that an item contained in one sensitive itemset will not appear in another sensitive itemset; scalability.
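
The consequent-side decrement in approach 2 can be sketched for a rule \(A \to B\): delete an item of \(B\) from transactions supporting \(A \cup B\) until \(supp(A \cup B)/supp(A)\) falls below the confidence threshold. This is a simplification of [14]; all names are assumptions, and the antecedent-side and hybrid variants are omitted.

```python
# Sketch only: lower conf(A -> B) by deleting a consequent item from
# transactions that support A ∪ B. Not the authors' implementation.

def supp(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def lower_confidence(transactions, A, B, min_conf):
    rule = A | B
    b_item = next(iter(B))                    # victim item from the consequent
    for t in [t for t in transactions if rule <= t]:
        if supp(rule, transactions) < min_conf * supp(A, transactions):
            break                             # confidence is now below threshold
        t.discard(b_item)                     # weakens supp(A ∪ B) but not supp(A)
    return transactions
```

Deleting from the consequent leaves \(supp(A)\) untouched, so the confidence falls while the antecedent support does not; that asymmetry is why the support or the confidence decreases, but not both.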

3. Approach [15]
Technique used: the first algorithm deletes the maximum-support item \(i \in s\) from the minimum-length transaction; the second algorithm sorts sensitive itemsets by size and support and masks them in round-robin fashion.
Achieved: sanitizing the minimum-length transaction first decreases the side effects on non-sensitive data; the second algorithm is fair, since it masks in round-robin fashion.
Issues: scalability is still an issue; high execution time for large datasets.
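
The first algorithm of approach 3 can be sketched as follows, assuming transactions are mutable sets and min_sup is an absolute count; identifiers are hypothetical, and the round-robin second algorithm is not shown.

```python
# Hypothetical sketch of the min-length-first heuristic in [15].

def support(item, transactions):
    return sum(1 for t in transactions if item in t)

def hide_min_length_first(transactions, s, min_sup):
    """Delete the max-support item of s from the shortest supporting
    transaction until fewer than min_sup transactions support s."""
    while True:
        supporters = [t for t in transactions if s <= t]
        if len(supporters) < min_sup:          # s is now hidden
            break
        victim_tx = min(supporters, key=len)   # shortest supporting transaction
        victim_item = max(s, key=lambda i: support(i, transactions))
        victim_tx.discard(victim_item)         # fewest non-sensitive items disturbed
    return transactions
```

Targeting the shortest supporter first means each deletion touches as few co-occurring non-sensitive items as possible.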

4. Approach [9]
Technique used: MaxFIA deletes the maximum-support item \(i \in s\) where \(s \subseteq T\); MinFIA deletes the minimum-support item \(i \in s\) where \(s \subseteq T\); IGA clusters sensitive patterns sharing the same items and deletes the maximum- or minimum-support item.
Achieved: cluster formation hides a set of sensitive itemsets at once; no traversal is required, as the maximum- and minimum-support items are easily counted and selected; the sensitive part of the dataset is separated out to reduce the data size and the sanitization time.
Issues: scalability and high execution time on large-scale datasets.
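
A naive stand-in for the IGA grouping of approach 4 is shown below: itemsets are grouped by a single shared item, and only the largest group is processed. The grouping heuristic and every identifier are assumptions layered on the description of [9].

```python
# Simplified sketch of IGA-style grouping; the real clustering in [9]
# is more careful about overlaps and side effects.

from collections import defaultdict

def best_shared_item(sensitive_itemsets):
    """Return the item shared by the most sensitive itemsets."""
    clusters = defaultdict(list)
    for s in sensitive_itemsets:
        for item in s:
            clusters[item].append(s)
    return max(clusters.items(), key=lambda kv: len(kv[1]))

def iga_style_hide(transactions, sensitive_itemsets):
    shared_item, covered = best_shared_item(sensitive_itemsets)
    for t in transactions:
        if any(s <= t for s in covered):
            t.discard(shared_item)   # one deletion breaks every covered itemset in t
    return transactions
```

Grouping around a shared item is what lets a single deletion hide several sensitive itemsets at once.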

5. Approach [10]
Technique used: SWA masks sensitive rules by hiding the maximum-frequency item \(i \in s\) where \(s \subseteq T\); it requires only a single database scan.
Achieved: conceals all the sensitive rules; requires a single database scan; the sliding-window concept makes the approach scalable to some extent.
Issues: high execution time, and scalability remains a problem when the data is big.
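
The single-scan, windowed behaviour of approach 5 might look like the following; window_size and all identifiers are assumptions, and the real SWA of [10] also handles per-rule disclosure thresholds that this sketch ignores.

```python
# Loose sketch of a sliding-window sanitization pass; not the published SWA.

from collections import Counter

def swa_style_pass(transactions, sensitive_itemsets, window_size=100):
    for start in range(0, len(transactions), window_size):
        window = transactions[start:start + window_size]
        freq = Counter(i for t in window for i in t)       # counts inside the window only
        for t in window:
            for s in sensitive_itemsets:
                if s <= t:
                    victim = max(s, key=freq.__getitem__)  # max-frequency member of s
                    t.discard(victim)
    return transactions
```

Because frequencies are counted per window, the database is read once and memory stays bounded, which is where the partial scalability comes from.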

6. Approach [12]
Technique used: Aggregate deletes the transactions \(T \cap S\) supporting the maximum number of sensitive itemsets; Disaggregate deletes the maximum-support item \(i \in s\) where \(s \subseteq T\) from the remaining transactions.
Achieved: the hybrid approach is fast, as it selectively identifies transactions and deletes the maximum-support item.
Issues: direct deletion of transactions degrades data quality, since the deleted transactions may also contain non-sensitive information; scalability remains an issue.
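
Finally, the aggregate/disaggregate split of approach 6 can be illustrated with a small sketch; delete_budget, the coverage ordering, and every name are assumptions layered on the description in [12].

```python
# Sketch of the hybrid of [12]: transaction-level deletion first,
# then item-level deletion in whatever supporters remain.

def hybrid_hide(transactions, sensitive_itemsets, delete_budget=1):
    def coverage(t):
        """How many sensitive itemsets this transaction supports."""
        return sum(1 for s in sensitive_itemsets if s <= t)

    # Aggregate phase: drop whole transactions covering the most itemsets.
    transactions = sorted(transactions, key=coverage, reverse=True)[delete_budget:]

    # Disaggregate phase: delete only the max-support item elsewhere.
    for t in transactions:
        for s in sensitive_itemsets:
            if s <= t:
                victim = max(s, key=lambda i: sum(1 for u in transactions if i in u))
                t.discard(victim)
    return transactions
```

The aggregate phase is exactly what hurts data quality: every non-sensitive pattern in a dropped transaction loses support along with the sensitive ones.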