
Table 1 Summary of existing traditional heuristic based data hiding techniques

From: Scalable two-phase co-occurring sensitive pattern hiding using MapReduce

| S. no | Approach | Technique used | Achieved | Issues |
|---|---|---|---|---|
| 1 | [11] | Constructed a lattice-like graph of the dataset; greedy iterative traversal to the immediate subset; selected the victim item with maximum support | Good privacy level; simple and fast | Does not consider the extent of support loss for large itemsets, so data quality is affected; cannot scale to large-scale data |
| 2 | [14] | Increases the support of the antecedent \((A) \to B\); decreases the support of the consequent \(A \to (B)\); hybrid decrement until confidence or support drops below the threshold | Decreases either the support or the confidence, but not both | Based on the strong assumption that an item contained in one sensitive itemset will not appear in another sensitive itemset |
| 3 | [15] | Deletes the maximum-support item \(i \in s\) from the minimum-length transaction; the second algorithm sorts sensitive itemsets by size and support and masks them in round-robin fashion | Sanitizes minimum-length transactions first to reduce side effects on non-sensitive data; the second algorithm is fair, masking itemsets in round-robin fashion | Scalability remains an issue; high execution time on large datasets |
| 4 | [9] | MaxFIA: deletes the maximum-support item \(i \in s\), where \(s \subseteq T\); MinFIA: deletes the minimum-support item \(i \in s\), where \(s \subseteq T\); IGA: clusters sensitive patterns sharing the same itemsets and deletes the max- or min-support item | Cluster formation hides a set of sensitive itemsets at once; no traversal required, max- and min-support items are easily counted and selected; the sensitive dataset is separated out to reduce data size and sanitization time | Scalability issues and high execution time on large-scale datasets |
| 5 | [10] | SWA masks sensitive rules by hiding the maximum-frequency item \(i \in s\), where \(s \subseteq T\) | Conceals all sensitive rules; requires only a single database scan; the sliding-window concept makes the approach scalable to some extent | High execution time and poor scalability when the data is big |
| 6 | [12] | Aggregate: deletes the transaction \(T \cap S\) supporting the maximum number of sensitive itemsets; Disaggregate: deletes the maximum-support item \(i \in s\), where \(s \subseteq T\), from the remaining transactions | The hybrid approach is fast, as it selectively identifies transactions and deletes the maximum-support item | Direct deletion of transactions affects data quality, as transactions may also contain non-sensitive information; scalability issues remain |
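Several of the approaches above ([9], [10], [15]) share the same greedy core: while a sensitive itemset is still frequent, pick a "victim" item of maximum support and delete it from a supporting transaction. The following is a minimal sketch of that pattern only, not a faithful implementation of any cited algorithm; the function name, the shortest-transaction-first tie-break (in the spirit of [15]), and the toy dataset are illustrative assumptions.

```python
from collections import Counter

def hide_itemset(transactions, sensitive, min_support):
    """Greedy victim-item deletion, MaxFIA-style sketch:
    while the sensitive itemset is still frequent, delete its
    highest-support item from one supporting transaction."""
    sensitive = set(sensitive)
    while True:
        supporting = [t for t in transactions if sensitive <= t]
        if len(supporting) < min_support:
            return transactions  # itemset no longer frequent: done
        # Victim = item of the sensitive itemset with maximum global support.
        counts = Counter(i for t in transactions for i in t if i in sensitive)
        victim = max(sensitive, key=lambda i: counts[i])
        # Delete the victim from the shortest supporting transaction,
        # aiming to limit side effects on non-sensitive itemsets.
        target = min(supporting, key=len)
        target.discard(victim)

# Toy database of transactions (sets of items); hide {"a", "b"} below support 2.
db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
db = hide_itemset(db, {"a", "b"}, min_support=2)
support = sum(1 for t in db if {"a", "b"} <= t)
```

The sketch also makes the table's recurring criticism concrete: each deletion removes the victim item from an entire transaction, so non-sensitive itemsets containing that item lose support as a side effect, and the repeated full scans of the database are what drive the execution-time and scalability issues on large datasets.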