Table 1 Summary of existing traditional heuristic-based data hiding techniques

From: Scalable two-phase co-occurring sensitive pattern hiding using MapReduce

Each entry lists the approach, the technique used, what it achieved, and its open issues.

1. Approach [11]
   Technique used:
   - Constructed a lattice-like graph of the dataset
   - Greedy iterative traversal to the immediate subsets
   - Selected the victim item with maximum support
   Achieved:
   - Good privacy level
   - Simple and fast
   Issues:
   - Does not consider the extent of support loss for large itemsets, so data quality suffers
   - Scalability to handle large-scale data

2. Approach [14]
   Technique used:
   - Increases the support of the antecedent: \((A) \to B\)
   - Decreases the support of the consequent: \(A \to (B)\)
   - Hybrid decrement until the confidence or support falls below the threshold (see the confidence sketch after this table)
   Achieved:
   - Decreases the support or the confidence, but not both
   Issues:
   - Rests on the strong assumption that an item contained in one sensitive itemset will not appear in another sensitive itemset
   - Scalability

3. Approach [15]
   Technique used:
   - Deletes the maximum-support item \(i \in s\) from the minimum-length transaction
   - The second algorithm sorts sensitive itemsets by size and support and masks them in round-robin fashion
   Achieved:
   - Sanitizes the minimum-length transactions first, reducing the side effects on non-sensitive data
   - The second algorithm is fair, masking itemsets in round-robin fashion
   Issues:
   - Scalability is still an issue
   - High execution time for large datasets

4. Approach [9]
   Technique used:
   - MaxFIA: deletes the maximum-support item \(i \in s\) where \(s \subseteq T\) (a minimal sketch of this victim-selection heuristic follows the table)
   - MinFIA: deletes the minimum-support item \(i \in s\) where \(s \subseteq T\)
   - IGA: clusters sensitive patterns sharing the same items and deletes the max- or min-support item
   Achieved:
   - Cluster formation hides a whole set of sensitive itemsets at once
   - No traversal required; the max- and min-support items are easy to count and select
   - The sensitive part of the dataset is separated out to reduce the data size and the sanitization time
   Issues:
   - Scalability and high execution time on large-scale datasets

5. Approach [10]
   Technique used:
   - SWA masks sensitive rules by hiding the maximum-frequency item \(i \in s\) where \(s \subseteq T\)
   Achieved:
   - Conceals all the sensitive rules
   - Requires a single database scan
   - The sliding-window concept makes the approach scalable to some extent
   Issues:
   - High execution time and poor scalability when the data is big

6. Approach [12]
   Technique used:
   - Aggregate: deletes the transactions \(T \cap S\) supporting the maximum number of sensitive itemsets
   - Disaggregate: deletes the maximum-support item \(i \in s\) where \(s \subseteq T\) from the remaining transactions (see the hybrid sketch after this table)
   Achieved:
   - The hybrid approach is fast, as it selectively identifies transactions and deletes the maximum-support item
   Issues:
   - Direct deletion of transactions affects data quality, as those transactions may also contain non-sensitive information
   - The scalability issue persists
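
The rule-hiding heuristic of row 2 turns on the identity \(\mathrm{conf}(A \to B) = \mathrm{supp}(A \cup B) / \mathrm{supp}(A)\): removing a consequent item from transactions that support the whole rule lowers \(\mathrm{supp}(A \cup B)\) while leaving \(\mathrm{supp}(A)\) intact, so the confidence falls. A minimal Python sketch of the consequent side, assuming transactions are plain Python sets; the names support, confidence, and hide_rule are illustrative, not taken from [14].

```python
from typing import FrozenSet, List, Set

def support(db: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(db: List[Set[str]], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """conf(A -> B) = supp(A ∪ B) / supp(A)."""
    sup_a = support(db, a)
    return support(db, a | b) / sup_a if sup_a else 0.0

def hide_rule(db: List[Set[str]], a: FrozenSet[str], b: FrozenSet[str],
              min_conf: float) -> List[Set[str]]:
    """Lower conf(A -> B) below min_conf by removing a consequent item
    from transactions that support the whole rule (illustrative sketch)."""
    for t in db:
        if confidence(db, a, b) < min_conf:
            break
        if (a | b) <= t:
            t.discard(next(iter(b)))  # supp(A ∪ B) falls; supp(A) is untouched
    return db

# Example: conf(a -> b) drops from 2/3 to 1/3, below the 0.5 threshold.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
hide_rule(db, frozenset({"a"}), frozenset({"b"}), min_conf=0.5)
```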
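Rows 3, 4, and 5 share one victim-selection step: pick the item of a sensitive itemset \(s\) with the highest support and delete it from a supporting transaction until \(s\) is no longer frequent. A minimal sketch under the same set-based representation, assuming a minimum support strictly above zero; sanitize_itemset is a hypothetical name, and the choice of the shortest supporting transaction follows row 3.

```python
from typing import FrozenSet, List, Set

def sanitize_itemset(db: List[Set[str]], s: FrozenSet[str],
                     min_sup: float) -> List[Set[str]]:
    """MaxFIA-style heuristic: while the sensitive itemset s is still
    frequent, delete its highest-support item (the 'victim') from one
    supporting transaction. Assumes min_sup > 0."""
    threshold = min_sup * len(db)  # absolute support count
    while sum(s <= t for t in db) >= threshold:
        # Victim = the item of s with the highest overall support (MaxFIA);
        # swapping max for min here gives MinFIA.
        victim = max(s, key=lambda i: sum(i in t for t in db))
        # Remove it from the shortest supporting transaction (row 3's rule)
        # to limit side effects on non-sensitive itemsets.
        target = min((t for t in db if s <= t), key=len)
        target.discard(victim)
    return db
```

Each deletion removes one transaction's support for \(s\), so the loop terminates; recomputing supports on every pass is what makes these heuristics slow on large datasets, which is the scalability complaint running through the table.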
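Row 6 combines transaction deletion (aggregate) with item deletion (disaggregate). A sketch under the assumption that the frequency threshold is fixed as an absolute count at the start and that min_sup is strictly positive; hybrid_sanitize and its exact phase split are illustrative, not the precise procedure of [12].

```python
from typing import FrozenSet, List, Set

def hybrid_sanitize(db: List[Set[str]], sensitive: List[FrozenSet[str]],
                    min_sup: float) -> List[Set[str]]:
    """Aggregate phase: delete whole transactions covering the most sensitive
    itemsets. Disaggregate phase: per-item deletion on what remains."""
    threshold = min_sup * len(db)  # fixed absolute count, for simplicity

    def sup(s: FrozenSet[str]) -> int:
        return sum(s <= t for t in db)

    # Aggregate: repeatedly drop the transaction supporting the most
    # sensitive itemsets, while any sensitive itemset is still frequent.
    while any(sup(s) >= threshold for s in sensitive):
        victim_t = max(db, key=lambda t: sum(s <= t for s in sensitive))
        if sum(s <= victim_t for s in sensitive) <= 1:
            break  # no transaction covers more than one itemset; switch phases
        db.remove(victim_t)

    # Disaggregate: max-support item deletion on the remaining transactions.
    for s in sensitive:
        while sup(s) >= threshold:
            victim_i = max(s, key=lambda i: sum(i in t for t in db))
            target = next(t for t in db if s <= t)
            target.discard(victim_i)
    return db
```

The aggregate phase is what row 6's "Issues" column warns about: a deleted transaction takes its non-sensitive items with it, which is why the disaggregate fallback exists.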