Computer Science and Information Systems 2019 Volume 16, Issue 3, Pages: 705-731
https://doi.org/10.2298/CSIS180915023L


Research on improved privacy publishing algorithm based on set cover

Lv Haoze (School of Information Science and Technology, Dalian Maritime University China)
Liu Zhaobin (School of Information Science and Technology, Dalian Maritime University China)
Hu Zhonglian (School of Information Science and Technology, Dalian Maritime University China)
Nie Lihai (Division of Intelligence and Computing, Tianjin University China)
Liu Weijiang (School of Information Science and Technology, Dalian Maritime University China)
Ye Xinfeng (Department of Computer Science, University of Auckland New Zealand)

With the arrival of the big data era, data release has become a hot topic in the database community, and data privacy is drawing growing attention from users. Among the privacy protection models proposed so far, the differential privacy model is widely used because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. To address this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and improve availability. The main idea is to reduce the dimension of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that takes into account the effectiveness of query cover combinations, and obtain a regular marginal table cover set that is smaller in size but offers higher data availability. Then, we propose a differential privacy model with irregular marginal tables for application scenarios with low data availability and high cover rate. Next, through our analysis we obtain an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint. In this way, a balance between privacy protection and data availability is achieved. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.

Keywords: differential privacy, set cover, frequent itemsets, marginal table
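
The abstract's central observation, that noise grows quickly with dimensionality and that low-dimensional marginal tables reduce it, can be illustrated with a minimal sketch. This is not the paper's cover algorithm: it assumes binary attributes, a count sensitivity of 1 (one record added or removed), and hypothetical helper names (marginal_table, laplace_sample, noisy_marginal). It only shows how many cells must receive Laplace noise for a full table versus a two-attribute marginal.

```python
import itertools
import random
from collections import Counter


def marginal_table(records, attrs):
    """Count records per value combination of the chosen binary attributes."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Empty cells are listed too: they must also receive noise.
    return {cell: counts.get(cell, 0)
            for cell in itertools.product((0, 1), repeat=len(attrs))}


def laplace_sample(scale, rng):
    # The difference of two i.i.d. Exp(1) draws, scaled, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_marginal(records, attrs, epsilon, rng):
    # Assumption: sensitivity 1, since one record changes one cell count by 1.
    scale = 1.0 / epsilon
    table = marginal_table(records, attrs)
    return {cell: c + laplace_sample(scale, rng) for cell, c in table.items()}


rng = random.Random(0)
# Synthetic data: 1000 records with 8 binary attributes (illustrative only).
records = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(1000)]

full = noisy_marginal(records, attrs=range(8), epsilon=1.0, rng=rng)
small = noisy_marginal(records, attrs=(0, 3), epsilon=1.0, rng=rng)
print(len(full), len(small))  # 256 noisy cells versus 4
```

With the same per-cell noise scale, a query answered by summing cells of the full table accumulates noise from up to 256 cells, while the two-attribute marginal involves at most 4. Releasing several marginal tables would additionally require splitting the privacy budget among them, which this sketch does not model.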