Privacy‐preserving data‐mining through micro‐aggregation for web‐based e‐commerce
Abstract
Purpose
The purpose of this paper is to anonymize web server log files used in e‐commerce web mining processes.
Design/methodology/approach
The paper applies statistical disclosure control (SDC) techniques to achieve this goal. More precisely, it introduces the micro‐aggregation of web access logs.
Findings
The experiments show that the proposed technique gives good results in general and performs especially well on relatively small websites.
Research limitations/implications
As with all SDC techniques, there is always a trade‐off between privacy and utility or, in other words, between disclosure risk and information loss. The proposal bears this issue in mind, providing k‐anonymity while preserving acceptable information accuracy.
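The trade‐off between disclosure risk and information loss can be illustrated with a minimal sketch of univariate micro‐aggregation. It is not the algorithm used in the paper: it assumes a single numeric attribute per record (a hypothetical "pages visited per session" value derived from a log), groups sorted values into clusters of at least k records, and replaces each value with its cluster mean. The function names and the sample data are illustrative assumptions.

```python
def microaggregate(values, k):
    """Replace each value with the mean of a group of at least k similar values.

    Simple fixed-size heuristic for illustration only; not the paper's method.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k records")
    # Sort record indices by value so that each group holds similar records.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(order) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so it holds k to 2k-1 records.
        end = (g + 1) * k if g < n_groups - 1 else len(order)
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = centroid
    return masked


def information_loss(original, masked):
    """Sum of squared errors between original and masked values."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))


# Hypothetical data: pages visited per session (not taken from the paper).
pages_per_session = [3, 12, 4, 15, 5, 40, 13]
masked = microaggregate(pages_per_session, k=3)
loss = information_loss(pages_per_session, masked)
```

Every masked value is shared by at least k records, which is what makes the result k‐anonymous with respect to this attribute. Raising k lowers disclosure risk but generally increases the squared‐error loss, which is the trade‐off described above.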
Practical implications
Web server logs are a valuable source of information for user profiling and general data‐mining analysis of websites in e‐commerce and e‐services. This proposal makes it possible to anonymize such logs, so they can be safely outsourced to other companies for marketing purposes, stored for further analysis, or made publicly available, without risking customer privacy.
Originality/value
Existing solutions to the problem addressed here are scarce and limited. They usually amount to removing sensitive information from the query strings of URLs. Moreover, to the authors' knowledge, SDC techniques have never been applied to the anonymization of web logs.
Citation
Navarro‐Arribas, G. and Torra, V. (2010), "Privacy‐preserving data‐mining through micro‐aggregation for web‐based e‐commerce", Internet Research, Vol. 20 No. 3, pp. 366-384. https://doi.org/10.1108/10662241011050759
Publisher
Emerald Group Publishing Limited
Copyright © 2010, Emerald Group Publishing Limited