Article

A Real-Time Query Log Protection Method for Web Search Engines

Journal

IEEE ACCESS
Volume 8, Pages 87393-87413

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/ACCESS.2020.2992012

Keywords

Anonymization; data streams; privacy; query logs; web search engines

Funding

  1. CYBERCAT (Center for Cybersecurity Research of Catalonia, Tarragona, Catalonia)
  2. UNESCO Chair in Data Privacy
  3. Cyber CNI chair of the Institut Mines-Telecom
  4. European Commission [H2020-830892, H2020-871042]
  5. Government of Catalonia [2017 SGR 705]
  6. Spanish Government [RTI2018-095094-B-C21, RED2018-102321-T]

Abstract

Web search engines (e.g., Google, Bing, Qwant, and DuckDuckGo) may process a myriad of search queries per second. According to Internet Live Stats, Google handles more than two hundred million queries per hour, i.e., about two trillion queries per year. For monetization purposes, the queries can be stored and complemented with additional data; the resulting records are referred to as query logs. Taken together, these records can be correlated to build accurate user profiles. Before releasing the query logs to third parties (e.g., for profit), web search engines must properly protect the personal information the logs contain. Current regulations establish strict control and require provable anonymization (e.g., in terms of statistical disclosure) of any personally identifiable information. In this paper, we tackle this challenge. We propose a real-time anonymization solution to protect streams of unstructured data at the server side. Our approach is based on a probabilistic k-anonymity technique. It allows probabilistic processing of personally identifiable attributes contained in the query logs, with provable privacy properties. Our solution addresses the limitations of traditional k-anonymity approaches with respect to unstructured data and real-time processing. We present the implementation of our solution and report experimental evaluation results. The evaluation is conducted in terms of privacy, utility, and scalability. Results validate the feasibility of our proposal.
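
The abstract does not describe the mechanics of the proposed method, so the following is only a rough illustration of the general idea of protecting a stream of query-log entries in groups of size k. It is a minimal sketch under stated assumptions, not the authors' algorithm: the `LogEntry` type, the `anonymize_stream` function, and the choice of randomly permuting user identifiers inside each batch are all hypothetical, and the sketch makes no claim to the provable privacy properties reported in the paper.

```python
import random
from typing import Iterable, Iterator, List, NamedTuple


class LogEntry(NamedTuple):
    user_id: str  # hypothetical identifying attribute
    query: str    # query text, left untouched in this sketch


def anonymize_stream(entries: Iterable[LogEntry], k: int,
                     rng: random.Random) -> Iterator[LogEntry]:
    """Release entries in batches that cover at least k distinct users,
    with user_ids randomly permuted inside each batch (assumed scheme)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    batch: List[LogEntry] = []
    users = set()
    for entry in entries:
        batch.append(entry)
        users.add(entry.user_id)
        if len(users) >= k:
            ids = [e.user_id for e in batch]
            rng.shuffle(ids)  # weaken the link between user and query
            for new_id, e in zip(ids, batch):
                yield LogEntry(new_id, e.query)
            batch, users = [], set()
    # Entries still buffered here cover fewer than k distinct users and
    # are withheld rather than released unprotected.


stream = [
    LogEntry("u1", "flu symptoms"),
    LogEntry("u2", "cheap flights"),
    LogEntry("u1", "pharmacy near me"),
    LogEntry("u3", "tax forms"),
    LogEntry("u4", "weather"),
]
released = list(anonymize_stream(stream, k=3, rng=random.Random(0)))
```

With k=3, the first four entries are released as one batch once three distinct users have been seen, and the last entry stays buffered. The sketch also shows the tension the abstract alludes to: waiting for k distinct users adds latency, which a real-time solution has to bound.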
