Li Xiong, Georgia Institute of Technology
Advances in distributed service-oriented computing and global communications have created a strong technology push for large-scale data integration among organizations and enterprises. In addition, there is a growing demand for integrating information across multiple autonomous, possibly untrusted, and private databases. Data privacy is becoming an increasingly important aspect of data integration and management because organizations and individuals do not want to reveal their private data for various legal or commercial reasons.
This talk discusses the constraints imposed by data privacy and, through a few example application scenarios, how they affect information management and integration. It gives an overview of the current state of research in this relatively new area and presents the challenges and our solutions in developing efficient, privacy-preserving protocols for information sharing across multiple databases. Concretely, given an aggregate query or data mining task spanning multiple private databases, we wish to compute the answer without revealing any information about each individual database beyond the result itself. One practical way to tackle this problem is to relax the privacy constraint, allowing efficient information integration while minimizing the amount of information disclosed. We developed a suite of decentralized, privacy-conscious protocols for important integration operations that effectively minimize the information disclosed by individual databases while remaining efficient in both computation and communication costs. In particular, this talk focuses on the design of a novel privacy-preserving protocol for topK selection, covering a privacy metric that formalizes the notion of loss of data privacy, the protocol itself, an analytical model, and a set of experimental evaluations that demonstrate the correctness, efficiency, and strong privacy characteristics of the proposed protocol. It also illustrates how the topK protocol can serve as a building block for more complex information integration problems, such as kNN classification across multiple databases.
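To make the problem statement concrete, the following is a minimal sketch of one way a decentralized selection query could trade a little privacy for efficiency. It covers only the k = 1 case (the maximum). The abstract does not describe the mechanics of the protocol, so everything here is an assumption made for illustration: the ring arrangement, the function name `ring_max`, the parameters `rounds`, `p0`, `decay`, and `floor`, and the idea of masking a node's value with a random smaller one in early rounds. It should not be read as the protocol presented in the talk.

```python
import random

def ring_max(local_values, rounds=8, p0=1.0, decay=0.5, floor=0, seed=None):
    """Hypothetical sketch: compute the max of private integers held by ring nodes.

    Each entry of local_values stands for one database's private value.
    A running value g is passed around the ring for several rounds.
    """
    rng = random.Random(seed)
    g = floor  # public starting value, assumed <= the true maximum
    for r in range(rounds):
        # Masking probability shrinks every round; the last round is
        # deterministic (an assumption made here so the result is exact).
        p = p0 * decay ** r if r < rounds - 1 else 0.0
        for v in local_values:  # one pass around the ring
            if v > g:
                if rng.random() < p:
                    # Hide the real value behind a random one in [g, v).
                    g = rng.randint(g, v - 1)
                else:
                    g = v  # contribute the real value
            # Nodes with v <= g forward g unchanged.
    return g

print(ring_max([17, 42, 8, 30], seed=1))
```

In this sketch the running value never decreases and never exceeds the true maximum, and the final deterministic pass raises it to that maximum, so the call above prints 42. A successor node that sees an intermediate value cannot tell whether it is a real contribution or a random mask, which is the kind of disclosure a privacy metric would need to quantify.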