On Data Publishing with Clustering Preservation


Vlachos, M., Schneider, J., & Vassiliadis, V. G. (2015). On Data Publishing with Clustering Preservation. ACM Transactions on Knowledge Discovery from Data (TKDD), 9(3).

Publication type

Article in Scientific Journal


The emergence of cloud-based storage services is opening up new avenues in data exchange and data dissemination. This has amplified the interest in right-protection mechanisms to establish ownership in the event of data leakage. Current right-protection technologies, however, rarely provide strong guarantees on dataset utility after the protection process. This work presents techniques that explicitly address this topic and provably preserve the outcome of certain mining operations. In particular, we take special care to guarantee that the outcome of hierarchical clustering operations remains the same before and after right protection. Our approach considers all prevalent hierarchical clustering variants: single-, complete-, and average-linkage. We imprint the ownership in a dataset using watermarking principles, and we derive tight bounds on the expansion/contraction of distances incurred by the process. We leverage our analysis to design fast algorithms for right protection without exhaustively searching the vast design space. Finally, because the right-protection process introduces a user-tunable distortion on the dataset, we explore the possibility of using this mechanism for data obfuscation. We quantify the tradeoff between obfuscation and utility for spatiotemporal datasets and discover very favorable characteristics of the process. An additional advantage is that when one is interested in both right-protecting and obfuscating the original data values, the proposed mechanism can accomplish both tasks simultaneously.
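
The property described in the abstract, that hierarchical clustering yields the same result before and after right protection, can be illustrated with a small sketch. The code below is not the authors' algorithm. It assumes a hypothetical multiplicative watermark that scales each dimension by (1 + power) or (1 - power) according to a secret key, and it finds the largest admissible power by brute force over a candidate list, which is exactly the exhaustive search the paper's distance bounds are meant to avoid. The names `watermark`, `dendrogram`, `preserves_clustering`, and `max_safe_power` are illustrative only.

```python
import itertools
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dendrogram(points, linkage=min):
    """Naive agglomerative clustering.

    Returns the set of clusters formed by the merges, which identifies the
    tree topology. linkage=min gives single-linkage, linkage=max gives
    complete-linkage.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    formed = set()
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(clusters, 2):
            d = linkage(euclidean(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        formed.add(a | b)
    return formed


def watermark(points, key, power):
    """Hypothetical watermark (an assumption, not the paper's scheme):
    scale each dimension by (1 + power) or (1 - power), chosen by the key."""
    rng = random.Random(key)
    signs = [rng.choice((-1, 1)) for _ in range(len(points[0]))]
    return [[x * (1 + power * s) for x, s in zip(p, signs)] for p in points]


def preserves_clustering(points, key, power, linkage=min):
    """True if the dendrogram topology is unchanged by the watermark."""
    marked = watermark(points, key, power)
    return dendrogram(marked, linkage) == dendrogram(points, linkage)


def max_safe_power(points, key, candidates, linkage=min):
    """Brute-force search for the largest candidate power that keeps the
    dendrogram intact. Returns None if no candidate qualifies."""
    safe = [p for p in candidates if preserves_clustering(points, key, p, linkage)]
    return max(safe) if safe else None
```

A call such as `max_safe_power(data, key=7, candidates=[0.0, 0.05, 0.1])` returns the strongest distortion among the candidates that leaves the cluster tree unchanged. A larger power corresponds to a more robust watermark and stronger obfuscation, which mirrors the tradeoff between distortion and utility that the abstract describes.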


Organizational Units

  • Institute of Information Systems
  • Hilti Chair of Business Process Management