Properties of a smilarity preserving hash function and their realization in sdhash

zurück zur Übersicht


Breitinger, F., & Baier, H. (2012). Properties of a smilarity preserving hash function and their realization in sdhash. Paper presented at the Information Security for South Africa (ISSA).


Beitrag in Konferenztagungsband


Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity pre- serving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved.



  • Institut für Wirtschaftsinformatik
  • Hilti Lehrstuhl für Daten- und Anwendungssicherheit

Original Source URL