FRASH: A Framework to Test Algorithms of Similarity Hashing

Reference

Breitinger, F., Stivaktakis, G., & Baier, H. (2013). FRASH: A Framework to Test Algorithms of Similarity Hashing. Digital Investigation, 10, 50-58.

Publication type

Article in Scientific Journal

Abstract

Automated input identification is a very challenging, but also important task. Within computer forensics this reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Word document), and fragments (e.g., network packets), too. Over the recent years a couple of different similarity hashing algorithms were published. However, due to the absence of a definition and a test framework, it is hardly possible to evaluate and compare these approaches to establish them in the community. The paper at hand aims at providing an assessment methodology and a sample implementation called FRASH: a framework to test algorithms of similarity hashing. First, we describe common use cases of a similarity hashing algorithm to motivate our two test classes efficiency and sensitivity & robustness. Next, our open and freely available framework is briefly described. Finally, we apply FRASH to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses.

Persons

Dr.-Ing. Frank Breitinger

Organizational Units

Institute of Information Systems
Hilti Chair for Data and Application Security
