
Application of q-Gram Distance in Digital Forensic Search

  • Conference paper
Computational Forensics (IWCF 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5158)


Abstract

In order to find evidence, digital forensic investigations often include search procedures applied to large data sets. Such searches need appropriate fault-tolerant distance measures, so that evidence can be detected even if it has previously been distorted or partially erased on the search media. One suitable fault-tolerant distance measure for this purpose is the constrained edit distance, in which the constraints are the maximum numbers of consecutive insertions and deletions. However, the time complexity of computing it is too high for searches over large data sets. We propose a two-phase, indexless search procedure for forensic evidence search that uses the q-gram distance instead of the constrained edit distance. The q-gram distance is known to approximate the unconstrained edit distance well. We study how well the q-gram distance approximates the edit distance under the special constraints needed in forensic search applications, and we compare the performance of the search procedure with each of the two distances applied in it. Experimental results show that, for some values of q, the procedure using the q-gram distance achieves almost the same accuracy as the one using the constrained edit distance, while its efficiency is much better because the q-gram distance has a much lower computational time complexity.
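To make the two distances compared above concrete, here is a minimal Python sketch; it is not the authors' implementation, and the function names and parameters are hypothetical. `qgram_distance` follows Ukkonen's definition: the sum, over all q-grams, of the absolute difference between their occurrence counts in the two strings. `constrained_edit_distance` is one assumed reading of the constraint described in the abstract: unit-cost insertions, deletions and substitutions, with no run of consecutive insertions longer than `max_ins` and no run of consecutive deletions longer than `max_del`. The paper's exact cost model and search procedure are not reproduced here.

```python
from collections import Counter
from math import inf


def qgram_profile(s: str, q: int) -> Counter:
    """Count every length-q substring (q-gram) of s."""
    if q < 1:
        raise ValueError("q must be at least 1")
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Ukkonen's q-gram distance: sum over all q-grams v of |G_x(v) - G_y(v)|."""
    gx, gy = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(gx[v] - gy[v]) for v in gx.keys() | gy.keys())


def constrained_edit_distance(x: str, y: str, max_ins: int, max_del: int) -> float:
    """Unit-cost edit distance from x to y in which no run of consecutive
    insertions exceeds max_ins and no run of consecutive deletions exceeds
    max_del (an assumed reading of the constraint). Returns inf if no
    admissible edit sequence exists."""
    n, m = len(x), len(y)
    # table[i][j] maps a state (last_op, run_length) to the minimal cost of
    # turning x[:i] into y[:j]; last_op is 'S' (start/match/substitution),
    # 'I' (insertion) or 'D' (deletion).
    table = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0][("S", 0)] = 0

    def relax(i: int, j: int, state: tuple, cost: int) -> None:
        if cost < table[i][j].get(state, inf):
            table[i][j][state] = cost

    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            for (op, run), cost in table[i][j].items():
                if i < n and j < m:  # match (cost 0) or substitution (cost 1)
                    relax(i + 1, j + 1, ("S", 0), cost + (x[i] != y[j]))
                if j < m:  # insert y[j], extending an insertion run if any
                    k = run + 1 if op == "I" else 1
                    if k <= max_ins:
                        relax(i, j + 1, ("I", k), cost + 1)
                if i < n:  # delete x[i], extending a deletion run if any
                    k = run + 1 if op == "D" else 1
                    if k <= max_del:
                        relax(i + 1, j, ("D", k), cost + 1)
    return min(table[n][m].values(), default=inf)


if __name__ == "__main__":
    print(qgram_distance("abcde", "abxde", 2))                 # 4
    print(constrained_edit_distance("abcdef", "af", 10, 10))   # 4
    print(constrained_edit_distance("abcdef", "af", 10, 2))    # 5
```

With generous limits the sketch reduces to ordinary Levenshtein distance; tightening `max_del` to 2 forbids deleting the run `bcde` in one go, which raises the cost from 4 to 5. The q-gram distance needs only one pass over each string (hashing each q-gram), whereas this dynamic program visits every cell of an (|x|+1) × (|y|+1) table with up to 1 + `max_ins` + `max_del` states per cell, which is consistent with the efficiency gap the abstract reports. Because a single insertion, deletion or substitution removes at most q q-grams and adds at most q, the q-gram distance never exceeds 2q times the unconstrained edit distance.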




Editor information

Sargur N. Srihari, Katrin Franke


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petrović, S., Bakke, S. (2008). Application of q-Gram Distance in Digital Forensic Search. In: Srihari, S.N., Franke, K. (eds) Computational Forensics. IWCF 2008. Lecture Notes in Computer Science, vol 5158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85303-9_15


  • DOI: https://doi.org/10.1007/978-3-540-85303-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85302-2

  • Online ISBN: 978-3-540-85303-9

  • eBook Packages: Computer Science, Computer Science (R0)
