File similarity evaluation scheme for multimedia data using partial hash information

Kim, Byung-Kwan; Oh, Su-Jin; Jang, Sung-Bong; Ko, Young-Woong

doi:10.1007/s11042-016-3373-7

Published: 22 February 2016

Volume 76, pages 19649–19663, (2017)

Multimedia Tools and Applications

Byung-Kwan Kim¹, Su-Jin Oh¹, Sung-Bong Jang² & Young-Woong Ko¹

Abstract

File similarity is a numerical indicator of how much duplicated data exists in the target files. With this information, storage requirements can be reduced through data deduplication; it can also be exploited in digital forensics to find malicious software. However, measuring the similarity between files can incur high overhead in both processing time and disk storage capacity. For this reason, this paper proposes a novel file similarity evaluation algorithm called PHISA (Partial Hash Information String Algorithm). To evaluate the performance of the proposed system, we compare PHISA with well-known file similarity tools. The evaluation results show that PHISA reduces processing time and increases the accuracy of similarity evaluation.
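To make the notion of file similarity as a numerical indicator of duplicated data concrete, the following is a minimal sketch of a generic chunk-hash comparison. It is not PHISA, whose details are not given in this abstract. The function names `chunk_hashes` and `similarity`, the fixed chunk size, the use of SHA-1, and the Jaccard ratio are all assumptions chosen for illustration.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> set:
    """Split data into fixed-size chunks and return the set of SHA-1 digests.

    Assumption: fixed-size chunking and SHA-1 are illustrative choices only.
    """
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def similarity(a: bytes, b: bytes, chunk_size: int = 4096) -> float:
    """Return the Jaccard similarity (0.0 to 1.0) of the two chunk-hash sets."""
    ha = chunk_hashes(a, chunk_size)
    hb = chunk_hashes(b, chunk_size)
    if not ha and not hb:
        # Two empty inputs are treated as identical.
        return 1.0
    return len(ha & hb) / len(ha | hb)


if __name__ == "__main__":
    first = b"AAAABBBBCCCCDDDD"
    second = b"AAAABBBBCCCCEEEE"
    # Three of the four 4-byte chunks are shared; five distinct chunks overall.
    print(similarity(first, second, chunk_size=4))
```

Hashing every chunk of both files, as this sketch does, is the kind of full comparison whose processing and storage cost the abstract describes as high overhead.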

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (2014R1A2A1A11054160).

Author information

Authors and Affiliations

Department of Computer Engineering, Hallym University, Chuncheon, Gangwon, 200-702, South Korea
Byung-Kwan Kim, Su-Jin Oh & Young-Woong Ko
Department of Computer Software Engineering, Kumoh National Institute of Technology, 61 Daehak-ro, Gumi, Kyoung-Buk, 730-701, South Korea
Sung-Bong Jang

Corresponding author

Correspondence to Young-Woong Ko.

About this article

Cite this article

Kim, BK., Oh, SJ., Jang, SB. et al. File similarity evaluation scheme for multimedia data using partial hash information. Multimed Tools Appl 76, 19649–19663 (2017). https://doi.org/10.1007/s11042-016-3373-7

Received: 23 November 2015
Revised: 06 February 2016
Accepted: 15 February 2016
Published: 22 February 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11042-016-3373-7
