
A Method of Recovering HBase Records from HDFS Based on Checksum File

  • Conference paper
Collaborate Computing: Networking, Applications and Worksharing (CollaborateCom 2016)

Abstract

Data recovery is a key problem in disaster recovery and digital forensics. HDFS (Hadoop Distributed File System) is widely used to store datasets of high volume, velocity, and variety. However, previous work on data recovery has focused mainly on personal computers and mobile phones, and little attention has been paid to HDFS. This paper analyzes the features of HDFS and proposes a recovery method based on checksum files to address the problem of recovering records from HBase, a common application on HDFS. We first carve out the Data blocks of an HFile (HBase data file) using the corresponding checksum file, then analyze the format of HBase table records to extract them from the carved Data blocks. Experiments demonstrate that our method restores HBase records effectively. The recovery rate is nearly 100% when the cluster size is 4 KB or 2 KB.
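The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which is not reproduced in this preview. It assumes a checksum file that holds one checksum per 512-byte chunk (the usual HDFS `dfs.bytes-per-checksum` default) and uses `zlib.crc32` as a stand-in for the checksum HDFS actually applies (recent Hadoop versions default to CRC32C). The function names `chunk_checksums`, `locate_chunks`, `reassemble`, and `parse_keyvalue` are hypothetical. The record parser follows the basic HBase KeyValue layout and ignores optional trailing fields such as tags or MVCC versions.

```python
import struct
import zlib

CHUNK = 512  # assumed bytes-per-checksum; adjust to the cluster's setting


def chunk_checksums(data: bytes, chunk: int = CHUNK) -> list:
    """Checksum of each chunk-sized piece (stand-in for a parsed checksum file)."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def locate_chunks(image: bytes, checksums: list, chunk: int = CHUNK) -> dict:
    """Map chunk index in the lost file -> offset in the raw image.

    Scans the image at chunk alignment and keeps the first offset whose
    checksum equals a known checksum.
    """
    index = {}
    for pos, crc in enumerate(checksums):
        index.setdefault(crc, []).append(pos)
    found = {}
    for off in range(0, len(image) - chunk + 1, chunk):
        crc = zlib.crc32(image[off:off + chunk])
        for pos in index.get(crc, ()):
            found.setdefault(pos, off)
    return found


def reassemble(image: bytes, found: dict, n_chunks: int, chunk: int = CHUNK) -> bytes:
    """Concatenate located chunks in file order; missing chunks become zeros."""
    out = bytearray()
    for pos in range(n_chunks):
        off = found.get(pos)
        out += image[off:off + chunk] if off is not None else bytes(chunk)
    return bytes(out)


def parse_keyvalue(buf: bytes, pos: int = 0):
    """Parse one KeyValue starting at pos; return (record, next_pos).

    Layout: key length (4) | value length (4) | row length (2) | row |
    family length (1) | family | qualifier | timestamp (8) | key type (1) | value
    """
    key_len, val_len = struct.unpack_from(">II", buf, pos)
    key = pos + 8
    (row_len,) = struct.unpack_from(">H", buf, key)
    row = buf[key + 2:key + 2 + row_len]
    fam_len = buf[key + 2 + row_len]
    fam_start = key + 3 + row_len
    family = buf[fam_start:fam_start + fam_len]
    # The qualifier has no length field; it fills the rest of the key.
    qual_len = key_len - (2 + row_len + 1 + fam_len + 8 + 1)
    qual_start = fam_start + fam_len
    qualifier = buf[qual_start:qual_start + qual_len]
    timestamp, key_type = struct.unpack_from(">qB", buf, key + key_len - 9)
    val_start = key + key_len
    value = buf[val_start:val_start + val_len]
    record = {"row": row, "family": family, "qualifier": qualifier,
              "timestamp": timestamp, "type": key_type, "value": value}
    return record, val_start + val_len
```

In use, the checksums would come from the surviving checksum file and `image` from a raw read of the DataNode's disk; the reassembled bytes would then be walked with `parse_keyvalue` until the buffer is exhausted. Matching at 512-byte granularity means a false match is possible whenever two different chunks share a checksum, so a real tool would need additional validation.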



Acknowledgments

This work is supported by the Natural Science Foundation of China under Grant Nos. 61070212 and 61572165, the State Key Program of Zhejiang Province Natural Science Foundation of China under Grant No. LZ15F020003 and Key Lab of Information Network Security, Ministry of Public Security.

Author information

Corresponding author

Correspondence to Tao Yang.


Copyright information

© 2017 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Zeng, L., Xu, M., Xu, J., Zheng, N., Yang, T. (2017). A Method of Recovering HBase Records from HDFS Based on Checksum File. In: Wang, S., Zhou, A. (eds) Collaborate Computing: Networking, Applications and Worksharing. CollaborateCom 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 201. Springer, Cham. https://doi.org/10.1007/978-3-319-59288-6_12


  • DOI: https://doi.org/10.1007/978-3-319-59288-6_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59287-9

  • Online ISBN: 978-3-319-59288-6

  • eBook Packages: Computer Science, Computer Science (R0)
