Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Hot data identification is crucial for many applications, yet few investigations have examined the subject. Existing studies focus almost exclusively on frequency. However, identifying hot data effectively requires weighing recency and frequency equally. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision suits flash-based storage particularly well because its random access performance is comparable to its sequential access performance. Hard disk drives (HDDs), however, show a significant performance disparity between sequential and random access. Therefore, unlike with flash-based storage, exploiting the asymmetric access performance of HDDs requires a coarse-grained decision. This paper proposes a novel hot data identification scheme that adopts multiple bloom filters to efficiently capture recency as well as frequency. Consequently, it not only consumes 50% less memory and incurs up to 58% less computational overhead, but also lowers false identification rates by up to 65% compared with a state-of-the-art scheme. Moreover, we apply the scheme to a next-generation HDD technology, Shingled Magnetic Recording (SMR), to verify its effectiveness. To this end, we design a new hot data identification-based SMR drive that makes coarse-grained decisions. The experiments demonstrate the importance and benefits of accurate hot data identification, which improves the proposed SMR drive's performance by up to 42%.
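The abstract does not spell out how multiple bloom filters capture both frequency and recency, so the following is only a minimal sketch of one plausible arrangement, not the paper's actual design. It assumes that each write to a block is recorded in the first filter that does not yet hold it (so the number of filters holding a block approximates its write frequency) and that one filter is cleared at a fixed interval (so old accesses age out). The class name, all parameter values, and the SHA-256-based hashing are illustrative assumptions.

```python
import hashlib


class MultiBloomHotness:
    """Illustrative sketch only; all names and parameters are assumptions."""

    def __init__(self, num_filters=4, bits=2048, num_hashes=2,
                 decay_interval=512, threshold=2):
        # One byte per bit position keeps the sketch simple (not memory-optimal).
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes          # at most 8 with 4-byte digest slices
        self.decay_interval = decay_interval  # writes between filter resets
        self.threshold = threshold            # filters needed to call a block hot
        self.oldest = 0                       # index of the next filter to clear
        self.writes = 0

    def _positions(self, lba):
        # Derive several bit positions from a single SHA-256 digest.
        digest = hashlib.sha256(str(lba).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.num_hashes)]

    @staticmethod
    def _contains(bloom, positions):
        return all(bloom[p] for p in positions)

    def record_write(self, lba):
        positions = self._positions(lba)
        # Frequency: add the block to the first filter that lacks it.
        for bloom in self.filters:
            if not self._contains(bloom, positions):
                for p in positions:
                    bloom[p] = 1
                break
        # Recency: periodically clear one filter so stale accesses fade.
        self.writes += 1
        if self.writes % self.decay_interval == 0:
            self.filters[self.oldest] = bytearray(self.bits)
            self.oldest = (self.oldest + 1) % len(self.filters)

    def is_hot(self, lba):
        positions = self._positions(lba)
        hits = sum(self._contains(bloom, positions) for bloom in self.filters)
        return hits >= self.threshold


tracker = MultiBloomHotness()
tracker.record_write(42)
tracker.record_write(42)
print(tracker.is_hot(42))
```

In this sketch a block becomes hot once enough filters hold it and cools down again when those filters are cleared, unless it keeps being written. As with any bloom filter, cold blocks can occasionally be misreported as hot because of hash collisions.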

Acknowledgment

We would like to thank David Schwaderer (Samsung Semiconductor Inc., USA) for his valuable comments and proofreading.

Author information

Correspondence to Dongchul Park.

About this article

Cite this article

Park, D., He, W. & Du, D.H.C. Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision. J. Comput. Sci. Technol. 33, 79–97 (2018). https://doi.org/10.1007/s11390-018-1809-4

  • DOI: https://doi.org/10.1007/s11390-018-1809-4