Loop transformations for flash memory: cost models and performance effects

Paik, Joon-Young; Chung, Tae-Sun; Cho, Eun-Sun

doi:10.1007/s10617-014-9144-7

Loop transformations for flash memory: cost models and performance effects

Published: 02 October 2014

Volume 17, pages 627–667, (2013)
Cite this article

Design Automation for Embedded Systems Aims and scope Submit manuscript

Joon-Young Paik¹,
Tae-Sun Chung² &
Eun-Sun Cho³

290 Accesses
1 Citation
Explore all metrics

Abstract

Loop optimization, made of a sequence of loop transformations, plays an important role in performance improvement in data centric applications. Programs using flash memory are no exception to this, but, under certain conditions careless applications of specific loop transformations might cause unexpected results, due to the characteristics of flash memory and underlying management systems. In this article, we analyze how loop transformations affect the performance in flash translation layers (FTLs). First, we choose four loop structures which have distinct reference patterns and propose a cost model for each structure, reflecting the properties of flash memory. Then, using these cost models, we investigate how loop transformations affect the block associative sector translation (BAST)’s and fully associative sector translation (FAST)’s internal operations and analyze the performance effect of loop transformations experimentally. As a result, we find that some of the major loop transformations cause unexpected performance effects in those major FTLs under certain conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Indexing in flash storage devices: a survey on challenges, current approaches, and future trends

Article 03 August 2019

Athanasios Fevgas, Leonidas Akritidis, … Yannis Manolopoulos

Optimizing random write performance of FAST FTL for NAND flash memory

Article 03 December 2014

XuFeng Guo & YuPing Wang

Gemini: A Novel Hardware and Software Implementation of High-performance PCIe SSD

Article 21 July 2016

Yang Ou, Nong Xiao, … Lizhou Wu

Notes

To simplify the analysis and cost models, our work does not consider any layers interleaved between applications and FTL. We believe that such an assumption is reasonable starting point of complex models analysis. In addition, this assumption is immediately applicable for explicit I/O operations [19, 20]. In Fig. 1, iowrite() indicates a function to request explicit write operations [19].
lbn is calculated by dividing the lpn by the number of pages in a block. In Fig. 3, the spare area has lpn, and the number of pages per block is 4. Thus, the lbn within pbn, 10, is 1 (4 div 4).
Fig. 3
Distinct FTL designs between BAST and FAST [4]
Full size image
The immature state means that a log block has available space.

References

Intel Corporation (1998) Understanding the flash translation layer (FTL) specification. Intel Technical Report AP-684
Chung TS, Park DJ, Park S, Lee DH, Lee SW, Song HJ (2009) A survey of flash translation layer. J Syst Archit 55:332–343
Article Google Scholar
Kim JS, Kim JM, Noh SH, Min SL, Cho YK (2002) A space-efficient flash translation layer for compactflash systems. IEEE Trans Consum Electron 48:366–375
Article Google Scholar
Lee SW, Park DJ, Chung TS, Lee DH, Park S, Song HJ (2007) A log buffer-based flash translation layer using fully-associative sector translation. ACM Trans Embed Comput Syst doi:10.1145/1275986.1275990
Jung D, Kang J, Jo H, Kim JS, Lee J (2010) Superblock FTL: a superblock-based flash translation layer with a hybrid address translation scheme. ACM Trans Embed Comput Syst 9:1–41
Article Google Scholar
Park C, Cheon W, Kang J, Roh K, Cho W, Kim JS (2008) A re-configurable FTL architecture for NAND flash-based applications. ACM Trans Embed Comput Syst 7:1–23
Google Scholar
Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 1–14
Jo H, Kang JU, Park SY, Kim JS, Lee J (2006) FAB: flash-aware buffer management policy for portable media players. IEEE Trans Consum Electron 52:485–493
Article Google Scholar
Kang S, Park S, Jung H, Shim H, Cha J (2009) Performance trade-offs in using nvram write buffer for flash memory-based storage devices. IEEE Trans Comput 58:744–758
Article MathSciNet Google Scholar
Shi L, Li J, Xue CJ, Yang C, Zhou X (2011) ExLRU: a unified write buffer cache management for flash memory. In: Proceeding of ACM international conference on embedded software (EMSOFT), pp 339–348
Lee SW, Moon B (2007) Design of flash-based DBMS: an in-page logging approach. In: Proceeding of SIGMOD conference, pp 55–66
Kang WH, Lee SW, Moon B, Oh GH, Min C (2013) X-FTL: transactional FTL for SQLite databases. In: Proceeding of SIGMOD conference, pp 97–108
Chen F, Luo T, Zhang X (2011) CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In: Proceeding of USENIX Conference file and storage technologies (FAST), pp 77–90
Kim J, Lee C, Lee S, Son S, Choi J, Yoon S, Lee HU, Kang S, Won Y, Cha J (2012) Deduplication in SSDs: model and quantitative analysis. In: Proceeding of IEEE conference massive data storage (MSST), pp 1–12
Gupta A, Pisolkar R, Urgaonkar B, Sivasubramaniam A (2011) Leveraging value locality in optimizing NAND flash-based SSDs. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 91–103
Debnath B, Sengupta S, Li J (2010) ChunkStash: speeding up inline storage deduplication using flash memory. In: Proceeding of USENIX Conference on file and storage technologies (FAST)
Muchnick SS (1998) Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco
Google Scholar
Mckinley KS, Carr S, Tseng CW (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst 18(4):424–453
Article Google Scholar
Kandemir M, Choudhary A, Ramanujam J (2002) An I/O-conscious tiling strategy for disk-resident data sets. J Supercomput 21(3):257–284
Article MATH Google Scholar
Mowry TC, Demke AK, Krieger O (1996) Automatic compiler-inserted I/O prefetching for out-of-core applications. In: Proceeding of USENIX symposium on operating systems design and implementation (OSDI), pp 3–17
Samsung Electronics (2004) NAND flash memory & smartmedia data book
Shin D (2010) Workload-driven adaptive log buffer-based FTL. IEICE Electron Express 7(11):804–809
Article Google Scholar
Lee D, Shin F, Kim Y, Kim J (2008) LAST: locality-aware sector translation for NAND flash memory-based storage systems. Proc ACM SIGOPS Oper Syst Rev 42(6):36–42
Article Google Scholar
Shin H, Jung D, Kim J, Kim J, Maeng S (2010) Co-optimization of buffer layer and FTL in high-performance flash-based storage systems. Des Autom Embed Syst 14:415–443
Article Google Scholar
Bouganim L, Jonsson B, Bonnet P (2009) uFlip: understanding flash IO patterns. In: Proceeding of conference on innovative data systems research (CIDR)
Paik JY, Chung TS, Cho ES (2013) Cost model based analyses on performance effects of loop transformations in block associative sector translation. In: Proceeding of IEEE/IFIP international conference on embedded and ubiquitous computing (EUC), pp 1998–2005
Zhang W, Leiss EL (2001) A compiler driven out-of-core programming approach for optimizing data locality in loop nests. In: Proceeding of international conference on parallel and distributed processing techniques and applications (PDPTA), pp 25–28
Johnson T and Shasha D (1994) 2Q: A low overhead high performance buffer management replacement algorithm. In: Proceeding of international conference on very large data bases (VLDB), pp 439–450 Zhang W, pp 25–28
Megiddo N and Modha DS (2003) ARC: a self-tuning, low overhead replacement cache. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 115–130
Jo H, Kang K, Park S, Kim J, Lee J (2006) FAB: flash-aware buffer management replacment algorithm. IEEE Trans Consum Electron 52(2):485–493
Article Google Scholar
Kim H and Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 1–14
Laforest E (2010) Survey of loop transformation techniques. Technical Report. University of Toronto, Toronto
Qiu M, Wu J, Xue C.J., Hu JA, Tseng W-C, and Sha E H-M (2008) Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory. In: Proceedings of IEEE international conference on field programmable logic and applications (FPL), pp 459–462
Xue C, Shao Z, Chen Y, and Sha E H-M (2005) Optimizing DSP scheduling via address assignment with array and loop transformation. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 85–88
Xue CJ, Sha EH-M, Shao Z, Qiu M (2008) Effective loop partitioning and scheduling under memory and register dual constraints. In: Proceedings of IEEE/ACM design, automation and test in Europe (DATE), pp 1202–1207
Hosomi M, Yamagishi H, Yamamoto Y, Bessho K, Higo Y, Yamane K, Yamada H, Shoji M, Hachino H, Fukumoto C, Nagao H, Kano H (2005) A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. In: Proceeding of international electron devices meeting, pp 459–462
Qiu K, Zhao M, Fu C, Shi L, Xue CJ (2013) Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems. Proceeding of IEEE international conference on application-specific systems, architectures and processors (ASAP), pp 83–89
Paik JY, Cho ES, Chung TS (2009) Performance improvement for flash memories using loop optimization. In: Proceeding of IEEE international conference on computational science and engineering (CSE), pp 508–513
Kim S, Kwon K, Kim C, Jang C, Lee J, Min SL (2010) Demand paging techniques for flash memory using compiler post-pass optimizations. ACM Trans Embed Comput Syst 10(4):40
Lin CC, Chen CL, and Tseng CH (2007) Source code arrangement of embedded java virtual machine for NAND flash memory. In: Proceeding of international conference on symposium on communications and information technologies (ISCIT), pp 152–157
Gupta A, Kim Y, Urgaonkar B (2009) DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In: Proceeding of internationl conference on architectural support for programming languages and operating systems (ASPLOS) pp 229–240
Qin Z, Wang Y, Liu D, Shao Z (2010) Demand-based block-level address mapping in large-scale NAND flash storage systems. In: Proceeding of CODES + ISSS, pp 173–182
Qin Z, Wang Y, Liu D, Shao Z (2011) A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems. In: Proceeding of international conference on IEEE real-time and embedded technology and applications symposium, pp 157–166
Gniady C, Butt AR, Hu YC (2004) Program-counter-based pattern classification in buffer caching. In: Proceeding of USENIX conference, symposium on opearting systems design and implementation (OSDI), pp 27–27
Brock J, Gu X, Bao B, Ding C (2013) Pacman: Program-assisted cache management. In: Proceeding of international symposium on memory management (ISMM), pp 39–50

Download references

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2010-0013386).

Author information

Authors and Affiliations

Software Research Center, Chungnam National University, Daejeon, Republic of Korea
Joon-Young Paik
College of Information Technology, Ajou University, Suwon, Republic of Korea
Tae-Sun Chung
Department of Computer Science and Engineering, Chungnam National University, Daejeon, Republic of Korea
Eun-Sun Cho

Authors

Joon-Young Paik
View author publications
You can also search for this author in PubMed Google Scholar
Tae-Sun Chung
View author publications
You can also search for this author in PubMed Google Scholar
Eun-Sun Cho
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eun-Sun Cho.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Paik, JY., Chung, TS. & Cho, ES. Loop transformations for flash memory: cost models and performance effects. Des Autom Embed Syst 17, 627–667 (2013). https://doi.org/10.1007/s10617-014-9144-7

Download citation

Received: 15 January 2014
Accepted: 12 September 2014
Published: 02 October 2014
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10617-014-9144-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Loop transformations for flash memory: cost models and performance effects

Abstract

Access this article

Similar content being viewed by others

Indexing in flash storage devices: a survey on challenges, current approaches, and future trends

Optimizing random write performance of FAST FTL for NAND flash memory

Gemini: A Novel Hardware and Software Implementation of High-performance PCIe SSD

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Loop transformations for flash memory: cost models and performance effects

Abstract

Access this article

Similar content being viewed by others

Indexing in flash storage devices: a survey on challenges, current approaches, and future trends

Optimizing random write performance of FAST FTL for NAND flash memory

Gemini: A Novel Hardware and Software Implementation of High-performance PCIe SSD

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation