Abstract
Loop optimization, made of a sequence of loop transformations, plays an important role in performance improvement in data centric applications. Programs using flash memory are no exception to this, but, under certain conditions careless applications of specific loop transformations might cause unexpected results, due to the characteristics of flash memory and underlying management systems. In this article, we analyze how loop transformations affect the performance in flash translation layers (FTLs). First, we choose four loop structures which have distinct reference patterns and propose a cost model for each structure, reflecting the properties of flash memory. Then, using these cost models, we investigate how loop transformations affect the block associative sector translation (BAST)’s and fully associative sector translation (FAST)’s internal operations and analyze the performance effect of loop transformations experimentally. As a result, we find that some of the major loop transformations cause unexpected performance effects in those major FTLs under certain conditions.
Similar content being viewed by others
Notes
To simplify the analysis and cost models, our work does not consider any layers interleaved between applications and FTL. We believe that such an assumption is reasonable starting point of complex models analysis. In addition, this assumption is immediately applicable for explicit I/O operations [19, 20]. In Fig. 1, iowrite() indicates a function to request explicit write operations [19].
lbn is calculated by dividing the lpn by the number of pages in a block. In Fig. 3, the spare area has lpn, and the number of pages per block is 4. Thus, the lbn within pbn, 10, is 1 (4 div 4).
The immature state means that a log block has available space.
References
Intel Corporation (1998) Understanding the flash translation layer (FTL) specification. Intel Technical Report AP-684
Chung TS, Park DJ, Park S, Lee DH, Lee SW, Song HJ (2009) A survey of flash translation layer. J Syst Archit 55:332–343
Kim JS, Kim JM, Noh SH, Min SL, Cho YK (2002) A space-efficient flash translation layer for compactflash systems. IEEE Trans Consum Electron 48:366–375
Lee SW, Park DJ, Chung TS, Lee DH, Park S, Song HJ (2007) A log buffer-based flash translation layer using fully-associative sector translation. ACM Trans Embed Comput Syst doi:10.1145/1275986.1275990
Jung D, Kang J, Jo H, Kim JS, Lee J (2010) Superblock FTL: a superblock-based flash translation layer with a hybrid address translation scheme. ACM Trans Embed Comput Syst 9:1–41
Park C, Cheon W, Kang J, Roh K, Cho W, Kim JS (2008) A re-configurable FTL architecture for NAND flash-based applications. ACM Trans Embed Comput Syst 7:1–23
Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 1–14
Jo H, Kang JU, Park SY, Kim JS, Lee J (2006) FAB: flash-aware buffer management policy for portable media players. IEEE Trans Consum Electron 52:485–493
Kang S, Park S, Jung H, Shim H, Cha J (2009) Performance trade-offs in using nvram write buffer for flash memory-based storage devices. IEEE Trans Comput 58:744–758
Shi L, Li J, Xue CJ, Yang C, Zhou X (2011) ExLRU: a unified write buffer cache management for flash memory. In: Proceeding of ACM international conference on embedded software (EMSOFT), pp 339–348
Lee SW, Moon B (2007) Design of flash-based DBMS: an in-page logging approach. In: Proceeding of SIGMOD conference, pp 55–66
Kang WH, Lee SW, Moon B, Oh GH, Min C (2013) X-FTL: transactional FTL for SQLite databases. In: Proceeding of SIGMOD conference, pp 97–108
Chen F, Luo T, Zhang X (2011) CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In: Proceeding of USENIX Conference file and storage technologies (FAST), pp 77–90
Kim J, Lee C, Lee S, Son S, Choi J, Yoon S, Lee HU, Kang S, Won Y, Cha J (2012) Deduplication in SSDs: model and quantitative analysis. In: Proceeding of IEEE conference massive data storage (MSST), pp 1–12
Gupta A, Pisolkar R, Urgaonkar B, Sivasubramaniam A (2011) Leveraging value locality in optimizing NAND flash-based SSDs. In: Proceeding of USENIX conference on file and storage technologies (FAST), pp 91–103
Debnath B, Sengupta S, Li J (2010) ChunkStash: speeding up inline storage deduplication using flash memory. In: Proceeding of USENIX Conference on file and storage technologies (FAST)
Muchnick SS (1998) Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco
Mckinley KS, Carr S, Tseng CW (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst 18(4):424–453
Kandemir M, Choudhary A, Ramanujam J (2002) An I/O-conscious tiling strategy for disk-resident data sets. J Supercomput 21(3):257–284
Mowry TC, Demke AK, Krieger O (1996) Automatic compiler-inserted I/O prefetching for out-of-core applications. In: Proceeding of USENIX symposium on operating systems design and implementation (OSDI), pp 3–17
Samsung Electronics (2004) NAND flash memory & smartmedia data book
Shin D (2010) Workload-driven adaptive log buffer-based FTL. IEICE Electron Express 7(11):804–809
Lee D, Shin F, Kim Y, Kim J (2008) LAST: locality-aware sector translation for NAND flash memory-based storage systems. Proc ACM SIGOPS Oper Syst Rev 42(6):36–42
Shin H, Jung D, Kim J, Kim J, Maeng S (2010) Co-optimization of buffer layer and FTL in high-performance flash-based storage systems. Des Autom Embed Syst 14:415–443
Bouganim L, Jonsson B, Bonnet P (2009) uFlip: understanding flash IO patterns. In: Proceeding of conference on innovative data systems research (CIDR)
Paik JY, Chung TS, Cho ES (2013) Cost model based analyses on performance effects of loop transformations in block associative sector translation. In: Proceeding of IEEE/IFIP international conference on embedded and ubiquitous computing (EUC), pp 1998–2005
Zhang W, Leiss EL (2001) A compiler driven out-of-core programming approach for optimizing data locality in loop nests. In: Proceeding of international conference on parallel and distributed processing techniques and applications (PDPTA), pp 25–28
Johnson T and Shasha D (1994) 2Q: A low overhead high performance buffer management replacement algorithm. In: Proceeding of international conference on very large data bases (VLDB), pp 439–450 Zhang W, pp 25–28
Megiddo N and Modha DS (2003) ARC: a self-tuning, low overhead replacement cache. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 115–130
Jo H, Kang K, Park S, Kim J, Lee J (2006) FAB: flash-aware buffer management replacment algorithm. IEEE Trans Consum Electron 52(2):485–493
Kim H and Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceeding of USENIX conference on file and storage technogies (FAST), pp 1–14
Laforest E (2010) Survey of loop transformation techniques. Technical Report. University of Toronto, Toronto
Qiu M, Wu J, Xue C.J., Hu JA, Tseng W-C, and Sha E H-M (2008) Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory. In: Proceedings of IEEE international conference on field programmable logic and applications (FPL), pp 459–462
Xue C, Shao Z, Chen Y, and Sha E H-M (2005) Optimizing DSP scheduling via address assignment with array and loop transformation. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 85–88
Xue CJ, Sha EH-M, Shao Z, Qiu M (2008) Effective loop partitioning and scheduling under memory and register dual constraints. In: Proceedings of IEEE/ACM design, automation and test in Europe (DATE), pp 1202–1207
Hosomi M, Yamagishi H, Yamamoto Y, Bessho K, Higo Y, Yamane K, Yamada H, Shoji M, Hachino H, Fukumoto C, Nagao H, Kano H (2005) A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. In: Proceeding of international electron devices meeting, pp 459–462
Qiu K, Zhao M, Fu C, Shi L, Xue CJ (2013) Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems. Proceeding of IEEE international conference on application-specific systems, architectures and processors (ASAP), pp 83–89
Paik JY, Cho ES, Chung TS (2009) Performance improvement for flash memories using loop optimization. In: Proceeding of IEEE international conference on computational science and engineering (CSE), pp 508–513
Kim S, Kwon K, Kim C, Jang C, Lee J, Min SL (2010) Demand paging techniques for flash memory using compiler post-pass optimizations. ACM Trans Embed Comput Syst 10(4):40
Lin CC, Chen CL, and Tseng CH (2007) Source code arrangement of embedded java virtual machine for NAND flash memory. In: Proceeding of international conference on symposium on communications and information technologies (ISCIT), pp 152–157
Gupta A, Kim Y, Urgaonkar B (2009) DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In: Proceeding of internationl conference on architectural support for programming languages and operating systems (ASPLOS) pp 229–240
Qin Z, Wang Y, Liu D, Shao Z (2010) Demand-based block-level address mapping in large-scale NAND flash storage systems. In: Proceeding of CODES + ISSS, pp 173–182
Qin Z, Wang Y, Liu D, Shao Z (2011) A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems. In: Proceeding of international conference on IEEE real-time and embedded technology and applications symposium, pp 157–166
Gniady C, Butt AR, Hu YC (2004) Program-counter-based pattern classification in buffer caching. In: Proceeding of USENIX conference, symposium on opearting systems design and implementation (OSDI), pp 27–27
Brock J, Gu X, Bao B, Ding C (2013) Pacman: Program-assisted cache management. In: Proceeding of international symposium on memory management (ISMM), pp 39–50
Acknowledgments
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2010-0013386).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Paik, JY., Chung, TS. & Cho, ES. Loop transformations for flash memory: cost models and performance effects. Des Autom Embed Syst 17, 627–667 (2013). https://doi.org/10.1007/s10617-014-9144-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10617-014-9144-7