Elsevier

Journal of Systems Architecture

Volume 98, September 2019, Pages 374-387
Reinforcement learning-driven address mapping and caching for flash-based remote sensing image processing

https://doi.org/10.1016/j.sysarc.2019.02.007

Abstract

Flash memory offers salient advantages over conventional hard disks for massive data storage and efficient on-board data processing. A flash translation layer (FTL) is a critical component that lets flash-based storage devices handle the particular technical constraints of flash. It is desirable to use flash memory to store massive remote sensing images and to support on-board remote sensing data processing applications, which typically require high I/O performance and hence call for advanced FTL design and implementation. In this paper, we introduce a reinforcement learning driven page-level mapping and caching scheme (named Q-FTL) that is adaptive and responsive to the ever-changing I/O streams of on-board remote sensing image processing operations. The adaptability and responsiveness are achieved by the separation of large and small I/O requests, an integrated weighting scheme to measure access costs of cached translation pages, and a reinforcement learning driven cache replacement algorithm. We demonstrate the efficiency of the proposed approach using actual I/O traces generated from on-board remote sensing image processing applications. Experimental results show that Q-FTL improves over several current state-of-the-art FTLs by a large margin and, in some cases, achieves performance close to that of an idealized pure page mapping FTL.

Introduction

The rapid advancement of earth observation technologies creates an ever-increasing gap between data acquisition and data processing capabilities. Massive volumes of remote sensing image data are being collected at an unprecedented pace. Typically, these image data cannot be used before being downlinked to ground stations, where they are processed and distributed to end users. This traditional image data processing procedure poses challenges for the efficient transmission of massive remote sensing image data. An alternative solution is to transfer image processing capabilities from the ground to sensor platforms and to perform on-orbit (or in-flight) image processing [1], [2], [3], [4]. Under such circumstances, an appropriate storage solution is urgently needed to underpin high performance I/O. Over the last decade, we have observed a wide embrace of NAND flash memory as a persistent storage medium in both portable devices (e.g., digital cameras, tablets, and mobile phones) and enterprise storage solutions [5], [6], [7]. Flash memory has salient advantages over conventional hard disks, such as fast random access and shock resistance. These merits make it a suitable medium for on-board remote sensing data storage. Beyond these advantages, flash memory has some unique properties that must be considered in the development of flash-based data storage and management technologies. These particularities include I/O asymmetry between read and write, the erase-before-write constraint, and limited life expectancy. Consequently, flash needs to perform out-of-place updates to avoid the degraded performance caused by erase operations.

A flash translation layer (FTL) can be implemented and deployed in flash controllers to handle the above-mentioned technical constraints of flash [8]. The primary function of an FTL is to perform logical-to-physical address translation to emulate traditional block devices and provide backwards compatibility for high-level applications running on flash memory.

According to mapping granularity, three types of address mapping methods can be identified: page-level mapping [9], block-level mapping [10], [11], and hybrid mapping [12], [13]. Page-level mapping is highly efficient and flexible since address translation is performed at the finer granularity of a page. This study adopts a page-level mapping approach and uses advanced cache management mechanisms to exploit SRAM economically.
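To make the idea of page-level mapping with out-of-place updates concrete, the following is a minimal illustrative sketch and not the scheme proposed in the paper. The class name `PageLevelFTL`, its fields, and the first-free-page allocation policy are assumptions introduced only for illustration; a real FTL would also handle garbage collection and wear leveling.

```python
class PageLevelFTL:
    """Toy page-level mapping table with out-of-place updates (hypothetical)."""

    def __init__(self, num_physical_pages):
        self.mapping = {}        # logical page number -> physical page number
        self.invalid = set()     # physical pages that hold stale data
        self.free = list(range(num_physical_pages))  # unwritten physical pages

    def write(self, lpn):
        """Write a logical page to a fresh physical page and invalidate the old copy."""
        if not self.free:
            # A real FTL would trigger garbage collection (block erase) here.
            raise RuntimeError("no free pages left")
        ppn = self.free.pop(0)           # assumed policy: take the first free page
        old = self.mapping.get(lpn)
        if old is not None:
            self.invalid.add(old)        # the old page is not erased, only marked stale
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        """Translate a logical page number to its current physical page."""
        return self.mapping[lpn]


ftl = PageLevelFTL(num_physical_pages=8)
first = ftl.write(5)    # first write of logical page 5
second = ftl.write(5)   # update goes to a new physical page
```

In this sketch the second write to logical page 5 lands on a different physical page and the first copy is only marked invalid, which is how an FTL sidesteps the erase-before-write constraint on the critical path.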

Some efforts have been made to develop specific flash-aware index structures for geospatial data [14], [15]. However, little effort has been made to develop flash-based storage solutions dedicated to remote sensing data processing applications, which often manifest unique I/O characteristics and require high I/O throughput with limited SRAM. Zhang et al. [16] proposed a probability-based cache management approach for remote sensing image processing tasks. Nevertheless, this approach is largely static, and its adaptability can be improved by machine learning techniques. Fig. 1 presents visualizations of typical non-spatial I/O traces and remote sensing processing I/O traces. Compared to generic non-spatial I/O traces, remote sensing traces exhibit strong spatial locality and correlated access patterns. Large sequential I/O requests in remote sensing traces are multiplexed with small random reads, and this mix changes over time. These dynamic I/O behaviors cannot be handled well by existing cache management schemes. The lack of adaptive, data-dependent cache management methods in existing FTLs may lead to decreased hit ratios and inferior I/O performance for remote sensing applications. To address this issue, we need a data-driven modeling approach that separates I/O requests with distinct patterns, accurately measures the access costs of cached data, and enables adaptive cache replacement with acceptable overhead.

In this paper, we propose a page-level mapping and caching scheme (named Q-FTL) that is adaptive and responsive to the ever-changing I/O streams of on-board image processing operations. The adaptability and responsiveness are achieved by the separation of large and small I/O requests, an integrated weighting scheme to measure access costs, and a reinforcement learning driven cache replacement algorithm.
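As background for the reinforcement learning component, the sketch below shows a generic tabular Q-learning update with epsilon-greedy action selection. It is not the Q-FTL algorithm: this excerpt does not give the paper's state, action, or reward definitions, so the state label `"small_random"`, the two eviction actions, the reward value, and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical placeholders.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical action set for a cache replacement agent (an assumption).
ACTIONS = ["evict_clean", "evict_dirty"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# One learning step with a made-up reward of 1.0 (e.g. a later cache hit).
new_value = q_update(q, "small_random", "evict_clean", reward=1.0,
                     next_state="small_random", actions=ACTIONS)
greedy = choose_action(q, "small_random", ACTIONS, epsilon=0.0)
```

With all values initialized to zero, a single update with reward 1.0 and `alpha=0.1` raises the chosen entry to 0.1, after which the greedy policy prefers that action in that state. A tabular form like this keeps per-decision overhead small, which matters when the agent runs inside a flash controller.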

Section snippets

Cache management in FTL

In addition to address translation, cache management is another important function of an FTL. Different from traditional cache management algorithms, flash-based cache replacement must consider special idiosyncrasies of flash memory. For example, the write cost of dirty pages is a critical concern for cache replacement since the eviction of dirty pages would incur significant write and erase overheads. A page-level FTL maintains a large mapping table to store fine-grained logical-to-physical

Visual analysis of I/O traces and research motivation

We examine the spatio-temporal patterns of I/O activities produced by typical remote sensing processing tasks, which inspire us to develop an advanced flash-aware FTL to support high-performance on-board remote sensing image processing. As illustrated in Fig. 2, a remote sensing image processing task usually involves multiple types of operations, which produce data access patterns distinctly different from those of generic applications (e.g., accesses to web servers or database transactions). The

Methodology

This study focuses on real-time cache management of address mapping data for on-board remote sensing image processing with limited SRAM on flash. The objective is to promote the overall I/O performance for flash-based remote sensing image processing tasks through a novel FTL scheme, which leverages the particular access patterns of remote sensing image processing tasks and implements locality-aware I/O separation, cost evaluation, and cache management techniques in light of these I/O

Experimental configurations

The experiments were conducted on realistic remote sensing image processing I/O traces using FlashSim, a NAND flash simulator widely used in the literature [41]. The simulated flash device has a capacity of 30 GB. The block and page sizes are set to 512 KB and 2 KB, respectively. The latencies of read, write, and erase operations are set to 25 µs, 200 µs, and 1.5 ms, respectively. Note that reads and writes are performed at page granularity, whereas erases are performed at the block level.
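The arithmetic implied by this configuration can be worked through in a short sketch. It assumes binary units (1 KB = 1024 bytes) and a hypothetical 4-byte mapping entry per page; neither is stated in the source. The helper `trace_cost_us` is likewise an illustrative name and simply sums per-operation latencies without modeling queuing or parallelism.

```python
KB = 1024            # assumption: binary units
GB = 1024 ** 3

device_bytes = 30 * GB
block_bytes = 512 * KB
page_bytes = 2 * KB

pages_per_block = block_bytes // page_bytes   # 256 pages in each block
num_blocks = device_bytes // block_bytes      # 61,440 blocks
num_pages = device_bytes // page_bytes        # 15,728,640 pages

# Assumption: one 4-byte physical address per logical page.
ENTRY_BYTES = 4
table_bytes = num_pages * ENTRY_BYTES         # 62,914,560 bytes (60 MiB)

# Latencies from the experimental configuration, in microseconds.
READ_US, WRITE_US, ERASE_US = 25, 200, 1500


def trace_cost_us(reads, writes, erases):
    """Total flash latency for a given number of page reads, page writes, and block erases."""
    return reads * READ_US + writes * WRITE_US + erases * ERASE_US


example_cost = trace_cost_us(reads=1000, writes=100, erases=1)  # 46,500 us
```

Under these assumptions a full page-level mapping table would need about 60 MiB, which illustrates why only part of the table can be cached in limited SRAM. The latency figures also show that a single erase costs as much as 60 page reads, so avoiding unnecessary writes and erases dominates the cost picture.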

Conclusion

In this paper, we propose an advanced FTL scheme (Q-FTL) that implements reinforcement learning driven page-level address mapping and caching to enable efficient on-board remote sensing image processing. Q-FTL maintains a balance between flash reads and writes. It outperforms some existing FTLs owing to two salient features: (1) Q-FTL purposely separates small random I/Os from sequential I/Os, with the goal of improving the utilization efficiency of SRAM; (2) It relies on an

Funding

This research was funded by National Natural Science Foundation of China grant number [91538102]; Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase) grant number [U1501501]; National Key Research and Development Program [No. 2017YFB0503502] and Basic Scientific Research Fund Program of Chinese Academy of Surveying and Mapping [No. 7771820].

Conflicts of interest

The authors declare no conflict of interest.

References (42)

  • J. Gray et al. Flash disk opportunity for server applications. Queue - Enterprise Flash Stor. (2008)
  • S. Lee et al. A case for flash memory SSD in enterprise database applications
  • E. Gal et al. Algorithms and data structures for flash memories. ACM Comput. Surv. (2005)
  • A. Gupta et al. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings
  • S. Lee et al. Fast: an efficient flash translation layer for flash memory
  • C. Park et al. A re-configurable FTL (flash translation layer) architecture for NAND flash based applications. ACM Trans. Embed. Comput. Syst. Art. no. (2008)
  • J. Kim et al. A space-efficient flash translation layer for compact flash systems. IEEE Trans. Consum. Electron. (2002)
  • M. Sarwat et al. Generic and efficient framework for search trees on flash memory storage systems. Geoinformatica (2013)
  • D. That et al. TRIFL: a generic index for flash storage. ACM Trans. Spat. Algor. Syst. (2015)
  • S. Jiang et al. S-FTL: an efficient address translation for flash memory by exploiting spatial locality
  • Z. Qin et al. A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems

Tong Zhang received the M.Eng. degree in Cartography and GIS from Wuhan University, China, in 2003, and the Doctoral degree in Geography from San Diego State University and University of California, Santa Barbara in 2007. He is currently a professor at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan. His current research topics include high performance geocomputation, remote sensing data analysis and geospatial optimization.

Ze Cheng is currently an M.S. student at LIESMARS, Wuhan University. His research interests include flash-aware image storage, visual I/O analysis, and machine learning.

Jing Li received the M.S. degree in Earth System Science from George Mason University, USA, in 2009 and the Ph.D. degree in Earth System and Geoinformation Science from the same university in 2012. She is currently an Assistant Professor in the Department of Geography and the Environment at the University of Denver, USA. Her research interests are high performance computing, spatiotemporal data modeling, and geovisualization.
