Abstract:
NAND-based solid-state drives (SSDs) are used almost ubiquitously in safety-critical systems, and recent advances have demonstrated that redundant array of independent disks (RAID) implementations built on top of SSDs can effectively enhance data integrity and reliability. When a RAID component fails (here, an SSD), RAID can restore the lost data chunks through a process called RAID reconstruction. Specifically, online RAID reconstruction allows the RAID system to continue fulfilling user I/O requests during reconstruction. Servicing user I/O requests, however, significantly degrades reconstruction performance because of contention for the shared SSD bandwidth. This article proposes a fast online reconstruction method for SSD-based RAID systems that preferentially restores the lost chunks whenever the replaced SSD device is idle, in order to reduce the reconstruction time and thus minimize the probability of a second disk failure in the RAID system during reconstruction. Furthermore, it schedules the tasks of restoring data/parity chunks according to their impact on the other working SSDs in the RAID system, with the aim of reducing the overall I/O latency. Through a series of experiments based on selected disk traces of real-world applications, we show that the proposed reconstruction scheme can reduce the reconstruction time by up to 45.6% while also cutting the I/O latency by 9.8% on average, compared to state-of-the-art methods.
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 43, Issue: 6, June 2024)
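
The abstract refers to restoring lost data/parity chunks without stating the RAID level or coding scheme. As a minimal illustration only, the sketch below assumes a single-parity (RAID-5 style) stripe, where the parity chunk is the byte-wise XOR of the data chunks, so any one lost chunk equals the XOR of all surviving chunks in the stripe. The function names (`xor_chunks`, `rebuild_lost_chunk`) and the sample bytes are hypothetical and are not taken from the article; the idle-aware scheduling the article proposes is not modeled here.

```python
from functools import reduce


def xor_chunks(chunks):
    """Return the byte-wise XOR of a list of equally sized chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_lost_chunk(surviving_chunks):
    """Assumed single-parity stripe: the lost chunk is the XOR of the survivors."""
    return xor_chunks(surviving_chunks)


# Hypothetical stripe: three data chunks plus one parity chunk.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_chunks(data)

# Simulate losing the SSD that held data[1], then rebuild it from the rest.
lost_index = 1
survivors = [c for i, c in enumerate(data) if i != lost_index] + [parity]
rebuilt = rebuild_lost_chunk(survivors)
```

Each rebuilt chunk requires reading one chunk from every surviving SSD in the stripe, which is why reconstruction reads compete with user I/O for the shared SSD bandwidth mentioned in the abstract.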