With rapid advances in manufacturing and communication technologies, embedded systems have evolved tremendously in recent years. However, embedded systems usually have limited energy, computing power, and memory/storage space. In particular, the cost of data transfer between the CPU and storage/memory has become a critical challenge for such systems. Meanwhile, memory and storage technologies have advanced significantly over the past decades and now increasingly support computing capabilities. This trend of adding computing functions in or near memory and storage devices to enable “memory and storage computing” provides a new opportunity to resolve the performance bottleneck caused by the massive amount of data movement between the CPU and memory/storage units, and it has become a hot topic widely acknowledged by both academia and industry.
After a rigorous review process, in which reviewers were selected for their expertise on the topic of each submission, this special issue represents a collective effort from the research community and industry participants on an international scale. From the many excellent submissions received, ten articles were selected for inclusion. These articles tackle some of the most recent and impactful design issues of in/near memory and storage computing for embedded systems, and they are briefly discussed in the rest of this introduction.
With the increasing scale of cloud computing applications in next-generation embedded systems, a major challenge that domain scientists face is how to efficiently store and analyze the vast volume of output data. Compression is one of the most popular methods to address this problem. Li et al. in “AMP: Total Variation Reduction for Lossless Compression via Approximate Median-based Preconditioning” present a total variation reduction method that addresses the fact that most large datasets are in floating-point format by employing a median-based hyperplane to precondition the data.
The recently introduced eADR feature guarantees that data buffered in the CPU cache is flushed to persistent memory on a power outage, thereby making the CPU cache a transient persistence domain and enabling atomic durability for applications’ data in persistent memory. Ye et al. in “Hercules: Enabling Atomic Durability for Persistent Memory with Transient Persistence Domain” propose a hardware logging design that provides transaction-level atomic durability for persistent memory with a transient persistence domain.
As the core operation of lattice ciphers, large-scale polynomial multiplication is the biggest computational bottleneck in their realization. How to quickly compute polynomial multiplication under resource constraints has become an urgent problem in the hardware implementation of lattice ciphers. Du et al. in “Analog In-memory Circuit Design of Polynomial Multiplication for Lattice Cipher Acceleration Application” propose an analog in-memory circuit for fast polynomial multiplication.
Existing architectural studies on ReRAM-based processing-in-memory (PIM) DNN accelerators generally assume that all weights of the DNN can be mapped to the crossbar at once. In practice, however, the ReRAM crossbar resources available for computation are limited by technological constraints, so multiple weight-mapping procedures are required during inference. Under this restriction, Gao et al. in “Static Scheduling of Weight Programming for DNN Acceleration with Resource Constrained PIM” propose a static scheduling framework that generates the mapping between DNN weights and ReRAM cells with minimal runtime weight-programming cost.
Recent advancements in the fabrication of ReRAM devices have led to the development of large-scale crossbar structures. In-memory computing architectures relying on ReRAM crossbars aim to mitigate the processor-memory bottleneck of current CMOS technology. However, verification of designs realized on ReRAM crossbars is done either through manual inspection or with simulation-based approaches, neither of which scales to the verification of complex designs on large ReRAM crossbars. Bhunia et al. in “ReSG: A Data Structure for Verification of Majority based In-Memory Computing on ReRAM Crossbars” propose an automatic equivalence checking flow that determines the equivalence between the original function specification and the crossbar micro-operation file.
Voltage scaling is one of the most promising approaches for improving energy efficiency, but it also makes it challenging to fully guarantee stable operation in modern VLSI. To tackle these issues, Liang et al. in “A Robust and Energy Efficient Hyperdimensional Computing System for Voltage-scaled Circuits” propose a Hyperdimensional Computing (HDC) system that can tolerate bit-level memory failures in the low-voltage region with high robustness. It is the second version of DependableHD, incorporating margin enhancement for model retraining, noise injection for improving robustness, and a dimension-swapping technique.
Data movement in large-scale computing facilities (from compute nodes to data nodes) has been identified as a major contributor to high cost and energy consumption. To tackle this, in-storage processing (ISP) within storage devices, such as computational storage drives (CSDs), has been widely studied. One of the key challenges of building a CSD-based storage system within a compute node is that commercialized CSDs have different hardware resources and performance characteristics. Byun et al. in “An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems” propose an analytical model-based storage capacity planner that helps system architects build performance-effective CSD-based compute nodes.
Near-data processing (NDP) is widely studied as a way to solve the write-amplification issue caused by compaction operations in LSM-tree-based key-value (KV) stores. However, the performance of NDP frameworks with synchronous parallel schemes is limited by the subsystem with the lower compaction performance. Sun et al. in “An Asynchronous Compaction Acceleration Scheme for Near-Data Processing-enabled LSM-Tree-based KV Stores” propose an asynchronous parallel scheme that solves this problem with a multi-task queue and three priority-based scheduling methods.
In-memory processing is becoming a popular method to alleviate the memory bottleneck of the von Neumann computing model. Meanwhile, spintronic Racetrack Memory (RM) is a non-volatile memory technology widely studied to meet the latency and energy requirements of in-memory processing. Bera et al. in “SPIMulator: A Spintronic Processing-In-Memory Simulator for Racetracks” propose a spintronic PIM simulator that simulates both the storage and the PIM architecture executing PIM commands in Racetrack memory.
Energy-harvesting-based Internet of Things (IoT) devices have received attention due to advantages such as a green, low-carbon footprint, convenient maintenance, and a theoretically infinite lifetime. Meanwhile, ReRAM-based convolutional neural network (CNN) accelerators are widely studied to cope with the instability of harvested energy. Considering the mismatch between the power requirements of different CNN layers and the variation of harvested power, Zhou et al. in “REC: REtime Convolutional layers to fully exploit harvested energy for ReRAM-based CNN accelerators” propose a novel strategy that retimes the convolutional layers of CNN inferences to improve the performance and energy efficiency of energy-harvesting ReRAM-based accelerators.
The guest editors thank the reviewers for their valuable time, expertise, and constructive feedback in their reviews. We also thank all the authors for their submissions and their accommodation of the publication deadlines and constraints. Finally, we would also like to thank the Editor-in-Chief of ACM Transactions on Embedded Computing Systems, Professor Tulika Mitra, whose help made this special issue possible.
Liang Shi
East China Normal University, China
Jingtong Hu
University of Pittsburgh, USA
Hussam Amrouch
University of Stuttgart, Germany
Kuan-Hsun Chen
University of Twente, Netherlands
Mengying Zhao
Shandong University, China
Weichen Liu
Nanyang Technological University, Singapore
Guest Editors