DOI: 10.1145/3194554.3194642
research-article

On the Design of Reliable Heterogeneous Systems via Checkpoint Placement and Core Assignment

Published: 30 May 2018

Abstract

This paper studies two basic problems in the design of high-performance, high-reliability heterogeneous systems: (1) on which type of core each task should execute, and (2) where checkpoints should be placed in the execution of tasks. Implementing checkpointing on heterogeneous systems built around novel persistent memory (e.g., 3D XPoint memory) raises several new problems. First, the assignment of tasks to cores can greatly influence the execution time of the whole application; under the same time constraint, the reliability of the resulting system can therefore be significantly affected. Second, creating checkpoints incurs heavy writes to persistent memory, which shortens device lifetime. In this paper, we construct reliable systems optimally by assigning tasks to the most suitable cores and placing the minimum number of checkpoints in the application, such that the resulting system satisfies the time constraint in the presence of faults. We devise an efficient dynamic programming algorithm that obtains the optimal assignment and checkpoint placement. Experimental results show that, compared with existing approaches, our technique reduces the number of checkpoints by 44% on average.
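The abstract does not spell out the system model or the recurrence behind its dynamic program, so the Python sketch below illustrates only the checkpoint-placement half of the problem under a deliberately simplified, hypothetical model. It assumes the tasks run as a chain whose execution times are already fixed by some core assignment, that at most `faults` faults occur, that each fault rolls execution back to the last checkpoint, and that the worst case is every fault striking at the end of the longest segment. Under these assumptions the worst-case completion time is total execution + m × checkpoint cost + faults × (longest segment + recovery cost), so for each candidate number of checkpoints m a classic linear-partition DP minimizes the longest segment, and the smallest m that meets the deadline is returned. The function names, parameters, and cost model are illustrative assumptions, not the authors' formulation.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def min_longest_segment(times: List[float], segments: int) -> Tuple[float, List[int]]:
    """Split a chain of tasks into `segments` non-empty contiguous pieces so the
    longest piece is as short as possible (linear-partition DP).

    Returns (longest piece, checkpoint positions), where position p means a
    checkpoint is taken after the first p tasks have completed.
    """
    n = len(times)
    if not 1 <= segments <= n:
        raise ValueError("need 1 <= segments <= number of tasks")
    prefix = [0.0] + list(accumulate(times))  # prefix[j] = sum of times[:j]
    inf = float("inf")
    # best[s][j]: smallest possible longest segment when tasks[:j] form s segments
    best = [[inf] * (n + 1) for _ in range(segments + 1)]
    cut = [[0] * (n + 1) for _ in range(segments + 1)]
    best[0][0] = 0.0
    for s in range(1, segments + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):  # last segment covers tasks[i:j]
                cand = max(best[s - 1][i], prefix[j] - prefix[i])
                if cand < best[s][j]:
                    best[s][j], cut[s][j] = cand, i
    # Walk the cut table backwards to recover where each later segment starts.
    positions, j = [], n
    for s in range(segments, 1, -1):
        j = cut[s][j]
        positions.append(j)
    return best[segments][n], sorted(positions)


def place_checkpoints(
    times: List[float],
    deadline: float,
    faults: int,
    ckpt_cost: float,
    recovery_cost: float,
) -> Optional[Tuple[int, List[int], float]]:
    """Return (number of checkpoints, positions, worst-case time) for the fewest
    checkpoints whose worst-case completion time meets `deadline`, or None.

    Hypothetical worst case: every fault strikes at the end of the longest
    segment, so that segment (plus a recovery overhead) is re-executed once
    per fault; the checkpoint overhead is paid once per checkpoint.
    """
    total = sum(times)
    for m in range(len(times)):  # m checkpoints -> m + 1 segments
        longest, positions = min_longest_segment(times, m + 1)
        worst = total + m * ckpt_cost + faults * (longest + recovery_cost)
        if worst <= deadline:
            return m, positions, worst
    return None


if __name__ == "__main__":
    print(place_checkpoints([4, 2, 3, 5, 1], deadline=24, faults=1,
                            ckpt_cost=1, recovery_cost=0.5))
```

For example, `place_checkpoints([4, 2, 3, 5, 1], deadline=24, faults=1, ckpt_cost=1, recovery_cost=0.5)` returns `(2, [1, 3], 23.5)`: checkpoints after the first and third tasks split the chain into segments of 4, 5 and 6 time units. Because total execution time and checkpoint overhead are fixed once m is chosen, minimizing the longest segment is exactly what minimizes the worst case in this toy model; a joint core-assignment search, as the paper describes, would add a per-task core choice on top of this and is not modeled here.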

Cited By

  • (2020) When Single Event Upset Meets Deep Neural Networks: Observations, Explorations, and Remedies. Proceedings of the 25th Asia and South Pacific Design Automation Conference, pp. 163-168. DOI: 10.1109/ASP-DAC47756.2020.9045134. Online publication date: 17-Jan-2020.
  • (2019) Fault-Tolerant Regularity-Based Real-Time Virtual Resources. 2019 IEEE 25th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 1-12. DOI: 10.1109/RTCSA.2019.8864575. Online publication date: Aug-2019.

Information

Published In

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI
May 2018
533 pages
ISBN:9781450357241
DOI:10.1145/3194554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fault tolerance
  2. heterogeneous MPSOC
  3. real-time system

Funding Sources

  • National 863 Program
  • National Natural Science Foundation of China
  • China Scholarship Council

Conference

GLSVLSI '18
GLSVLSI '18: Great Lakes Symposium on VLSI 2018
May 23-25, 2018
Chicago, IL, USA

Acceptance Rates

GLSVLSI '18 Paper Acceptance Rate 48 of 197 submissions, 24%;
Overall Acceptance Rate 312 of 1,156 submissions, 27%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 Feb 2025.
