ABSTRACT
With the increasing complexity of modern Systems on Chip (SoCs) and the demand for a shorter time-to-market, automation has become essential in hardware design. This is particularly relevant for complex, time-consuming tasks such as optimizing the design cost of a hardware component. Design cost may depend on several objectives, as in the hardware-software trade-off. Given the complexity of this task, the designer often has no means to perform a fast and effective optimization, in particular for larger and more complex designs. In this paper, we introduce Deep Reinforcement Learning (DRL) for design cost optimization at the early stages of the design process. We first show that DRL is well suited to the problem at hand. Then, using a Pointer Network, a neural network specifically applied to combinatorial problems, we benchmark three DRL algorithms on the selected problem. Results obtained in different settings show the improvements achieved by the DRL algorithms over conventional optimization methods. Additionally, by using the reward redistribution proposed in the recently introduced RUDDER method, we obtain significant improvements on complex designs. Here, the optimization achieved is on average 15.18% in area, 8.25% in application size, and 8.12% in execution time on a dataset of industrial hardware/software interface designs.
References
- E. J. Anderson et al. 1994. Genetic algorithms for combinatorial optimization: the assemble line balancing problem. ORSA Journal on Computing (1994).
- J. A. Arjona-Medina et al. 2019. RUDDER: Return decomposition for delayed rewards. In NeurIPS.
- I. Bello et al. 2016. Neural Combinatorial Optimization with Reinforcement Learning. (2016).
- A. Colorni et al. 1996. Heuristics from nature for hard combinatorial optimization problems. International Transactions in Operational Research (1996).
- W. Ecker et al. 2017. Metamodeling and code generation in the hardware/software interface domain. In Handbook of Hard./Soft. Codesign.
- W. Ecker et al. 2009. Hardware-dependent Software: Principles and Practice. Springer Publishing Company, Incorporated.
- I. Goodfellow et al. 2016. Deep Learning. MIT Press.
- A. Graves et al. 2005. Bidirectional LSTM networks for improved phoneme classification and recognition. In ICANN. Springer.
- H. Hu et al. 2017. Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method. (2017).
- B. Korte et al. 2012. Combinatorial optimization. Springer.
- A. Laterre et al. 2018. Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization. (2018).
- A. Mirhoseini et al. 2020. Chip Placement with Deep Reinforcement Learning. arXiv preprint arXiv:2004.10746 (2020).
- J. Schulman et al. 2017. Proximal Policy Optimization Algorithms.
- L. Servadei et al. 2019. Accurate Cost Estimation of Memory Systems Inspired by Machine Learning for Computer Vision. In Design, Automation Test in Europe Conf. Exh. (DATE).
- F. Streit et al. 2018. Model-based design automation of hardware/software codesigns for Xilinx Zynq PSoCs. In 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
- R. Sutton et al. 2018. Reinforcement Learning: An Introduction. A Bradford Book.
- O. Vinyals et al. 2015. Pointer networks. In NIPS.
Index Terms
Cost Optimization at Early Stages of Design Using Deep Reinforcement Learning