Characterizing asynchronous variable latencies through probability distribution functions
Introduction
Each successive technology generation aggravates a well-known inconvenience of synchronous systems: the need to over-design the system so that it satisfies its design constraints (performance, power consumption, or device reliability) under every corner of process, voltage, temperature, and on-chip variation in order to meet yield targets. Traditional techniques for minimizing design exposure to process and environmental variations are quickly becoming difficult to implement and consume an increasingly large portion of the microprocessor design.
As a result, interest in asynchronous systems is growing among circuit designers, e.g. [1], [2]. These circuits have a number of inherent properties that solve some of the problems of synchronous designs:
- High performance [3], [4]: the global performance of a fully asynchronous system corresponds to average-case performance. In asynchronous systems a new computation starts immediately after the previous one has finished [5].
- Robustness to variations in supply voltage, temperature and fabrication process [6], [7]: the functionality is designed to be independent of the timing, which allows the circuit to compute as fast as possible under any temperature, process and voltage corner.
- Modular design [8], [9]: the local timing and the communication protocol interfaces allow designers to create modular systems, even ones based on templates or asynchronous IP cores.
- Absence of clock distribution problems: there is no global clock signal in the system.
Nevertheless, there are two main drawbacks when designing asynchronous systems.
First, the control logic that implements the handshaking between asynchronous circuits usually represents an overhead in terms of silicon area, delay and power consumption. However, as shown in [3], [10], this overhead can be hidden or compensated for.
Second, there is a lack of CAD tools devoted to asynchronous circuits. Although many CAD tools and algorithms for the synthesis of asynchronous systems are currently available, e.g. [11], [12], [13], few tools address the architectural simulation of asynchronous systems.
One of the main obstacles to the architectural simulation of asynchronous circuits, and therefore to the development of simulation tools, is the data dependency of the computation delays. Because an asynchronous circuit computes as fast as possible, without any timing constraint, its computation delay may differ for each incoming datum. To the best of our knowledge, no method has been reported in the literature for the data-dependent characterization of variable computation delays applied to the architectural simulation of asynchronous processors.
Simulating this kind of system requires, first, a method that characterizes asynchronous modules with data-dependent computation delays in a cost-effective way, in terms of both the memory requirements and the computing power of the simulation infrastructure; and, second, a tool able to simulate a processor whose modules are asynchronous circuits.
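To make the notion of data-dependent delay concrete, the following minimal sketch uses a ripple-carry adder as an illustrative module; this example is an assumption of ours and is not taken from the paper. In a self-timed ripple-carry adder, completion time grows with the longest carry-propagation chain, which depends on the operands. The function name `longest_carry_chain` and the default width are hypothetical.

```python
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Return the longest carry-propagation chain when adding a and b.

    Illustrative model only: a chain starts at a bit that generates a carry
    and grows by one for every following bit that propagates it.
    """
    carry = 0      # carry entering the current bit position
    run = 0        # length of the carry chain currently in flight
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                  # generate: a new chain starts here
            carry, run = 1, 1
        elif (x ^ y) and carry:    # propagate: the incoming carry travels on
            run += 1
        else:                      # kill, or nothing to propagate
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


# Same module, same operation, very different amounts of work:
short = longest_carry_chain(0b0101, 0b1010)   # no carries at all
long_ = longest_carry_chain(0xFFFFFFFF, 1)    # carry ripples through every bit
```

Under this model the two calls at the bottom exercise the best and worst case of the same adder, which is the kind of spread a fixed worst-case latency cannot capture.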
Hence, the main contributions of this paper are:
1. A modeling method based on probability distribution functions (PDFs) that allows the cost-effective architectural simulation of complex asynchronous systems, and
2. A tool that allows the simulation of an asynchronous Alpha 21264-like processor. This tool deploys the PDF-based modeling method and allows most of the processor parameters to be configured, with the aim of studying processor performance under standard workloads, typically the SPEC2000 suite.
The rest of the paper is organized as follows. In Section 2 we review works focused on measuring the performance of asynchronous systems. In Section 3 we introduce the problem of characterizing and simulating the data-dependent behavior of asynchronous circuits. In Section 4 we detail the method followed to obtain the PDF from a sample of delays, and we show a statistical metric to prove the quality of the sample. In Section 5 we describe the implementation of the PDF characterization within an architectural simulator, and in Section 6 we verify the cost-effective and successful architectural simulation of a superscalar asynchronous microarchitecture. Finally, in Section 7 we present the conclusions and the future work.
Section snippets
Related work
PDFs, powerful statistical tools able to describe the probability of a given variable taking different values, may summarize the variable computation delay of an asynchronous module.
The use of PDFs is widespread in science: many natural phenomena can be modeled by using distribution functions. For instance, in relation to the energy state of a particle, three different distribution functions have been described [14]: Maxwell-Boltzmann, Bose–Einstein and Fermi–Dirac distributions; and many other
Simulating data-dependent delays
The timing of an asynchronous circuit is not uniform because its computation delay depends on the data being processed. Characterizing the computation delay of a whole system is harder still when that system is formed by modules that compute independently, each with its own data-dependent computation delay, and that communicate their results through a handshake protocol.
One
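As a rough illustration of how independent per-module delays combine at system level, the sketch below simulates a linear pipeline of asynchronous stages. It is a simplified model under stated assumptions, not the paper's simulator: every input token is available at time 0, buffering between stages is unbounded (so there is no back-pressure, unlike a real handshake), and the name `simulate_pipeline` and the callable-based delay interface are hypothetical.

```python
import random


def simulate_pipeline(stage_delays, n_tokens):
    """Return the time at which the last token leaves the last stage.

    stage_delays: one zero-argument callable per stage; each call returns
    the computation delay of that stage for the next token.
    """
    done = [0.0] * len(stage_delays)   # when each stage finished its latest token
    for _ in range(n_tokens):
        ready = 0.0                    # when this token reaches the current stage
        for i, delay in enumerate(stage_delays):
            # A stage starts once the token has arrived and the stage is free.
            start = max(ready, done[i])
            done[i] = start + delay()
            ready = done[i]
    return done[-1]


# Example with made-up delay ranges for a two-stage pipeline.
rng = random.Random(0)
total = simulate_pipeline(
    [lambda: rng.uniform(1.0, 3.0), lambda: rng.uniform(2.0, 4.0)],
    n_tokens=100,
)
```

Even in this reduced model the completion time depends on the whole sequence of per-stage delays, not only on each stage's average, which is why the system-level delay is harder to characterize than that of a single module.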
Generation
In this paper we consider some statistical definitions and relate each of them to the steps followed to obtain a PDF able to describe the computation delay of an asynchronous module.
Population: the set of all possible computation delays for the given module. We consider the computation delay as a random variable. Two key parameters are frequently used to characterize the population: the population mean ($\mu$) and the population variance ($\sigma^2$).
Sample: a subset of the population. Any sample of a variable not
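The definitions above can be illustrated with a small sketch that estimates the mean and variance from a sample of delays and builds an empirical distribution from it. The delay values are invented for illustration, the function name `empirical_pdf` is hypothetical, and the relative-frequency table used here is only one simple way to obtain a distribution; the paper's own procedure may differ.

```python
from collections import Counter
from statistics import mean, variance


def empirical_pdf(sample):
    """Map each observed delay to its relative frequency in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {delay: count / n for delay, count in sorted(counts.items())}


# Hypothetical sample of measured computation delays, in picoseconds.
sample_ps = [120, 120, 150, 150, 150, 180, 240, 120, 150, 180]

pdf = empirical_pdf(sample_ps)     # discrete distribution over observed delays
sample_mean = mean(sample_ps)      # estimator of the population mean
sample_var = variance(sample_ps)   # unbiased estimator (divides by n - 1)
```

Strictly speaking, the table built here is a discrete probability mass function; it plays the role of the PDF in the sense used in this paper, where a delay is mapped to its probability of occurrence.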
PDFs within an architectural simulator
In this section we present the use of PDFs in a tool able to evaluate the performance of a complex asynchronous microarchitecture, using the PDFs introduced in Sections 1 and 3. Configurable, parameterized tools able to evaluate the performance of such complex asynchronous systems are very useful for the community of system designers. In our tool, in contrast to the works reviewed in Section 2, the performance of the system is measured by
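One plausible way for a simulator to consume such a PDF is to draw a delay from it each time a module computes. The sketch below is a hypothetical illustration, not the interface of the tool described in this paper: the class name `DelayModel`, its method `next_delay`, and the example distribution are all assumptions.

```python
import random


class DelayModel:
    """Draws per-operation delays for one module from a discrete PDF."""

    def __init__(self, pdf, seed=None):
        # pdf maps a delay value to its probability; probabilities sum to 1.
        self.delays = list(pdf.keys())
        self.weights = list(pdf.values())
        self.rng = random.Random(seed)   # private generator, reproducible runs

    def next_delay(self):
        """Return one delay, chosen with probability given by the PDF."""
        return self.rng.choices(self.delays, weights=self.weights, k=1)[0]


# Hypothetical PDF for an asynchronous functional unit (delays in picoseconds).
alu = DelayModel({120: 0.5, 150: 0.3, 240: 0.2}, seed=1)
latencies = [alu.next_delay() for _ in range(10_000)]
```

A model of this kind stores only one (delay, probability) pair per distinct delay value, so its memory footprint does not grow with the number of simulated operations.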
Experimental results
In this section we show experimental results that support two claims: (1) the characterization of asynchronous modules using PDFs can be implemented in an architectural simulator that runs cost-effective simulations; and (2) the results of the simulations show that this characterization leads to the typical asynchronous behavior for the modeled microarchitecture.
Therefore, in order to validate the correct model of an asynchronous system using PDFs, we have run the SPEC2000 benchmarks on
Conclusions and future work
In this paper we describe a modeling method that allows the cost-effective architectural simulation of complex asynchronous systems. The method consists of characterizing the computation delay of every module of an asynchronous circuit as a statistical variable. Each module is characterized by a PDF that returns the probability that a given delay is spent on the computation of a datum. The method starts by obtaining a sample of delays from the asynchronous module. Then,
Acknowledgements
The authors would like to thank all the reviewers for their insightful advice; their comments have helped to improve the quality of the paper.
This work has been supported by Spanish Government Grant TIN2008-00508 and MEC Consolider Ingenio CSD00C-07-20811 of the Spanish Council of Science and Technology.
José Manuel Colmenar was born in Madrid in January, 1978. He obtained a M.S. degree in Computer Engineering in 2001, and received a Ph.D. degree in 2008, both from the Universidad Complutense de Madrid (UCM). He is currently an Assistant Professor of Computer Science at the Aranjuez campus of the UCM. His current research interests include asynchronous systems and microprocessors, multi-core and SoC architectures and evolutive algorithms.
References (47)
- et al., Concentration fluctuation profiles from a water channel simulation of a ground-level release, General Topics on Atmospheric Environment: Part A (1992)
- et al., Marked directed graphs, Journal of Computer and Systems Science (1971)
- J.D. Garside, W.J. Bainbridge, A. Bardsley, D.M. Clark, D.A. Edwards, S.B. Furber, J. Liu, D.W. Lloyd, S. Mohammadi,...
- et al., Arm996HS: the first licensable, clockless 32-bit processor core, IEEE Micro (2007)
- et al., An asynchronous low-power high-performance sequential decoder implemented with QDI templates, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2006)
- et al., Self-timed dynamically pipelined adaptive signal processing system: a case study of DLMS equalizer for read channel, IEEE Transactions on Circuits and Systems I: Regular Papers (2005)
- D. Kearney, Theoretical limits on the data dependent performance on asynchronous circuits, in: Proceedings of...
- et al., Low-power operation using self-timed circuits and adaptive scaling of the supply voltage, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (1994)
- et al., The first asynchronous microprocessor: the test results, Computer Architecture News (1989)
- et al., High performance asynchronous design using single-track full-buffer standard cells, IEEE Journal of Solid-State Circuits (2006)
- Asynchronous techniques for system-on-chip design, Proceedings of the IEEE
- Desynchronization: synthesis of asynchronous circuits from synchronous specifications, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Synthesis of asynchronous controllers using integer linear programming, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Modern Physics
- Concentration fluctuation measurements in clouds released from a quasi-instantaneous point-source in the atmospheric surface-layer, Boundary-Layer Meteorology
- Using probability distribution functions for satellite validation, IEEE Transactions on Geoscience and Remote Sensing
- Modeling of communication delays aiming at the design of networked supervisory and control systems. A first approach, Lecture Notes in Computer Science
- A traffic characterization of popular on-line games, IEEE/ACM Transactions on Networking
- Performance analysis of asynchronous circuits and systems using stochastic timed Petri nets
Oscar Garnica has a graduate degree in Physics (B.S. in Physics and M.S. in Electrical Engineering), and a Ph.D. in Physics (Program in Computer Science). He has about 14 years of experience in the fields of circuit and systems design, and asynchronous and power-aware processors. Currently, he is an Associate Professor in the Department of Computer Architecture and System Engineering at Universidad Complutense de Madrid (UCM). He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). Previously he has held several positions as Assistant Professor in the UCM, and ASIC Design Engineer in Lucent Technologies Bell Labs Innovations, Agere Systems Inc., and LSI Logic Inc. developing high-speed circuits for the telecommunication market. Currently his research interests include processor design with special emphasis on the application of novel timing methodologies, memory hierarchy optimization and management, thermal-aware designs, and the application of Bio-inspired optimization techniques in CAD problems.
Juan Lanchares Dávila has a graduate degree in Physics (B.S. in Physics and M.S. in Automatic Calculus) and a Ph.D. in Physics (Program in Computer Science). He has about 18 years of research experience in the fields of Systems Design, Evolutionary Computation Techniques, and Asynchronous and Power Aware Processors. Currently, he is an Associate Professor in Computer Architecture and Technology in the Department of Computer Architecture and System Engineering at Complutense University of Madrid (Madrid, Spain). He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). In recent years he has published papers and works on the subjects of Evolutionary Computation, Parallel Genetic Algorithms, Multi-FPGA systems design, asynchronous systems and power reduction techniques.
José Ignacio Hidalgo has a graduate degree in Physics (B.S. in Physics and M.S. in Electrical Engineering) and a Ph.D. in Physics (Program in Computer Science). He has about 14 years of research experience in the fields of Evolutionary Computation Techniques for Systems Design and Optimization, and Asynchronous and Power Aware Processors. Currently, he is an Associate Professor in Computer Architecture and Technology in the Department of Computer Architecture and System Engineering at Complutense University of Madrid (Madrid, Spain), where he served as academic secretary for three years. He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). In recent years he has published papers and works on the subjects of Evolutionary Computation, Parallel Genetic Algorithms, Multi-FPGA systems design, asynchronous systems and power reduction techniques. He has also reviewed articles for several national and international journals and conferences. He has been Director of the CES Felipe II Computer Science Undergraduate School since 2006. He has also served as Guest Editor of a Special Issue on Parallel Architectures and Bioinspired Algorithms of the International Journal of High Performance Systems Architecture. He is also Guest Editor of a Special Issue of the Parallel Computing (ParCo) Journal.