Elsevier

Microprocessors and Microsystems

Volume 33, Issues 7–8, October–November 2009, Pages 483-497

Characterizing asynchronous variable latencies through probability distribution functions

https://doi.org/10.1016/j.micpro.2009.09.005

Abstract

Asynchronous systems are attracting the interest of the designer community because of several features that are useful for sub-micron technologies: tolerance to process variation, low power consumption, removal of the clock tree, etc. One of the main problems in simulating these systems is the variable computation delay of their modules, which compute as fast as possible under the actual conditions of the system. This behavior complicates the high-level simulation of such systems and is the main reason for the lack of simulation tools devoted to asynchronous microarchitectures. In this paper we present a modeling method for this kind of system that describes the variable computation delay of an asynchronous circuit by using probability distribution functions. The method is deployed in an architectural simulator of a 64-bit superscalar asynchronous microarchitecture in which the computation delay of each module was characterized through a probability distribution function. The experimental results show that the asynchronous behavior is successfully modeled and that the architectural simulation of standard benchmarks is affordable in terms of wall-clock simulation time.

Introduction

Each successive technology generation aggravates the well-known drawback of synchronous systems: the need to over-design the system so that it satisfies its design constraints (on performance, power consumption or device reliability) under every corner condition of process, voltage, temperature and on-chip variation, in order to meet yield rates. Traditional techniques for minimizing design exposure to process and environmental variations are quickly becoming difficult to implement and consume an increasingly large portion of the microprocessor design effort.

As a result, interest in asynchronous systems is growing in the circuit design community, e.g. [1], [2]. These circuits have a number of interesting inherent properties that solve some of the problems of synchronous designs:

  • High performance [3], [4]: the global performance of a fully asynchronous system corresponds to its average-case performance, because a new computation starts immediately after the previous one has finished [5].

  • Robustness to variations in supply voltage, temperature and fabrication process [6], [7]: the functionality is designed to be independent of the timing, which allows the circuit to compute as fast as possible under any temperature, process and voltage corner.

  • Modular design [8], [9]: the local timing and the communication protocol interfaces allow designers to create modular systems, even based on templates or asynchronous IP cores.

  • Absence of clock distribution problems: there is no global clock signal in the system.

Nevertheless, designing asynchronous systems has two main drawbacks.

First, the control logic that implements the handshaking between asynchronous circuits usually represents an overhead in terms of silicon area, delay and power consumption. However, as shown in [3], [10], this overhead may be hidden or compensated.

Second, there is a lack of CAD tools devoted to asynchronous circuits. Although many CAD tools and algorithms for the synthesis of asynchronous systems are currently available, e.g. [11], [12], [13], there are few tools for the architectural simulation of asynchronous systems.

One of the main obstacles to the architectural simulation of asynchronous circuits and, therefore, to the development of simulation tools, is the data dependency of the computation delays. The computation delay of an asynchronous circuit may be different for each incoming datum because the circuit computes as fast as possible, without any timing constraint. To the best of our knowledge, no methods have been reported in the literature for the data-dependent characterization of variable computation delays applied to the architectural simulation of asynchronous processors.

The simulation of this kind of system requires, first, a method to characterize asynchronous modules with data-dependent computation delays that is cost-effective in terms of both the memory requirements and the computing power of the simulation infrastructure; and second, a tool able to simulate a processor whose modules are asynchronous circuits.

Hence, the main contributions of this paper are:

  1. A modeling method based on probability distribution functions (PDFs) that allows the cost-effective architectural simulation of complex asynchronous systems, and

  2. A tool that allows the simulation of an asynchronous Alpha 21264-like processor. This tool deploys the PDF-based modeling method and allows most of the processor parameters to be configured, with the aim of studying processor performance under standard workloads, typically the SPEC2000 suite.

The rest of the paper is organized as follows. In Section 2 we review works focused on measuring the performance of asynchronous systems. In Section 3 we introduce the problem of characterizing and simulating the data-dependent behavior of asynchronous circuits. In Section 4 we detail the method followed to obtain the PDF from a sample of delays, and we present a statistical metric to assess the quality of the sample. In Section 5 we describe the implementation of the PDF characterization within an architectural simulator, and in Section 6 we verify that the architectural simulation of a superscalar asynchronous microarchitecture is both cost-effective and successful. Finally, in Section 7 we present the conclusions and future work.

Related work

PDFs, powerful statistical tools able to describe the probability of a given variable taking different values, may summarize the variable computation delay of an asynchronous module.

The use of PDFs is widespread in science: many natural phenomena can be modeled by using distribution functions. For instance, in relation to the energy state of a particle, three different distribution functions have been described [14]: Maxwell-Boltzmann, Bose–Einstein and Fermi–Dirac distributions; and many other

Simulating data-dependent delays

The timing of an asynchronous circuit is not homogeneous because its computation delay depends on the data being processed. Characterizing the computation delay becomes even more difficult for a complex asynchronous system formed by modules that compute independently, each with its own data-dependent computation delay, and that communicate their results to one another using a handshake protocol.
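The effect of data-dependent delays on a chain of handshaking modules can be illustrated with a minimal sketch. It is not the simulator described in this paper: the function names, the uniform delay range and the three-stage pipeline are hypothetical, and the model makes the simplifying assumption that a stage starts a token as soon as the token has left the previous stage and the stage itself is idle (no back-pressure is modeled). The synchronous reference clocks every stage at the worst observed delay.

```python
import random


def pipeline_finish_time(delays):
    """Completion time of a linear asynchronous pipeline.

    delays[k][i] is the (data-dependent) delay of stage i on token k.
    Assumption: a stage starts a token when the previous stage has
    produced it and the stage has finished its own previous token.
    """
    n_stages = len(delays[0])
    stage_free = [0.0] * n_stages  # time at which each stage becomes idle
    for token in delays:
        ready = 0.0  # the input is assumed to be always available
        for i, d in enumerate(token):
            start = max(ready, stage_free[i])
            ready = start + d
            stage_free[i] = ready
    return stage_free[-1]


def synchronous_time(delays):
    """Same pipeline clocked at the worst-case stage delay."""
    worst = max(max(token) for token in delays)
    n_tokens, n_stages = len(delays), len(delays[0])
    return (n_tokens + n_stages - 1) * worst


# Hypothetical workload: 1000 tokens, 3 stages, delays in arbitrary units.
rng = random.Random(0)
delays = [[rng.uniform(1.0, 4.0) for _ in range(3)] for _ in range(1000)]
async_t = pipeline_finish_time(delays)
sync_t = synchronous_time(delays)
print(f"asynchronous: {async_t:.1f}  synchronous: {sync_t:.1f}")
```

Under these assumptions the asynchronous completion time can never exceed the synchronous one, because every stage delay is bounded by the clock period chosen for the synchronous reference.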

One

Generation

In this paper we will consider some statistical definitions and will relate each one of them with the steps followed to obtain a PDF able to describe the computation delay of an asynchronous module.

Population, Ω: set of all possible computation delays for the given module. We consider the computation delay as a random variable. Two key parameters are frequently used to characterize Ω: the population mean (μ) and the population variance (σ²).
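As an illustration of these definitions, the following sketch estimates μ and σ² from a finite set of observed delays and summarizes the observations as an empirical, histogram-based distribution from which delays can later be drawn. It is a generic example and not necessarily the procedure followed in this paper: the function names, the bin width and the synthetic triangular delay data are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter


def build_empirical_pdf(sample, bin_width):
    """Map each bin's lower edge to the fraction of delays falling in it."""
    counts = Counter(int(d // bin_width) for d in sample)
    n = len(sample)
    return {idx * bin_width: c / n for idx, c in sorted(counts.items())}


def draw_delay(pdf, rng):
    """Draw one delay (a bin's lower edge) according to its probability."""
    bins = list(pdf)
    weights = list(pdf.values())
    return rng.choices(bins, weights=weights, k=1)[0]


# Synthetic observations standing in for measured delays (hypothetical units).
rng = random.Random(42)
sample = [rng.triangular(50.0, 200.0, 80.0) for _ in range(5000)]

mean_estimate = statistics.mean(sample)          # estimator of mu
variance_estimate = statistics.variance(sample)  # unbiased estimator of sigma^2

pdf = build_empirical_pdf(sample, bin_width=10)
delay = draw_delay(pdf, rng)
print(f"mean={mean_estimate:.1f} variance={variance_estimate:.1f} drawn={delay}")
```

Storing only the bin probabilities, rather than every observed delay, is what keeps the memory cost of such a characterization small; the bin width trades resolution against table size.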

Sample. A subset of Ω. Any sample of a variable not

PDFs within an architectural simulator

In this section we present the use of PDFs in a tool able to evaluate the performance of a complex asynchronous microarchitecture, using the PDFs introduced in Sections 1 and 3. Configurable and parameterized tools able to evaluate the performance of such complex asynchronous systems are very useful for the community of system designers. In our tool, and in contrast to the works reviewed in Section 2, the performance of the system is measured by

Experimental results

In this section we show experimental results that demonstrate the following points: (1) the characterization of asynchronous modules using PDFs can be implemented in an architectural simulator that runs cost-effective simulations; and (2) the simulation results show that this characterization reproduces the typical asynchronous behavior of the modeled microarchitecture.

Therefore, in order to validate the modeling of an asynchronous system using PDFs, we have run the SPEC2000 benchmarks on

Conclusions and future work

In this paper we describe a modeling method that allows the cost-effective architectural simulation of complex asynchronous systems. The method consists of characterizing the computation delay of all the modules of an asynchronous circuit as statistical variables. Each of these modules is characterized by a PDF that returns the probability that a given delay is spent on the computation of a datum. The method starts by obtaining a sample of delays from the asynchronous module. Then,

Acknowledgements

The authors would like to thank all the reviewers for their insightful advice; their comments have helped to improve the quality of the paper.

This work has been supported by Spanish Government Grant TIN2008-00508 and MEC Consolider Ingenio CSD00C-07-20811 of the Spanish Council of Science and Technology.

José Manuel Colmenar was born in Madrid in January 1978. He obtained an M.S. degree in Computer Engineering in 2001 and received a Ph.D. degree in 2008, both from the Universidad Complutense de Madrid (UCM). He is currently an Assistant Professor of Computer Science at the Aranjuez campus of the UCM. His current research interests include asynchronous systems and microprocessors, multi-core and SoC architectures, and evolutionary algorithms.

References (47)

  • B.M. Bara et al.

    Concentration fluctuation profiles from a water channel simulation of a ground-level release

    General Topics on Atmospheric Environment: Part A

    (1992)
  • F. Commoner et al.

    Marked directed graphs

    Journal of Computer and System Sciences

    (1971)
  • J.D. Garside, W.J. Bainbridge, A. Bardsley, D.M. Clark, D.A. Edwards, S.B. Furber, J. Liu, D.W. Lloyd, S. Mohammadi,...
  • A. Bink et al.

    Arm996HS: the first licensable, clockless 32-bit processor core

    IEEE Micro

    (2007)
  • R.O. Ozdag et al.

    An asynchronous low-power high-performance sequential decoder implemented with QDI templates

    IEEE Transactions on Very Large Scale Integration (VLSI) Systems

    (2006)
  • S. Chen et al.

    Self-timed dynamically pipelined adaptive signal processing system: a case study of DLMS equalizer for read channel

    IEEE Transactions on Circuits and Systems I: Regular Papers

    (2005)
  • D. Kearney, Theoretical limits on the data dependent performance on asynchronous circuits, in: Proceedings of...
  • L.S. Nielsen et al.

    Low-power operation using self-timed circuits and adaptive scaling of the supply voltage

    IEEE Transactions on Very Large Scale Integration (VLSI) Systems

    (1994)
  • A.J. Martin et al.

    The first asynchronous microprocessor: the test results

    Computer Architecture News

    (1989)
  • M. Ferretti et al.

    High performance asynchronous design using single-track full-buffer standard cells

    IEEE Journal of Solid-State Circuits

    (2006)
  • A.J. Martin et al.

    Asynchronous techniques for system-on-chip design

    Proceedings of the IEEE

    (2006)
  • W. Kuang, J.S. Yuan, Low power operation using self-timed circuits and ultra-low supply voltage, in: The 14th...
  • P. Endecott, S. Furber, Modelling and simulation of asynchronous systems using the LARD hardware description language,...
  • J. Cortadella et al.

    Desynchronization: synthesis of asynchronous circuits from synchronous specifications

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

    (2006)
  • J. Carmona et al.

    Synthesis of asynchronous controllers using integer linear programming

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

    (2006)
  • F.J. Blatt

    Modern Physics

    (1992)
  • E. Yee et al.

    Concentration fluctuation measurements in clouds released from a quasi-instantaneous point-source in the atmospheric surface-layer

    Boundary-Layer Meteorology

    (1994)
  • J. Fente, K. Knutson, C. Schexnayder, Defining a beta distribution function for construction simulation, in: Simulation...
  • D.J. Lary et al.

    Using probability distribution functions for satellite validation

    IEEE Transactions on Geoscience and Remote Sensing

    (2006)
  • R. Geist, J. Westall, Correlational and distributional effects in network traffic models, in: IEEE International...
  • K. Cantillo et al.

    Modeling of communication delays aiming at the design of networked supervisory and control systems. A first approach

    Lecture Notes in Computer Science

    (2005)
  • Feng Wu-chang et al.

    A traffic characterization of popular on-line games

    IEEE/ACM Transactions on Networking

    (2005)
  • A. Xie et al.

    Performance analysis of asynchronous circuits and systems using stochastic timed Petri nets

  • Cited by (2)

    • Statistical analysis of asynchronous pipelines in presence of process variation using formal models

      2016, Integration, the VLSI Journal
      Citation Excerpt :

      However, we obtain accurate statistical delay with variation considerations, analyzing more pipeline templates, and better estimation of power and delay. In [20] a high level simulator with variable delays based on a distribution function has been developed. This approach is desirable for verification, however compared to our work they have neither considered variation problem.

    • Simulating a LAGS processor to consider variable latency on L1 D-Cache

      2010, Summer Computer Simulation Conference, SCSC 2010 - Proceedings of the 2010 Summer Simulation Multiconference, SummerSim 2010

    Oscar Garnica has a graduate degree in Physics (B.S. in Physics and M.S. in Electrical Engineering) and a Ph.D. in Physics (Program in Computer Science). He has about 14 years of experience in the fields of circuit and systems design, and asynchronous and power-aware processors. He is currently an Associate Professor in the Department of Computer Architecture and System Engineering at Universidad Complutense de Madrid (UCM). He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). He has previously held several positions as Assistant Professor at the UCM and as ASIC Design Engineer at Lucent Technologies Bell Labs Innovations, Agere Systems Inc., and LSI Logic Inc., developing high-speed circuits for the telecommunication market. His current research interests include processor design, with special emphasis on the application of novel timing methodologies, memory hierarchy optimization and management, thermal-aware designs, and the application of bio-inspired optimization techniques to CAD problems.

    Juan Lanchares Dávila has a graduate degree in Physics (B.S. in Physics and M.S. in Automatic Calculus) and a Ph.D. in Physics (Program in Computer Science). He has about 18 years of research experience in the fields of Systems Design, Evolutionary Computation Techniques, and Asynchronous and Power-Aware Processors. He is currently an Associate Professor in Computer Architecture and Technology in the Department of Computer Architecture and System Engineering at Complutense University of Madrid (Madrid, Spain). He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). In recent years he has published papers and works on Evolutionary Computation, Parallel Genetic Algorithms, Multi-FPGA systems design, asynchronous systems and power reduction techniques.

    José Ignacio Hidalgo has a graduate degree in Physics (B.S. in Physics and M.S. in Electrical Engineering) and a Ph.D. in Physics (Program in Computer Science). He has about 14 years of research experience in the fields of Evolutionary Computation Techniques for Systems Design and Optimization, and Asynchronous and Power-Aware Processors. He is currently an Associate Professor in Computer Architecture and Technology in the Department of Computer Architecture and System Engineering at Complutense University of Madrid (Madrid, Spain), where he served as academic secretary for three years. He belongs to the Group of Architecture and Technology of Computing Systems (ArTeCS). In recent years he has published papers and works on Evolutionary Computation, Parallel Genetic Algorithms, Multi-FPGA systems design, asynchronous systems and power reduction techniques. He has also reviewed articles for several national and international journals and conferences. He has been Director of the CES Felipe II Computer Science Undergraduate School since 2006. He has also served as Guest Editor of a Special Issue on Parallel Architectures and Bioinspired Algorithms of the International Journal of High Performance Systems Architecture, and he is Guest Editor of a Special Issue of the Parallel Computing (ParCo) journal.
