Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Journal of Signal Processing Systems

Abstract

We present an architecture of decoupled processors whose memory hierarchy consists only of scratch-pad memories and a main memory. The architecture exploits the more efficient prefetching of decoupled processors, which make use of the parallelism between address computation and application data processing that exists mainly in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system’s performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, a processor with a limited address space: just the register file addresses. Every transfer of a block through the memory hierarchy, up to the Execute processor’s register file, is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with (a) uniprocessor architectures with scratch-pad memory hierarchies, (b) uniprocessor architectures with cache memory hierarchies, and (c) existing decoupled architectures, and it shows higher normalized performance. The gain comes from the efficient data transfers that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques for transferring data instead of fixed prefetching methods. Experimental results show that performance increases by up to almost 2 times compared to uniprocessor architectures with scratch-pad memories and by up to 3.7 times compared to those with caches. The proposed architecture achieves this performance without penalties in energy-delay product.
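The split into an address-computing program and a data-processing program can be illustrated with a small software model. The sketch below is not the authors' implementation: the function names, the strided sum-of-squares kernel, and the use of a FIFO queue to stand in for the transfer path into the Execute processor's register file are all assumptions made for illustration. The two programs also run one after the other here, whereas in the architecture described they run in parallel.

```python
from collections import deque


def access_program(memory, base, stride, count, queue):
    """Access side (hypothetical): computes addresses and forwards operands."""
    for i in range(count):
        addr = base + i * stride    # all address arithmetic lives here
        queue.append(memory[addr])  # stands in for a DMA/scratch-pad transfer


def execute_program(queue, count):
    """Execute side (hypothetical): consumes operands, never sees an address."""
    acc = 0
    for _ in range(count):
        acc += queue.popleft() ** 2  # pure data processing: sum of squares
    return acc


def run_decoupled(memory, base, stride, count):
    """Runs the two programs back to back over a shared FIFO."""
    queue = deque()
    access_program(memory, base, stride, count, queue)
    return execute_program(queue, count)


def run_uniprocessor(memory, base, stride, count):
    """Reference: one loop with addressing and computation interleaved."""
    return sum(memory[base + i * stride] ** 2 for i in range(count))
```

Both versions compute the same result; the point of the decoupled form is that `execute_program` contains no address computation, so in hardware the Access side can run ahead and hide memory latency.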

Acknowledgements

This work was supported by the project PENED 2003 No 03ΕD507, which is funded 75% by the European Union (European Social Fund) and 25% by the Greek State (Greek Secretariat for Research and Technology).

Author information

Corresponding author

Correspondence to Vasileios Porpodas.

About this article

Cite this article

Milidonis, A., Alachiotis, N., Porpodas, V. et al. Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy. J Sign Process Syst Sign Image Video Technol 59, 281–296 (2010). https://doi.org/10.1007/s11265-009-0393-9

  • DOI: https://doi.org/10.1007/s11265-009-0393-9
