Address Generation Optimization for Embedded High-Performance Processors: A Survey

Talavera, Guillermo; Jayapala, Murali; Carrabina, Jordi; Catthoor, Francky

doi:10.1007/s11265-008-0165-y

Published: 31 May 2008

Journal of Signal Processing Systems, Volume 53, pages 271–284 (2008)

Abstract

Embedded systems are growing at an impressive rate and run increasingly sophisticated applications characterized by complex array index manipulation and a large number of data accesses. These applications require high-performance, application-specific computation that general-purpose processors cannot deliver at reasonable energy consumption. Very long instruction word (VLIW) architectures appear to be a good solution: they provide sufficient computational performance at low power, together with the programmability needed to shorten time to market. These architectures rely on the compiler to exploit the available instruction and data parallelism and keep the data path busy at all times. With transistor density doubling every 18 months, increasingly sophisticated architectures with many computational resources running in parallel are emerging. As parallel computation increases, data access is becoming the main bottleneck limiting the available parallelism. To alleviate this problem, current embedded architectures include a special unit that works in parallel with the main computing elements to ensure efficient feeding and storage of data: the address generator unit, which comes in many flavors. Future architectures will have to deal with enormous memory bandwidth in distributed memories, and the development of address generator units will be crucial for an effective next generation of embedded processors, in which global trade-offs among reaction time, bandwidth, energy, and area must be achieved. This paper surveys methods and techniques that optimize the address generation process for embedded systems, explaining current research trends and future needs.

Author information

Authors and Affiliations

Universitat Autonoma de Barcelona, Bellaterra, Spain
Guillermo Talavera & Jordi Carrabina
Inter-university Micro-Electronics Center (IMEC), Heverlee, Belgium
Murali Jayapala & Francky Catthoor

Corresponding author

Correspondence to Guillermo Talavera.

About this article

Cite this article

Talavera, G., Jayapala, M., Carrabina, J. et al. Address Generation Optimization for Embedded High-Performance Processors: A Survey. J Sign Process Syst Sign Image Video Technol 53, 271–284 (2008). https://doi.org/10.1007/s11265-008-0165-y

Received: 08 February 2007
Revised: 08 February 2007
Accepted: 07 February 2008
Published: 31 May 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s11265-008-0165-y
