Abstract
Implementing a 3D image reconstruction algorithm on a DSP architecture incurs a large memory transfer overhead, which reduces the speedup attainable on recent multimedia-oriented architectures. This paper describes how the critical part of the algorithm is re-specified and aggressively transformed at the algorithm code level to improve the data access locality of the multi-dimensional image signal while preserving the input/output behaviour. Experiments show that close-to-optimal reuse of the data in the foreground memory and registers is obtained, removing the data transfer and storage bottleneck and enabling real-time prototyping of the algorithm on a DSP architecture.
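The abstract does not give the transformations themselves, so the following is only a minimal sketch of the general idea of data reuse in registers, not the authors' method. It assumes a hypothetical 3-tap sum filter over a 1D signal and a hypothetical `CountingMemory` class that stands in for background memory and counts reads. Both versions produce the same output, which mirrors the requirement that the input/output behaviour is preserved.

```python
class CountingMemory:
    """Hypothetical stand-in for background memory that counts every read."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, index):
        self.reads += 1
        return self.data[index]

    def __len__(self):
        return len(self.data)


def filter_naive(mem):
    """3-tap sum; every output re-reads three samples from background memory."""
    return [mem[i - 1] + mem[i] + mem[i + 1] for i in range(1, len(mem) - 1)]


def filter_with_reuse(mem):
    """Same 3-tap sum; a sliding window held in local variables (standing in
    for registers) keeps the two samples shared with the previous output."""
    n = len(mem)
    if n < 3:
        return []
    left, centre = mem[0], mem[1]
    out = []
    for i in range(1, n - 1):
        right = mem[i + 1]            # the only background read per output
        out.append(left + centre + right)
        left, centre = centre, right  # shift the window by one sample
    return out


samples = [3, 1, 4, 1, 5, 9, 2, 6]
naive_mem = CountingMemory(samples)
reuse_mem = CountingMemory(samples)
naive_out = filter_naive(naive_mem)
reuse_out = filter_with_reuse(reuse_mem)
```

For these eight samples, the naive version performs 18 background reads and the reuse version performs 8, one per input sample, while both return the same six output values.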
Cite this article
Achteren, T.V., Adé, M., Lauwereins, R. et al. Transformations of a 3D Image Reconstruction Algorithm for Data Transfer and Storage Optimisation. Design Automation for Embedded Systems 5, 313–327 (2000). https://doi.org/10.1023/A:1008958303888
DOI: https://doi.org/10.1023/A:1008958303888