A Low-overhead Scheduling Methodology for Fine-grained Acceleration of Signal Processing Systems

Boutellier, Jani; Bhattacharyya, Shuvra S.; Silvén, Olli

doi:10.1007/s11265-009-0366-z

Published: 24 April 2009

Journal of Signal Processing Systems, Volume 60, pages 333–343 (2010)

Jani Boutellier¹,
Shuvra S. Bhattacharyya² &
Olli Silvén³

Abstract

Fine-grained accelerators have the potential to deliver significant benefits in various platforms for embedded signal processing. Due to the moderate complexity of their targeted operations, these accelerators must be managed with minimal run-time overhead. In this paper, we present a methodology for applying flow-shop scheduling techniques to make effective, low-overhead use of fine-grained DSP accelerators. We formulate the underlying scheduling approach in terms of general flow-shop scheduling concepts, and demonstrate our methodology concretely by applying it to MPEG-4 video decoding. We present quantitative experiments on a soft processor that runs on a field-programmable gate array, and provide insight on trends and trade-offs among different flow-shop scheduling approaches when applied to run-time management of fine-grained acceleration.
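
As a minimal illustration of the flow-shop model the abstract refers to, the sketch below computes the makespan of a permutation flow-shop schedule and, for a tiny instance, searches all job orders exhaustively. It is not the authors' implementation: the names `makespan`, `best_order`, and `EXAMPLE_TIMES`, as well as the processing times, are assumptions made for this example, and exhaustive search is used only because the instance is small. A run-time scheduler of the kind the paper targets would need a much cheaper ordering rule.

```python
from itertools import permutations

# Hypothetical processing times: EXAMPLE_TIMES[j][m] is the time job j
# needs on stage m (for instance, two accelerator stages in a pipeline).
EXAMPLE_TIMES = [
    [3, 2],  # job 0
    [1, 4],  # job 1
    [2, 1],  # job 2
]


def makespan(order, times):
    """Return the completion time of the last job on the last stage.

    Every job visits the stages in the same sequence, and every stage
    processes the jobs in the same order (permutation flow-shop).
    """
    n_stages = len(times[0])
    finish = [0] * n_stages  # finish[m]: time at which stage m becomes free
    for j in order:
        for m in range(n_stages):
            # Job j can start on stage m once it has left stage m-1
            # and stage m has finished the previous job.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + times[j][m]
    return finish[-1]


def best_order(times):
    """Exhaustively pick the job order with the smallest makespan.

    Only practical for a handful of jobs, since it tries n! orders.
    """
    jobs = range(len(times))
    return min(permutations(jobs), key=lambda order: makespan(order, times))


if __name__ == "__main__":
    order = best_order(EXAMPLE_TIMES)
    print(order, makespan(order, EXAMPLE_TIMES))
```

With these assumed times, processing the jobs in index order takes 10 time units, while starting with job 1 (short first stage, long second stage) reduces the makespan to 8, which shows why the job order matters even in a two-stage pipeline.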

Acknowledgments

This work has been partially funded by the Nokia Foundation, the US National Science Foundation (Grant number 0325119), Finnish Graduate School for Electronics, Telecommunication and Automation, and Tekes projects ECUUS and NECST.

Author information

Authors and Affiliations

Machine Vision Group, University of Oulu, P.O. Box 4500, Oulu, 90014, Finland
Jani Boutellier
Electrical and Computer Engineering Department, University of Maryland, College Park, MD, USA
Shuvra S. Bhattacharyya
Machine Vision Group, University of Oulu, P.O. Box 4500, Oulu, 90014, Finland
Olli Silvén

Corresponding author

Correspondence to Jani Boutellier.

About this article

Cite this article

Boutellier, J., Bhattacharyya, S.S. & Silvén, O. A Low-overhead Scheduling Methodology for Fine-grained Acceleration of Signal Processing Systems. J Sign Process Syst 60, 333–343 (2010). https://doi.org/10.1007/s11265-009-0366-z

Received: 31 October 2008
Revised: 03 March 2009
Accepted: 30 March 2009
Published: 24 April 2009
Issue Date: September 2010
DOI: https://doi.org/10.1007/s11265-009-0366-z
