A Low-overhead Scheduling Methodology for Fine-grained Acceleration of Signal Processing Systems

Journal of Signal Processing Systems

Abstract

Fine-grained accelerators have the potential to deliver significant benefits in various platforms for embedded signal processing. Because their targeted operations are of only moderate complexity, these accelerators must be managed with minimal run-time overhead. In this paper, we present a methodology for applying flow-shop scheduling techniques to make effective, low-overhead use of fine-grained DSP accelerators. We formulate the underlying scheduling approach in terms of general flow-shop scheduling concepts, and demonstrate our methodology concretely by applying it to MPEG-4 video decoding. We present quantitative experiments on a soft processor that runs on a field-programmable gate array, and provide insight into trends and trade-offs among different flow-shop scheduling approaches when applied to run-time management of fine-grained acceleration.
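
For readers unfamiliar with the flow-shop concept the abstract refers to, the following is a minimal sketch of the textbook permutation flow-shop model, not the scheduling method developed in the paper. Every job passes through the same sequence of stages (which could stand for accelerators), and the makespan is the time at which the last job leaves the last stage. The function names, the brute-force search, and the processing times are assumptions chosen purely for illustration.

```python
from itertools import permutations


def makespan(order, proc_times):
    """Makespan of a permutation flow shop for a given job order.

    proc_times[j][s] is the (hypothetical) processing time of job j on stage s.
    Each stage handles one job at a time, and jobs visit stages in index order.
    """
    n_stages = len(proc_times[0])
    finish = [0] * n_stages  # finish[s]: time at which stage s becomes free
    for job in order:
        for s in range(n_stages):
            # The job can start on stage s once it has left stage s-1
            # and once stage s has finished the previous job.
            ready = finish[s - 1] if s > 0 else 0
            finish[s] = max(finish[s], ready) + proc_times[job][s]
    return finish[-1]


def best_order(proc_times):
    """Exhaustive search over job orders; only sensible for a handful of jobs."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda o: makespan(o, proc_times))


if __name__ == "__main__":
    # Three jobs, two stages; the times are made up for this example.
    times = [[3, 2], [1, 4], [2, 1]]
    order = best_order(times)
    print(order, makespan(order, times))
```

Exhaustive search grows factorially with the number of jobs, which is one reason low-overhead heuristics matter when the ordering has to be decided at run time.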

Acknowledgments

This work has been partially funded by the Nokia Foundation, the US National Science Foundation (Grant number 0325119), the Finnish Graduate School for Electronics, Telecommunication and Automation, and the Tekes projects ECUUS and NECST.

Author information

Corresponding author

Correspondence to Jani Boutellier.

About this article

Cite this article

Boutellier, J., Bhattacharyya, S.S. & Silvén, O. A Low-overhead Scheduling Methodology for Fine-grained Acceleration of Signal Processing Systems. J Sign Process Syst 60, 333–343 (2010). https://doi.org/10.1007/s11265-009-0366-z

  • DOI: https://doi.org/10.1007/s11265-009-0366-z
