Abstract
Fine-grained accelerators have the potential to deliver significant benefits in various platforms for embedded signal processing. Due to the moderate complexity of their targeted operations, these accelerators must be managed with minimal run-time overhead. In this paper, we present a methodology for applying flow-shop scheduling techniques to make effective, low-overhead use of fine-grained DSP accelerators. We formulate the underlying scheduling approach in terms of general flow-shop scheduling concepts, and demonstrate our methodology concretely by applying it to MPEG-4 video decoding. We present quantitative experiments on a soft processor that runs on a field-programmable gate array, and provide insight on trends and trade-offs among different flow-shop scheduling approaches when applied to run-time management of fine-grained acceleration.
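As a rough illustration of the flow-shop idea mentioned above, the sketch below computes the makespan of a permutation flow shop and searches all job orders by brute force. It is not the scheduling method of the paper. The mapping of jobs to blocks of work and machines to accelerator stages, the function names `makespan` and `best_order`, and the processing times are all assumptions made for this example.

```python
from itertools import permutations


def makespan(order, proc_times):
    """Return the finish time of the last job on the last machine.

    proc_times[j][m] is the (hypothetical) processing time of job j on
    machine m; every job visits the machines in the same order 0, 1, ...
    """
    num_machines = len(proc_times[0])
    finish = [0] * num_machines  # finish[m]: time at which machine m is free
    for job in order:
        for m in range(num_machines):
            # A job can start on machine m once the machine is free and
            # the job has left the previous machine.
            ready = finish[m - 1] if m > 0 else 0
            start = max(finish[m], ready)
            finish[m] = start + proc_times[job][m]
    return finish[-1]


def best_order(proc_times):
    """Exhaustively pick a job order with minimal makespan (small inputs only)."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda order: makespan(order, proc_times))


# Hypothetical example: three jobs, two pipeline stages.
times = [[3, 2], [1, 4], [2, 1]]
print(makespan((0, 1, 2), times))          # makespan of the given order
print(makespan(best_order(times), times))  # makespan of the best order found
```

Exhaustive search grows factorially with the number of jobs, so at run time a low-overhead system would more plausibly rely on a cheap heuristic or a precomputed schedule. The brute-force search here only makes the objective concrete.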
Acknowledgments
This work has been partially funded by the Nokia Foundation, the US National Science Foundation (Grant number 0325119), the Finnish Graduate School for Electronics, Telecommunication and Automation, and the Tekes projects ECUUS and NECST.
Cite this article
Boutellier, J., Bhattacharyya, S.S. & Silvén, O. A Low-overhead Scheduling Methodology for Fine-grained Acceleration of Signal Processing Systems. J Sign Process Syst 60, 333–343 (2010). https://doi.org/10.1007/s11265-009-0366-z
DOI: https://doi.org/10.1007/s11265-009-0366-z