Efficient programming paradigm for video streaming processing on TILE64 platform

Lin, Xuan-Yi; Lai, Kuan-Chou; Li, Kuan-Ching; Chung, Yeh-Ching

doi:10.1007/s11227-012-0867-6

Published: 24 January 2013

Volume 65, pages 823–847, (2013)

The Journal of Supercomputing

Xuan-Yi Lin¹, Kuan-Chou Lai², Kuan-Ching Li³ & Yeh-Ching Chung¹

Abstract

Rapid advances in computer hardware and networking technologies have made many-core computing affordable and readily available within a few years. Nonetheless, building scalable parallel software remains a challenge for programmers. Optimizing parallel programs for a many-core platform is a multifaceted problem in which system and architectural factors must be taken into account. In this paper, we tackle this problem by implementing parallel programs with the available programming paradigms and evaluating application behavior on the TILE64 many-core platform. Specifically, we investigate a hybrid producer-write plus consumer-read shared-memory programming paradigm for implementing a master–worker video decoder and encoder on this platform. Experimental results show that the proposed implementation achieves competitive speedup, scales well with the number of available cores, and delivers up to a fourfold performance improvement over other implementations when decoding a sample 1080p video.
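The abstract does not spell out the mechanism, so the following is only a rough illustration under stated assumptions. It reads "producer-write" as the producer storing data directly into a buffer owned by the consumer, and "consumer-read" as the consumer fetching data from a buffer owned by the producer. The names `Worker`, `decode`, and `run_master` are hypothetical, the decoding step is a placeholder, and Python threads stand in for cores; none of this is the authors' TILE64 implementation.

```python
import queue
import threading


def decode(frame):
    # Hypothetical stand-in for real decoding work: reverse the bytes.
    return frame[::-1]


class Worker(threading.Thread):
    def __init__(self, wid):
        super().__init__()
        self.wid = wid
        # Buffer owned by the worker; the master writes tasks into it
        # (the "producer-write" direction in this sketch).
        self.inbox = queue.Queue()
        # Buffer owned by the worker; the master later reads results
        # out of it (the "consumer-read" direction in this sketch).
        self.outbox = {}

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:  # sentinel: no more frames
                break
            index, frame = item
            self.outbox[index] = decode(frame)


def run_master(frames, num_workers=4):
    workers = [Worker(i) for i in range(num_workers)]
    for w in workers:
        w.start()

    # Producer-write: the master pushes each frame into a worker's inbox,
    # assigning frames round-robin.
    for index, frame in enumerate(frames):
        workers[index % num_workers].inbox.put((index, frame))
    for w in workers:
        w.inbox.put(None)
    for w in workers:
        w.join()

    # Consumer-read: the master pulls results from each worker's outbox
    # and restores the original frame order.
    results = {}
    for w in workers:
        results.update(w.outbox)
    return [results[i] for i in range(len(frames))]
```

Calling `run_master([b"abc", b"de"], num_workers=2)` distributes the two placeholder frames across two workers and returns the results in input order. Round-robin assignment is chosen here only for simplicity; a real master–worker decoder would more likely assign work dynamically.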

Author information

Authors and Affiliations

Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan, R.O.C.
Xuan-Yi Lin & Yeh-Ching Chung
Department of Computer Science, National Taichung University of Education, Taichung, 40306, Taiwan, R.O.C.
Kuan-Chou Lai
Department of Computer Science and Information Engineering, Providence University, Taichung, 43301, Taiwan, R.O.C.
Kuan-Ching Li

Corresponding author

Correspondence to Yeh-Ching Chung.

About this article

Cite this article

Lin, XY., Lai, KC., Li, KC. et al. Efficient programming paradigm for video streaming processing on TILE64 platform. J Supercomput 65, 823–847 (2013). https://doi.org/10.1007/s11227-012-0867-6

Published: 24 January 2013
Issue Date: August 2013
DOI: https://doi.org/10.1007/s11227-012-0867-6
