Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors

Journal of Signal Processing Systems

Abstract

This paper studies how to parallelize emerging media mining workloads on existing small-scale multi-core processors and on future large-scale platforms. Media mining extracts meaningful knowledge from large amounts of multimedia data to help end users search, browse, and manage that data. Many media mining applications are complex and demand a great deal of computing power. Multi-core architectures offer an opportunity to accelerate them, but using such processors efficiently requires running many threads effectively at the same time. We show how to exploit multi-core processors to speed up computation-intensive media mining applications. We first parallelize two media mining applications by extracting coarse-grained parallelism and evaluate their parallel speedups on a small-scale multi-core system. Our experiments show that coarse-grained parallelization scales well, though not perfectly. Examining the memory requirements, we find that the coarse-grained workloads place high demands on memory: their working set sizes grow almost linearly with the degree of parallelism, and their instantaneous memory bandwidth usage prevents perfect scaling on the 8-core machine. To avoid this memory bandwidth bottleneck, we then exploit fine-grained parallelism and evaluate its performance on the 8-core machine and on a simulated 64-core processor. The experimental data show that fine-grained parallelization has much lower memory requirements than the coarse-grained approach but exhibits significant read-write data sharing. As a result, expensive inter-thread communication limits the parallel speedup on the 8-core machine, whereas the large-scale processor achieves excellent speedup because its shared cache provides fast core-to-core communication.
Our study suggests that (1) extracting coarse-grained parallelism scales well on small-scale platforms but poorly on large-scale systems; (2) exploiting fine-grained parallelism is well suited to realizing the power of large-scale platforms; and (3) future many-core chips can provide a shared cache and sufficient on-chip interconnect bandwidth to enable efficient inter-core communication for applications with significant amounts of shared data. In short, this work demonstrates that proper parallelization techniques are critical to the performance of multi-core processors, and that performance analysis is an important part of parallelization. The parallelization principles, practice, and performance analysis methodology presented in this paper are also useful to anyone exploiting thread-level parallelism in their own applications.
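The contrast between the two strategies can be illustrated with a small sketch. The abstract does not name the kernels that were parallelized, so the code below uses a per-frame histogram as a hypothetical stand-in; the names `coarse_grained`, `fine_grained`, `live_accumulator_cells`, and `BINS` are assumptions made for illustration, not the authors' implementation. In the coarse-grained version each task processes a whole frame with a private accumulator, so the live memory grows with the number of workers. In the fine-grained version all workers cooperate on one frame and update a single shared accumulator, which keeps memory small but introduces read-write sharing (modeled here with a lock).

```python
from concurrent.futures import ThreadPoolExecutor
import threading

BINS = 16  # size of one accumulator; a hypothetical stand-in for a working set


def coarse_grained(frames, workers):
    """One whole frame per task; every in-flight task owns a private accumulator."""
    def process(frame):
        hist = [0] * BINS  # private buffer: no sharing, but one buffer per task
        for pixel in frame:
            hist[pixel % BINS] += 1
        return hist

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, frames))


def fine_grained(frames, workers):
    """All workers split one frame at a time and share a single accumulator."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            hist = [0] * BINS        # one shared buffer for the whole frame
            lock = threading.Lock()  # stands in for the cost of read-write sharing

            def vote(chunk, hist=hist, lock=lock):
                for pixel in chunk:
                    with lock:
                        hist[pixel % BINS] += 1

            chunks = [frame[i::workers] for i in range(workers)]
            list(pool.map(vote, chunks))  # wait until every chunk is done
            results.append(hist)
    return results


def live_accumulator_cells(strategy, workers):
    """Accumulator cells alive at once under this simplified model."""
    return BINS * workers if strategy == "coarse" else BINS
```

Both functions return the same histograms; they differ only in how the work and the memory are divided. Note that CPython threads do not run this loop in parallel because of the global interpreter lock, so the sketch shows the structure of the two decompositions rather than their measured speedups.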

Author information

Correspondence to Wenlong Li.

About this article

Cite this article

Li, W., Tong, X., Wang, T. et al. Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors. J Sign Process Syst Sign Image Video Technol 57, 213–228 (2009). https://doi.org/10.1007/s11265-008-0320-5

  • DOI: https://doi.org/10.1007/s11265-008-0320-5
