Abstract

Pure software HDTV video decoding remains a challenging task on entry-level to mid-range desktop and notebook PCs, even though today’s microprocessors run at clock frequencies measured in GHz. This paper shows that the performance bottleneck in a software MPEG-2 decoder has shifted to memory operations, because microprocessor technologies, including multimedia instruction extensions, have improved rapidly in recent years.

Our study exploits concurrency at the macroblock level to alleviate the performance bottleneck in a software MPEG-2 decoder. First, the paper introduces an interleaved block-order data layout to improve CPU cache performance. Second, the paper describes an algorithm that explicitly prefetches macroblocks for motion compensation. Finally, the paper presents an algorithm that schedules interleaved decoding and output at the macroblock level. Our implementation and experiments show that these methods effectively hide memory and frame buffer latency. The optimizations improve the performance of a multimedia-instruction-optimized software MPEG-2 decoder by a factor of about two. On a PC with a 933 MHz Pentium III CPU, the decoder can decode and display 1280 × 720-resolution HDTV streams at over 62 frames per second.
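To make the idea of a block-order data layout concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify how the layout is interleaved, so the sketch assumes a plain (non-interleaved) layout in which each 8 × 8 luma block is stored contiguously, one byte per pixel, and a 32-byte cache line. The names `raster_offset`, `block_order_offset`, and `lines_touched` are hypothetical.

```python
CACHE_LINE = 32  # bytes; assumed cache line size, not taken from the abstract
BLOCK = 8        # edge length of an MPEG-2 DCT block, in pixels


def raster_offset(x, y, width):
    """Byte offset of pixel (x, y) in a conventional row-major frame."""
    return y * width + x


def block_order_offset(x, y, width):
    """Byte offset of pixel (x, y) when each 8x8 block is stored contiguously.

    Assumes width is a multiple of BLOCK and one byte per pixel.
    """
    blocks_per_row = width // BLOCK
    block_index = (y // BLOCK) * blocks_per_row + (x // BLOCK)
    within_block = (y % BLOCK) * BLOCK + (x % BLOCK)
    return block_index * BLOCK * BLOCK + within_block


def lines_touched(offset_fn, x0, y0, width):
    """Count the distinct cache lines covered by the 8x8 block at (x0, y0)."""
    lines = set()
    for dy in range(BLOCK):
        for dx in range(BLOCK):
            lines.add(offset_fn(x0 + dx, y0 + dy, width) // CACHE_LINE)
    return len(lines)
```

Under these assumptions, a block-aligned 8 × 8 block in a 1280-pixel-wide frame spans 8 cache lines in the row-major layout but only 2 in the block-order layout. Motion-compensation references are generally not block-aligned, so the benefit for those accesses would be smaller than this aligned case suggests.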



Author information

Corresponding author

Correspondence to Han Chen.

Additional information

This work was done while the author was a Ph.D. candidate in the Computer Science

Han Chen is a research staff member at the IBM T.J. Watson Research Center. His research interests include distributed computing systems, scalable display systems, and multimedia. He received his Ph.D. degree in 2003 and his M.A. degree in 1999 from Princeton University. He received his B.S. degree from Tsinghua University, Beijing, China, in 1997.

Kai Li is a Charles Fitzmorris Professor in the Computer Science Department at Princeton University. His research interests include operating systems, computer architecture, distributed systems, and scalable display systems. He received his Ph.D. degree from Yale University in 1986. Prior to that, he received his M.S. degree from University of Science and Technology of China, Academy of Sciences of China in 1981 and a B.S. degree from Jilin University in China in 1977. He was a visiting faculty member at the University of Toronto in 1988 and a visiting professor at Stanford University during his sabbaticals in 1996 and 2000. He has served on dozens of program committees and has served as chair or vice chair several times. He was elected an ACM Fellow in 1998.

Bin Wei received a Ph.D. in Computer Science from Princeton University in 1998 and has since been a member of the research community at AT&T Shannon Laboratories. His research interests are in the areas of high-performance computer systems, multimedia, and service platforms for mobile users. He received a B.S. in Computer Science from Tianjin University, China, in 1983 and an M.S. in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, in 1989.

About this article

Cite this article

Chen, H., Li, K. & Wei, B. Memory Performance Optimizations For Real-Time Software HDTV Decoding. J VLSI Sign Process Syst Sign Image Video Technol 41, 193–207 (2005). https://doi.org/10.1007/s11265-005-6650-7

  • DOI: https://doi.org/10.1007/s11265-005-6650-7
