Unified UDispatch: A User Dispatching Tool for Multicore Systems

Journal of Computer Science and Technology

Abstract

In multicore environments, multithreading is often used to improve application performance. However, even in many simple applications, performance can degrade as the number of threads increases. Users usually attribute this phenomenon to the overhead of creating or terminating threads. In our observation, how the threads are dispatched to the multiple cores can have a more significant effect. We formally defined these problems as multithreading anomalies and presented a novel user dispatching mechanism (UDispatch) that provides controllability in user space to improve application performance. By modifying application source code with the UDispatch application programming interface (API), application performance can be improved significantly. However, since the source code might not be available or might be too complicated to modify, we provided an extension, called UDispatch+, that dispatches threads without any modification of application source code. In this paper, UDispatch and UDispatch+ are integrated and wrapped for greater portability and introduced as a tool called Unified UDispatch (UUD), with more detailed experiments and description. UUD can dispatch application threads to specific cores at the discretion of users, with up to 171.8% performance improvement on a 4-core machine.
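The idea of dispatching threads to specific cores from user space can be illustrated with a minimal sketch. It does not use the UDispatch API, which is not shown in this abstract; it assumes a Linux system and relies only on the standard affinity calls exposed by Python's `os` module. The helper names `available_cores`, `run_on_core`, and `dispatch`, and the round-robin core assignment, are hypothetical choices made for illustration.

```python
import os
import threading


def available_cores():
    """Return the sorted list of cores the calling thread may run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def run_on_core(core, work, results, key):
    """Pin the calling thread to `core`, run `work`, and record the result and mask."""
    # On Linux, pid 0 refers to the calling thread for the affinity system calls.
    os.sched_setaffinity(0, {core})
    results[key] = (work(), os.sched_getaffinity(0))


def dispatch(works):
    """Start one thread per work item, assigning cores round-robin (an assumption)."""
    cores = available_cores()
    results = {}
    threads = []
    for i, work in enumerate(works):
        core = cores[i % len(cores)]
        t = threading.Thread(target=run_on_core, args=(core, work, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    outcome = dispatch([lambda: sum(range(1000)), lambda: 42])
    for key, (value, mask) in sorted(outcome.items()):
        print(f"work {key}: result={value}, ran with affinity mask {sorted(mask)}")
```

Each worker restricts itself to a single core before doing its work, so the placement is decided by the user rather than left to the kernel scheduler. Because of CPython's global interpreter lock, this sketch shows the binding step only and is not expected to reproduce the speedups reported in the paper.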

Author information

Corresponding author

Correspondence to Tang-Hsun Tu.

Additional information

This work was supported in part by the “National Science Council”, Taiwan, China, under Grant Nos. NSC-99-2628-E-002-027, NSC-99-2219-E-002-029 and the Excellent Research Projects of “National Taiwan University”, under Grant No. 99R80300.

About this article

Cite this article

Tu, TH., Hsueh, CW. Unified UDispatch: A User Dispatching Tool for Multicore Systems. J. Comput. Sci. Technol. 26, 375–391 (2011). https://doi.org/10.1007/s11390-011-1141-8

  • DOI: https://doi.org/10.1007/s11390-011-1141-8
