Evolution of thread-level parallelism in desktop applications

Published: 19 June 2010

Abstract

As the effective limits of frequency and instruction-level parallelism have been reached, the strategy of microprocessor vendors has changed to increasing the number of processing cores on a single chip each generation. The implicit expectation is that software developers will write their applications with concurrency in mind to take advantage of this sudden change in direction. In this study we analyze whether software developers for laptop/desktop machines have followed the recent hardware trends by creating software for chip multi-processing. We conduct a study of a wide range of applications on Microsoft Windows 7 and Apple's OS X Snow Leopard, measuring thread-level parallelism (TLP) on a high-performance workstation and a low-power desktop. In addition, we explore graphics processing units (GPUs) and their impact on chip multi-processing. We compare our findings to a study done 10 years ago which concluded that a second core was sufficient to improve system responsiveness. Our results on today's machines show that, 10 years later, surprisingly, 2-3 cores are more than adequate for most applications and that the GPU often remains under-utilized. However, in some application-specific domains, an 8-core SMT system with a 240-core GPU can be effectively utilized. Overall, these results suggest that many-core architectures are not a natural fit for current desktop/laptop applications.
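The thread-level parallelism figure mentioned in the abstract can be illustrated with a small calculation. The abstract does not spell out a formula, so the sketch below assumes a definition commonly used in desktop TLP studies: if c_i is the fraction of wall-clock time during which exactly i threads run concurrently, then TLP = (sum over i >= 1 of i * c_i) / (1 - c_0), which excludes idle time. The function name and the sample histogram are hypothetical and are not data from the paper.

```python
def thread_level_parallelism(concurrency_fractions):
    """Return TLP for a concurrency histogram (assumed definition, see lead-in).

    concurrency_fractions[i] is the fraction of time during which exactly
    i threads were running at once; index 0 is idle time.
    """
    total = sum(concurrency_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    idle = concurrency_fractions[0]
    if idle >= 1.0:
        raise ValueError("machine was idle for the whole trace; TLP is undefined")

    # Weight each non-idle concurrency level by its thread count.
    weighted = sum(i * c for i, c in enumerate(concurrency_fractions) if i > 0)

    # Normalise by the non-idle fraction so idle time does not dilute the result.
    return weighted / (1.0 - idle)


# Hypothetical trace: idle 50% of the time, 1 thread 30%, 2 threads 15%, 3 threads 5%.
sample = [0.50, 0.30, 0.15, 0.05]
tlp = thread_level_parallelism(sample)
print(f"TLP = {tlp:.2f}")
```

Under this definition, a value near 1 means that the machine rarely runs more than one thread at a time whenever it is busy, whatever the number of cores available.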

      Published In

      ACM Conferences
      ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
      June 2010
      520 pages
      ISBN: 9781450300537
      DOI: 10.1145/1815961
      • ACM SIGARCH Computer Architecture News
        ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
        ISCA '10
        June 2010
        508 pages
        ISSN: 0163-5964
        DOI: 10.1145/1816038
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      In-Cooperation

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. benchmarking
      2. desktop applications
      3. multi-core
      4. thread level parallelism

      Qualifiers

      • Research-article

      Conference

      ISCA '10

      Acceptance Rates

      Overall Acceptance Rate 543 of 3,203 submissions, 17%

      Cited By

      • (2023) Carbon-Efficient Design Optimization for Computing Systems. Proceedings of the 2nd Workshop on Sustainable Computer Systems, pages 1-7. DOI: 10.1145/3604930.3605712. Online publication date: 9-Jul-2023
      • (2022) Exploring Efficient Microservice Level Parallelism. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 223-233. DOI: 10.1109/IPDPS53621.2022.00030. Online publication date: May-2022
      • (2021) Reinforcement Learning-Based Power Management Policy for Mobile Device Systems. IEEE Transactions on Circuits and Systems I: Regular Papers, 68:10, pages 4156-4169. DOI: 10.1109/TCSI.2021.3103503. Online publication date: Oct-2021
      • (2020) Responsive parallelism with futures and state. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 577-591. DOI: 10.1145/3385412.3386013. Online publication date: 11-Jun-2020
      • (2020) Thread Evolution Kit for Optimizing Thread Operations on CE/IoT Devices. IEEE Transactions on Consumer Electronics, 66:4, pages 289-298. DOI: 10.1109/TCE.2020.3033328. Online publication date: Nov-2020
      • (2019) Fairness in responsive parallelism. Proceedings of the ACM on Programming Languages, 3:ICFP, pages 1-30. DOI: 10.1145/3341685. Online publication date: 26-Jul-2019
      • (2019) Parallelism Analysis of Prominent Desktop Applications: An 18-Year Perspective. 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 202-211. DOI: 10.1109/ISPASS.2019.00033. Online publication date: Mar-2019
      • (2019) Predicting performance in multi-core systems with shared reconfigurable accelerators. Journal of Systems Architecture: the EUROMICRO Journal, 98:C, pages 201-213. DOI: 10.1016/j.sysarc.2019.07.010. Online publication date: 1-Sep-2019
      • (2018) Properties and Design of Variable-to-Variable Length Codes. ACM Transactions on Multimedia Computing, Communications, and Applications, 14:3, pages 1-19. DOI: 10.1145/3230653. Online publication date: 24-Jul-2018
      • (2018) Efficient Video Encoding for Automatic Video Analysis in Distributed Wireless Surveillance Systems. ACM Transactions on Multimedia Computing, Communications, and Applications, 14:3, pages 1-24. DOI: 10.1145/3226036. Online publication date: 24-Jul-2018