research-article

Evolution of thread-level parallelism in desktop applications

Authors:

Geoffrey Blake,

Ronald G. Dreslinski,

Krisztián Flautner

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

Pages 302 - 313

https://doi.org/10.1145/1815961.1816000

Published: 19 June 2010

Abstract

As the effective limits of frequency and instruction level parallelism have been reached, the strategy of microprocessor vendors has changed to increase the number of processing cores on a single chip each generation. The implicit expectation is that software developers will write their applications with concurrency in mind to take advantage of this sudden change in direction. In this study we analyze whether software developers for laptop/desktop machines have followed the recent hardware trends by creating software for chip multi-processing. We conduct a study of a wide range of applications on Microsoft Windows 7 and Apple's OS X Snow Leopard, measuring Thread Level Parallelism on a high performance workstation and a low power desktop. In addition, we explore graphics processing units (GPUs) and their impact on chip multi-processing. We compare our findings to a study done 10 years ago which concluded that a second core was sufficient to improve system responsiveness. Our results on today's machines show that, 10 years later, surprisingly 2-3 cores are more than adequate for most applications and that the GPU often remains under-utilized. However, in some application specific domains an 8 core SMT system with a 240 core GPU can be effectively utilized. Overall these studies suggest that many-core architectures are not a natural fit for current desktop/laptop applications.
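The abstract does not define how Thread Level Parallelism is computed. A minimal sketch follows, assuming the commonly used definition in which TLP is the average number of concurrently running threads over the non-idle portion of a trace. The function name, the input layout, and the sample profile are hypothetical and are not taken from the paper's measurements.

```python
def thread_level_parallelism(concurrency_fractions):
    """Average number of running threads over non-idle time.

    concurrency_fractions[i] is the fraction of wall-clock time during which
    exactly i threads were running; index 0 is the idle fraction.
    Assumed definition: TLP = sum(i * c_i for i >= 1) / (1 - c_0).
    """
    idle = concurrency_fractions[0]
    if idle >= 1.0:
        raise ValueError("system was idle for the whole interval")
    # The i = 0 term contributes nothing, so enumerating from 0 is harmless.
    weighted = sum(i * c for i, c in enumerate(concurrency_fractions))
    return weighted / (1.0 - idle)


# Hypothetical profile: idle 50%, one thread 30%, two threads 15%,
# three threads 0%, four threads 5% of the measured interval.
profile = [0.50, 0.30, 0.15, 0.00, 0.05]
tlp = thread_level_parallelism(profile)
print(f"TLP = {tlp:.2f}")
```

Under this definition a value near 1 means the application rarely keeps more than one core busy, even on a machine with many cores, because idle time is excluded from the average.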



Index Terms

Evolution of thread-level parallelism in desktop applications
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics


Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

June 2010

520 pages

ISBN:9781450300537

DOI:10.1145/1815961

General Chair: André Seznec (INRIA Rennes)

Program Chairs: Uri Weiser (Technion), Ronny Ronen (Intel)

ACM SIGARCH Computer Architecture News Volume 38, Issue 3
ISCA '10
June 2010
508 pages
ISSN:0163-5964
DOI:10.1145/1816038

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA '10

Sponsor:

SIGARCH

ISCA '10: The 37th Annual International Symposium on Computer Architecture

June 19 - 23, 2010

Saint-Malo, France

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
