DOI: 10.1145/2479871.2479883
research-article

When spatial and temporal locality collide: the case of the missing cache hits

Published: 21 April 2013

ABSTRACT

Even the simplest hardware, running the simplest programs, can behave in the strangest of ways. Tracking down the cause of a performance anomaly without a complete hardware reference for the processor is a prime example of black-box architectural exploration. A mystery emerged when doubling the work of a simple benchmark program, run on a single core of Tilera's TILEPro64 processor, did not double the number of consumed cycles. After we ruled out different optimization levels for the two programs, a cycle-accurate simulation attributed the sub-optimal performance to an abnormally high number of L1 data cache misses. Further investigation showed that the processor stalled on every Read-After-Write (RAW) instruction sequence in which two conditions held: 1) zero or one instructions separated the write from the read, and 2) the write and the read targeted distinct memory locations that shared an L1 cache line. We call this performance pitfall a RAW hiccup. We describe two countermeasures that sidestep it: memory padding and the explicit introduction of pipeline bubbles.

This experience paper serves as a useful troubleshooting guide for uncovering anomalous performance issues when the hardware design under study is unavailable.

Published in
ICPE '13: Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
April 2013, 446 pages
ISBN: 9781450316361
DOI: 10.1145/2479871

          Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates
ICPE '13 Paper Acceptance Rate: 28 of 64 submissions, 44%
Overall Acceptance Rate: 252 of 851 submissions, 30%
