
Compiler-based I/O prefetching for out-of-core applications

Published: 01 May 2001

Abstract

Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme the compiler provides the crucial information on future access patterns without burdening the programmer; the operating system supports nonbinding prefetch and release hints for managing I/O; and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively insert prefetches ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We implemented our compiler analysis within the SUIF compiler, and used it to target implementations of our run-time and OS support on both research and commercial systems (Hurricane and IRIX 6.5, respectively). Our experimental results show large performance gains for out-of-core scientific applications on both systems: more than 50% of the I/O stall time has been eliminated in most cases, thus translating into overall speedups of roughly twofold in many cases.
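As a rough illustration of this division of labor, the following Python sketch simulates a loop that sweeps an out-of-core array one page at a time after hint insertion. Each iteration issues a nonbinding prefetch hint `distance` pages ahead and a release hint for the page it has finished with, while a run-time layer drops prefetch hints for pages it believes are already resident so that they cost no system call. All names (`prefetch_distance`, `RuntimeLayer`, `transformed_loop`) are hypothetical. The latency-over-work rule for choosing the distance is a common software-prefetching heuristic, not necessarily the paper's exact formula. The simulation also assumes page-granularity accesses and that the OS honors every forwarded prefetch. It is not the SUIF-generated code or the Hurricane/IRIX interface.

```python
import math
from dataclasses import dataclass, field


def prefetch_distance(io_latency_ms: float, work_per_page_ms: float,
                      free_pages: int) -> int:
    """Hypothetical helper: how many pages ahead of the current one to prefetch.

    Too short a distance and pages arrive late (the loop stalls on I/O);
    too long and prefetched pages occupy memory long before they are used.
    """
    if work_per_page_ms <= 0:
        raise ValueError("work per page must be positive")
    ideal = math.ceil(io_latency_ms / work_per_page_ms)  # hide one I/O latency
    return max(1, min(ideal, free_pages))  # cap by memory budget, at least 1


@dataclass
class RuntimeLayer:
    """Hypothetical run-time layer between the compiled code and the OS.

    It forwards release hints unchanged and drops prefetch hints for pages it
    believes are already resident, saving a system call. Forwarded hints stay
    nonbinding: a real OS could still ignore them.
    """
    resident: set = field(default_factory=set)
    os_calls: list = field(default_factory=list)  # hints that reached the OS
    filtered: int = 0                             # prefetches dropped here

    def prefetch(self, page: int) -> None:
        if page in self.resident:
            self.filtered += 1
            return
        self.os_calls.append(("prefetch", page))
        self.resident.add(page)  # simplification: assume the OS honors it

    def release(self, page: int) -> None:
        self.os_calls.append(("release", page))
        self.resident.discard(page)

    def touch(self, page: int) -> None:
        self.resident.add(page)  # a demand access brings the page in anyway


def transformed_loop(rt: RuntimeLayer, num_pages: int, distance: int) -> None:
    """A page-by-page sweep over an array, as it might look after hint insertion."""
    # Prologue: request the first `distance` pages before any work starts.
    for p in range(min(distance, num_pages)):
        rt.prefetch(p)
    # Steady state: stay `distance` pages ahead and release pages once done.
    for p in range(num_pages):
        if p + distance < num_pages:
            rt.prefetch(p + distance)
        rt.touch(p)      # stands in for the original loop body on page p
        rt.release(p)    # page p is not referenced again in this sweep
```

With a 10 ms I/O latency and 3 ms of work per page, the sketch picks a distance of 4. On a device with half the latency, it picks 2. This is why the distance is better chosen by the compiler and run-time system than hard-coded in the source. Calling `transformed_loop(RuntimeLayer(resident={0, 1}), 6, 2)` forwards prefetches only for pages 2 through 5, because the run-time layer filters the hints for pages 0 and 1.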



        Reviews

        Ted Brown

        Applications that use very large arrays whose elements are accessed in a mostly sequential order can improve their run times by prefetching out-of-core pages into memory. These are often scientific numerical applications. This paper clearly lays out an automated aid built on top of paged virtual memory. The problem is complex: prefetching pages too early reduces the effective size of memory, while prefetching them too late slows down processing. It might be argued that the application programmer is best placed to write the prefetch commands. The authors make the case, however, that adding prefetch instructions by hand is not only onerous for the programmer, but that the size of main memory, the speed of I/O devices, and similar parameters cannot and should not be the programmer's concern. Just as important, hand-inserted prefetches make the code less portable, since changes to the hardware can affect the efficiency of the prefetching.

        The authors' solution is to automate the insertion of prefetch commands into the application code and to have the application interact with the operating system at run time, where the final decision about whether to prefetch is made. Consequently, the authors needed to modify the compiler, the I/O part of the operating system, and the operating system's memory manager. At almost 60 pages, the paper is exceptionally long for a journal article. Even so, I quickly saw that this is a must-read paper for anyone doing work in the area. It is clearly written and contains a number of nicely thought-out practices. For example, the compiler estimates the future access patterns of the data and inserts prefetch and release operations, but these are nonbinding performance hints: at run time, it is up to the operating system layer to make whatever decisions it judges most effective when the hints arrive.
The operating system must decide whether prefetching a page would evict a page that is still needed, possibly even before the requested page. In the authors' system, the application works closely with the operating system to make prefetching decisions. As the authors point out, the application itself should drive these decisions, since it (should) know its future accesses best, whereas the operating system knows the memory usage. The paper is written in a layered format, with the level of detail increasing over three passes: first an outline; then a justification for augmenting the system, along with an overview of its components; and finally, in the longest sections, a detailed description of each component. Each layer is well written. The authors evaluate their ideas on two operating systems using the NAS Parallel Benchmarks (nine applications) and find speedups of roughly twofold in many cases.
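The eviction question raised in the review can be phrased as a simple test, in the spirit of the "do no harm" rule from the integrated prefetching-and-caching literature. When no frame is free, prefetch only if some resident page will next be used later than the page being fetched (or never again), and evict the resident page whose next use is furthest away. The Python sketch below assumes that next-use times are known; in practice they are only approximate, for example when derived from compiler-inserted hints. `prefetch_decision` and its arguments are hypothetical and do not reproduce the paper's actual policy.

```python
from typing import Dict, Optional, Tuple


def prefetch_decision(target_next_use: float,
                      resident_next_use: Dict[int, float],
                      free_frames: int) -> Tuple[bool, Optional[int]]:
    """Hypothetical check: can a page be prefetched without hurting the program?

    target_next_use: time (e.g. loop iteration) of the target page's next use.
    resident_next_use: next-use time of each resident page; float("inf")
        marks a page that will not be referenced again.
    free_frames: number of unused physical page frames.

    Returns (prefetch?, victim). victim is None when no eviction is needed.
    """
    if free_frames > 0:
        return True, None                  # a free frame costs nothing
    if not resident_next_use:
        return False, None                 # no frame to reuse at all
    # Best candidate: the resident page needed furthest in the future.
    victim = max(resident_next_use, key=lambda page: resident_next_use[page])
    if resident_next_use[victim] > target_next_use:
        return True, victim                # victim is needed after the target
    return False, None                     # every resident page is needed sooner
```

If every resident page will be referenced before the target, the function declines to prefetch. Fetching the target would only force an extra page fault on a page that is needed sooner.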


        Published in

          ACM Transactions on Computer Systems, Volume 19, Issue 2
          May 2001
          171 pages
          ISSN: 0734-2071
          EISSN: 1557-7333
          DOI: 10.1145/377769

          Copyright © 2001 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
