Compiling Data Intensive Applications with Spatial Coordinates

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2017)

Abstract

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data intensive applications written in a dialect of Java and compiles them for efficient execution on clusters of workstations or distributed memory machines.

In this paper, we focus on data intensive applications with two important properties: 1) data elements have spatial coordinates associated with them, and the distribution of the data is not regular with respect to these coordinates; and 2) the application processes only a subset of the available data, selected on the basis of spatial coordinates. Such applications arise in many domains, including satellite data processing and medical imaging. We present a general compilation and execution strategy for this class of applications that achieves high locality in disk accesses. We then present a technique for hoisting conditionals that further improves the execution efficiency of the compiled code.
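
The abstract describes the execution strategy only at a high level. As a rough illustration, the Python sketch below assumes that elements are grouped into chunks, each tagged with a bounding box over its elements' coordinates, and that the application sums the values of elements falling inside a rectangular query region. The chunk layout, the summation, and all names (`Chunk`, `naive_sum`, `chunked_sum`) are hypothetical and are not taken from the paper. The sketch contrasts a naive scan with a strategy that touches only chunks overlapping the query, and it shows one plausible form of conditional hoisting: when a chunk's bounding box lies wholly inside the query, the per-element coordinate test is dropped for that chunk.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned rectangle (xmin, ymin, xmax, ymax). Hypothetical layout.
Box = Tuple[float, float, float, float]
# One data element: (x, y, value).
Element = Tuple[float, float, float]


@dataclass
class Chunk:
    """Hypothetical on-disk chunk: elements plus a bounding box that encloses them all."""
    bbox: Box
    elements: List[Element]


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def in_box(box: Box, x: float, y: float) -> bool:
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def naive_sum(chunks: List[Chunk], query: Box) -> Tuple[float, int, int]:
    """Touch every chunk and test every element; returns (sum, chunks read, tests)."""
    total, chunks_read, tests = 0.0, 0, 0
    for chunk in chunks:
        chunks_read += 1
        for x, y, v in chunk.elements:
            tests += 1
            if in_box(query, x, y):
                total += v
    return total, chunks_read, tests


def chunked_sum(chunks: List[Chunk], query: Box) -> Tuple[float, int, int]:
    """Read only chunks overlapping the query; hoist the test for fully covered chunks."""
    total, chunks_read, tests = 0.0, 0, 0
    for chunk in chunks:
        if not intersects(chunk.bbox, query):
            continue  # chunk cannot contribute, so it is never read
        chunks_read += 1
        if contains(query, chunk.bbox):
            # Relies on the bbox enclosing every element: all elements are
            # inside the query, so the per-element test is hoisted away.
            total += sum(v for _, _, v in chunk.elements)
        else:
            # Chunk straddles the query boundary: test each element.
            for x, y, v in chunk.elements:
                tests += 1
                if in_box(query, x, y):
                    total += v
    return total, chunks_read, tests


chunks = [
    Chunk((0.0, 0.0, 1.0, 1.0), [(0.2, 0.3, 1.0), (0.8, 0.9, 2.0)]),
    Chunk((2.0, 2.0, 3.0, 3.0), [(2.5, 2.5, 4.0), (2.9, 2.1, 8.0)]),
    Chunk((5.0, 5.0, 6.0, 6.0), [(5.5, 5.5, 16.0)]),
]
query = (0.0, 0.0, 2.6, 3.0)
print(naive_sum(chunks, query))    # (7.0, 3, 5)
print(chunked_sum(chunks, query))  # (7.0, 2, 2)
```

On this toy input both strategies return the same sum, but the chunked version reads two of the three chunks and performs two coordinate tests instead of five. Skipping disjoint chunks is what would keep disk accesses local in an out-of-core setting; the hoisted branch removes work only for chunks that the query fully covers.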

Our preliminary experimental results show that the performance of our proposed execution strategy is nearly two orders of magnitude better than that of a naive strategy. Further, applying the technique for hoisting conditionals improves performance by up to 30%.

This work was supported by NSF grant ACR-9982087, NSF CAREER award ACI-9733520, and NSF grant CCR-9808522.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferreira, R., Agrawal, G., Jin, R., Saltz, J. (2001). Compiling Data Intensive Applications with Spatial Coordinates. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_22

  • DOI: https://doi.org/10.1007/3-540-45574-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42862-6

  • Online ISBN: 978-3-540-45574-5

  • eBook Packages: Springer Book Archive
