
An Out-of-Core Task-based Middleware for Data-Intensive Scientific Computing


Abstract

In datacenters, non-volatile memory storage is being adopted rapidly because it offers higher bandwidth and lower latency than traditional disk-based storage systems for managing and analyzing large datasets. These drastic changes in system architecture will require rethinking systems software as well: as hardware performance improves, software efficiency will become the next bottleneck. Here, we present an out-of-core task-based middleware together with a domain-specific application interface. By separating computation from data movement, the middleware increases programmer productivity while still ensuring good performance and scalability.



Corresponding author

Correspondence to Erik Saule.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Saule, E., Aktulga, H., Yang, C., Ng, E., Çatalyürek, Ü. (2015). An Out-of-Core Task-based Middleware for Data-Intensive Scientific Computing. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_22

  • DOI: https://doi.org/10.1007/978-1-4939-2092-1_22

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-2091-4

  • Online ISBN: 978-1-4939-2092-1

  • eBook Packages: Computer Science, Computer Science (R0)
