
Cooperative Preprocessing at Petabytes on High Performance Computing System

  • Conference paper
  • Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Abstract

With the explosion of data, high performance computing (HPC) systems face an urgent demand for data throughput. Data-intensive applications are becoming increasingly common in HPC environments. Because data scale is growing faster than systems, it is time to fully utilize resources in every aspect, including computing power, storage capacity and data throughput. Data preprocessing can no longer be ignored, since it is an important procedure, especially when dealing with large amounts of data. How can data preprocessing be performed efficiently on current HPC systems? How can data-intensive applications make full use of system resources? What should be valued when designing new HPC architectures? All these questions need answers. In this paper, we outlined the procedure of data-intensive applications, which led to an adaptive resource allocation scheme driven by the requirements of each stage of that procedure. We analyzed the characteristics of preprocessing and designed a preprocessing model for data-intensive applications on HPC systems. The model not only fulfills the demand for computing but also meets the need for throughput, through cooperative work between the storage system and the storage management system. Experiments were conducted on Sunway TaihuLight, one of the world's fastest supercomputers. The whole preprocessing procedure at petabyte scale can be completed in hours without interfering with other ongoing applications.
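To put "petabytes in hours" in perspective, the aggregate throughput such a run needs follows from simple arithmetic. The Python sketch below is illustrative only: the 1 PB data size, 4-hour duration and 1,000-node count are hypothetical values, not figures from the paper. It also assumes a single full read of the data, decimal units (1 PB = 10^6 GB) and perfectly even load across nodes.

```python
# Back-of-envelope throughput estimate for a single preprocessing pass.
# Assumptions (hypothetical, not from the paper): decimal units, one full read
# of the dataset, and work spread perfectly evenly across nodes.

GB_PER_PB = 1_000_000  # 1 PB = 10^6 GB in decimal (SI) units
SECONDS_PER_HOUR = 3600


def required_throughput_gb_s(data_pb: float, hours: float) -> float:
    """Aggregate throughput (GB/s) needed to read `data_pb` petabytes in `hours` hours."""
    if data_pb < 0 or hours <= 0:
        raise ValueError("data_pb must be >= 0 and hours must be > 0")
    return data_pb * GB_PER_PB / (hours * SECONDS_PER_HOUR)


def per_node_throughput_mb_s(data_pb: float, hours: float, nodes: int) -> float:
    """Per-node throughput (MB/s) if the aggregate load is split evenly over `nodes`."""
    if nodes <= 0:
        raise ValueError("nodes must be > 0")
    # Convert GB/s to MB/s (x1000), then divide evenly among nodes.
    return required_throughput_gb_s(data_pb, hours) * 1000 / nodes


if __name__ == "__main__":
    # Hypothetical scenario: 1 PB preprocessed in 4 hours on 1,000 nodes.
    print(f"aggregate: {required_throughput_gb_s(1, 4):.1f} GB/s")  # aggregate: 69.4 GB/s
    print(f"per node: {per_node_throughput_mb_s(1, 4, 1000):.1f} MB/s")  # per node: 69.4 MB/s
```

Under these assumed values, the storage system must sustain roughly 69.4 GB/s in aggregate, or about 69.4 MB/s per node. Any load imbalance raises the peak demand on the busiest nodes, which is one reason throughput, and not only computing power, has to be planned for.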



Author information

Correspondence to Rujun Sun.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, R., Zhang, L., Wang, X. (2018). Cooperative Preprocessing at Petabytes on High Performance Computing System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_16


  • DOI: https://doi.org/10.1007/978-3-030-05054-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05053-5

  • Online ISBN: 978-3-030-05054-2

  • eBook Packages: Computer Science, Computer Science (R0)
