Improving bioinformatics applications performance via active storage systems

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

Large-scale, data-intensive applications are now widely deployed, driving demand for high-performance storage systems that can support them. Unlike traditional storage systems, next-generation systems should include a dedicated processor that offloads computation from host machines, and they may combine diverse storage devices in hybrid configurations. We present a pipelining technique for active storage systems and evaluate the design on pp-mpiBLAST, a widely used bioinformatics application. Experimental results indicate that the proposed technique reduces overall execution time by up to 50% and achieves better cluster scalability.
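The abstract does not describe how the pipeline is built, so the following is only a minimal sketch of the general idea of pipelining between a storage-side processor and a host, not the authors' implementation. It assumes a two-stage design in which a storage-side stage prepares data partitions while the host-side stage processes partitions prepared earlier. The names `storage_stage`, `compute_stage`, `run_pipeline`, and the `depth` parameter are hypothetical, and sorting and summing merely stand in for real preprocessing and sequence-search work.

```python
import queue
import threading


def storage_stage(partitions, out_q):
    """Hypothetical storage-side stage: prepare each partition, then hand it off."""
    for part in partitions:
        prepared = sorted(part)   # stand-in for storage-side preprocessing
        out_q.put(prepared)       # blocks when the queue is full (back-pressure)
    out_q.put(None)               # sentinel: no more partitions


def compute_stage(in_q, results):
    """Hypothetical host-side stage: consume prepared partitions as they arrive."""
    while True:
        prepared = in_q.get()
        if prepared is None:      # sentinel received, pipeline is drained
            break
        results.append(sum(prepared))  # stand-in for the host computation


def run_pipeline(partitions, depth=2):
    """Run both stages concurrently; `depth` bounds the partitions in flight."""
    q = queue.Queue(maxsize=depth)
    results = []
    producer = threading.Thread(target=storage_stage, args=(partitions, q))
    consumer = threading.Thread(target=compute_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Calling `run_pipeline([[3, 1, 2], [10, 5]])` processes the second partition on the storage side while the first is being consumed on the host side. The bounded queue keeps the faster stage from running arbitrarily far ahead of the slower one, which is one common way to balance such a pipeline.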



Author information

Corresponding author

Correspondence to Shu Yin.

Cite this article

Ding, Z., Qin, X. & Yin, S. Improving bioinformatics applications performance via active storage systems. CCF Trans. HPC 3, 242–251 (2021). https://doi.org/10.1007/s42514-021-00073-w

  • DOI: https://doi.org/10.1007/s42514-021-00073-w
