Learning I/O Variables from Scientific Software’s User Manuals

  • Conference paper
  • In: Computational Science – ICCS 2022 (ICCS 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13353)

Abstract

Scientific software often involves many input and output (I/O) variables. Identifying these variables is important for software engineering tasks such as metamorphic testing. To reduce the manual work, we report in this paper our investigation of machine learning algorithms for classifying variables from the software's user manuals. We identify thirteen natural-language features and use them to develop a multi-layer solution in which the first layer distinguishes variables from non-variables and the second layer classifies the variables into input and output types. Our experimental results on three scientific software systems show that random forest and a feedforward neural network best implement the first and second layers, respectively.
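As a rough illustration of the two-layer design described in the abstract, the sketch below wires together two scikit-learn classifiers: a random forest for the variable/non-variable layer and a feedforward neural network (MLP) for the input/output layer, mirroring the models the abstract reports as best-performing. This is a minimal sketch, not the authors' implementation; the feature matrix, labels, and dimensions are hypothetical placeholders standing in for the thirteen natural-language features extracted from user manuals.

```python
# Minimal sketch of the two-layer classification idea (not the authors' code).
# Layer 1: random forest separates variable vs. non-variable candidates.
# Layer 2: feedforward neural network labels detected variables as input or output.
# The 13-dimensional feature vectors and labels below are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 candidate terms, each described by 13 natural-language features.
X = rng.random((200, 13))
is_variable = rng.integers(0, 2, 200)   # 1 = variable, 0 = non-variable
io_type = rng.integers(0, 2, 200)       # 1 = input, 0 = output (meaningful only for variables)

# Layer 1: variable vs. non-variable.
layer1 = RandomForestClassifier(n_estimators=100, random_state=0)
layer1.fit(X, is_variable)

# Layer 2: input vs. output, trained only on the terms that truly are variables.
mask = is_variable == 1
layer2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
layer2.fit(X[mask], io_type[mask])

# At prediction time, only candidates that layer 1 flags as variables reach layer 2.
candidates = rng.random((5, 13))
for features, is_var in zip(candidates, layer1.predict(candidates)):
    if is_var:
        label = "input" if layer2.predict(features.reshape(1, -1))[0] == 1 else "output"
        print(f"variable -> {label}")
    else:
        print("non-variable")
```

Cascading the two classifiers this way keeps each layer's task binary, which matches the hierarchical formulation described in the abstract.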



Acknowledgments

We thank the EPA SWMM team, especially Michelle Simon, for the research collaborations. We also thank the anonymous reviewers for their constructive comments.

Author information

Corresponding author: Nan Niu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Peng, Z., Lin, X., Santhoshkumar, S.N., Niu, N., Kanewala, U. (2022). Learning I/O Variables from Scientific Software’s User Manuals. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13353. Springer, Cham. https://doi.org/10.1007/978-3-031-08760-8_42

  • DOI: https://doi.org/10.1007/978-3-031-08760-8_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08759-2

  • Online ISBN: 978-3-031-08760-8

  • eBook Packages: Computer Science, Computer Science (R0)
