
Classification of Software Artifacts Based on Structural Information

  • Conference paper
Knowledge-Based and Intelligent Information and Engineering Systems (KES 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6279)

Abstract

Classification of software artifacts, in particular source code files, is currently performed by the administrator of a repository. Although automated classification exists for these repositories, existing approaches focus on semantic analysis of keywords found in the artifact. This paper presents the use of structural information, namely software metrics, in determining the appropriate application domain for a particular artifact. Results obtained from the study show that the trend in the metrics differs between files of different application domains. The results also show that k-nearest neighbor outperformed both the C4.5 decision tree and a classifier based on Discriminant Analysis in classifying files from the database and graphics domains.
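As a rough illustration of the idea in the abstract, the sketch below classifies a source file into an application domain from a vector of software metrics using k-nearest neighbor. It is not the authors' implementation: the three metrics (lines of code, cyclomatic complexity, comment ratio), the training values, the min-max scaling step, and the choice of k = 3 are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical training data, invented for illustration only (not from the paper):
# (lines_of_code, cyclomatic_complexity, comment_ratio) -> application domain.
TRAINING = [
    ((120.0, 8.0, 0.30), "database"),
    ((150.0, 10.0, 0.28), "database"),
    ((140.0, 9.0, 0.35), "database"),
    ((600.0, 40.0, 0.10), "graphics"),
    ((550.0, 35.0, 0.12), "graphics"),
    ((620.0, 45.0, 0.08), "graphics"),
]


def scale_ranges(samples):
    """Return the (min, max) of each metric across the training samples."""
    columns = list(zip(*(features for features, _ in samples)))
    return [(min(col), max(col)) for col in columns]


def normalize(features, ranges):
    """Min-max scale each metric so no single metric dominates the distance."""
    return tuple(
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(features, ranges)
    )


def knn_classify(query, samples, k=3):
    """Label a metric vector by majority vote among its k nearest neighbors."""
    ranges = scale_ranges(samples)
    q = normalize(query, ranges)
    # Euclidean distance from the query to every training sample.
    distances = sorted(
        (math.dist(q, normalize(features, ranges)), label)
        for features, label in samples
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify((130.0, 9.0, 0.31), TRAINING))
    print(knn_classify((580.0, 38.0, 0.11), TRAINING))
```

Scaling is included here because raw metrics such as lines of code can be orders of magnitude larger than ratios, which would otherwise dominate a Euclidean distance; whether the study applied any such scaling is not stated in the abstract.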




© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yusof, Y., Rana, O.F. (2010). Classification of Software Artifacts Based on Structural Information. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_58


  • DOI: https://doi.org/10.1007/978-3-642-15384-6_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15383-9

  • Online ISBN: 978-3-642-15384-6

  • eBook Packages: Computer Science, Computer Science (R0)
