Abstract
Many data mining techniques are now used for ontology learning: text mining, Web mining, graph mining, link analysis, relational data mining, and so on. The current state-of-the-art toolkit, however, lacks “software mining” techniques, a term that denotes the process of extracting knowledge from source code. In this paper we approach the software mining task with a combination of text mining and link analysis techniques. We discuss how each instance (i.e., a programming construct such as a class or a method) can be converted into a feature vector that combines information about how the instance is interlinked with other instances with information about its (textual) content. The resulting feature vectors serve as the basis for constructing the domain ontology with OntoGen, an existing system for semi-automatic, data-driven ontology construction.
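To make the idea of a combined feature vector concrete, here is a minimal Python sketch. The abstract does not specify how the textual and link information are weighted or merged, so everything below is an assumption for illustration: TF-IDF weights for the text part, binary neighbor indicators for the link part, and a hypothetical mixing parameter `alpha`. The class names and texts are made-up toy data, and none of this uses OntoGen's actual interface.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split text into lowercase words; camelCase identifiers are split too.
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def l2_normalize(vec):
    # Scale a sparse vector (dict) to unit length; zero vectors are returned as-is.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def text_vectors(docs):
    # Assumed text representation: TF-IDF over the instances' textual content.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: l2_normalize(
            {f"text:{t}": tf * math.log(n / df[t]) for t, tf in c.items()}
        )
        for name, c in counts.items()
    }


def link_vectors(links, names):
    # Assumed link representation: one binary feature per linked neighbor,
    # ignoring link direction and link type.
    vecs = {name: {} for name in names}
    for src, dst in links:
        vecs[src][f"link:{dst}"] = 1.0
        vecs[dst][f"link:{src}"] = 1.0
    return {name: l2_normalize(v) for name, v in vecs.items()}


def build_feature_vectors(docs, links, alpha=0.5):
    # Weighted concatenation: alpha scales the text part, (1 - alpha) the link part.
    text = text_vectors(docs)
    link = link_vectors(links, docs.keys())
    combined = {}
    for name in docs:
        vec = {k: alpha * v for k, v in text[name].items()}
        vec.update({k: (1 - alpha) * v for k, v in link[name].items()})
        combined[name] = vec
    return combined


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy instances (hypothetical classes with their comments as text).
docs = {
    "Parser": "class Parser parses source tokens into syntax tree",
    "Lexer": "class Lexer splits source text into tokens",
    "Logger": "class Logger writes messages to log file",
}
# Toy links, e.g. "Parser uses Lexer" and "Parser uses Logger".
links = {("Parser", "Lexer"), ("Parser", "Logger")}

text_only = build_feature_vectors(docs, links, alpha=1.0)
mixed = build_feature_vectors(docs, links, alpha=0.5)
print(cosine(text_only["Lexer"], text_only["Logger"]))
print(cosine(mixed["Lexer"], mixed["Logger"]))
```

In this toy data, `Lexer` and `Logger` share no distinguishing words, so a text-only representation treats them as unrelated; once link features are mixed in, the fact that both are linked to `Parser` makes them similar. This is the kind of effect a combined representation is meant to capture, though the weighting scheme used in the paper may differ.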
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grcar, M., Grobelnik, M., Mladenic, D. (2008). Using Text Mining and Link Analysis for Software Mining. In: Raś, Z.W., Tsumoto, S., Zighed, D. (eds) Mining Complex Data. MCD 2007. Lecture Notes in Computer Science, vol 4944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68416-9_1
DOI: https://doi.org/10.1007/978-3-540-68416-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68415-2
Online ISBN: 978-3-540-68416-9
eBook Packages: Computer Science (R0)