Discovering the Structures of Open Source Programs from Their Developer Mailing Lists

Nguyen, Dinh Anh; Doi, Koichiro; Yamamoto, Akihiro

doi:10.1007/978-3-642-04747-3_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5808)

Included in the following conference series:

International Conference on Discovery Science

Abstract

This paper presents a method that discovers the structure of open source programs from their developer mailing lists. Our goal is to help subsequent developers understand the structures and components of open source programs even when sufficient documentation is not provided. Our method consists of two phases: (1) producing a mapping between the source files and the emails, and (2) constructing a lattice from the produced mapping and then reducing it with a novel algorithm, called PRUNIA (PRUNing Algorithm Based on Introduced Attributes), to obtain a more compact structure. We performed experiments with several open source projects that originate from or are popular in Japan, such as Namazu and Ruby. The experimental results show that the extracted structures closely reflect important parts of the hidden structures of the programs.
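
To make the lattice-construction step of phase (2) concrete, the following is a minimal sketch of how formal concepts can be enumerated from an email-to-file mapping. It is not the authors' implementation and does not attempt PRUNIA, whose details are not given in the abstract. The toy data, the file and email names, the function names, and the choice of emails as objects and source files as attributes are all assumptions made for illustration. The brute-force enumeration is exponential in the number of emails and is only suitable for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy mapping (assumption): email id -> source files it refers to.
MAPPING = {
    "mail1": {"search.c", "index.c"},
    "mail2": {"search.c"},
    "mail3": {"index.c", "util.c"},
}


def common_files(emails, mapping, all_files):
    """Return the files shared by every email in `emails`.

    For an empty collection of emails this is the full file set.
    """
    shared = set(all_files)
    for email in emails:
        shared &= mapping[email]
    return shared


def emails_covering(files, mapping):
    """Return the emails whose file set contains every file in `files`."""
    return {email for email, fs in mapping.items() if files <= fs}


def formal_concepts(mapping):
    """Enumerate all formal concepts (extent, intent) by brute force.

    A pair (emails, files) is a concept when `files` is exactly the set
    shared by `emails`, and `emails` is exactly the set covering `files`.
    """
    all_files = set().union(*mapping.values())
    emails = sorted(mapping)
    concepts = set()
    for size in range(len(emails) + 1):
        for subset in combinations(emails, size):
            intent = common_files(subset, mapping, all_files)
            extent = emails_covering(intent, mapping)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest extent (most general) downwards.
    for extent, intent in sorted(
        formal_concepts(MAPPING), key=lambda c: (-len(c[0]), sorted(c[1]))
    ):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the lattice. Any reduction step would then operate on that ordered set.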

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Dinh Anh Nguyen, Koichiro Doi & Akihiro Yamamoto

Editor information

Editors and Affiliations

Faculty of Economics; Rua Dr. Roberto Frias, University of Porto, 4200-465, Porto, Portugal
João Gama
DCC-FC, Universidade do Porto, Portugal
Vítor Santos Costa
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil

About this paper

Cite this paper

Nguyen, D.A., Doi, K., Yamamoto, A. (2009). Discovering the Structures of Open Source Programs from Their Developer Mailing Lists. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_19

DOI: https://doi.org/10.1007/978-3-642-04747-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science (R0)
