short-paper

On the similarity of software development documentation

Published: 21 August 2017

Abstract

Software developers spend 20% of their time seeking information on Stack Overflow, on YouTube, or in API reference documentation. They can search Stack Overflow for duplicate or similar posts. They can also look at other software development documentation that contains information similar or additional to that of a Stack Overflow post or a development screencast, in order to find new inspiration for solving their current development problem. Linking software development documentation of the same and of different types might save time when developing new software solutions and might increase developers' productivity during the work day. In this paper we discuss our approach to gaining a broader understanding of the different similarity types (exact, similar, and maybe) within and between software documentation, as well as an understanding of how different kinds of software documentation can be extended.

Cited By

  • (2023) What kinds of contracts do ML APIs need? Empirical Software Engineering, 10.1007/s10664-023-10320-z, 28:6. Online publication date: 17-Oct-2023
  • (2018) Natural language processing (NLP) applied on issue trackers. Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, 10.1145/3283812.3283825, (38-41). Online publication date: 4-Nov-2018
  • (2018) API Documentation. Trends and Advances in Information Systems and Technologies, 10.1007/978-3-319-77712-2_22, (229-239). Online publication date: 17-May-2018

Published In

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN:9781450351058
DOI:10.1145/3106237
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Similarity Types
  2. Software Analytics
  3. Software Development Documentation

Qualifiers

  • Short-paper

Conference

ESEC/FSE'17

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 24 Jan 2025
