ABSTRACT
While performing a task, software developers interact with a myriad of natural language documents. Not all information in these documents is relevant to a developer's task, forcing developers to filter relevant information from large amounts of irrelevant information. If a developer misses some of the information necessary for her task, she will have an incomplete or incorrect basis from which to complete it. Many approaches mine relevant text fragments from natural language artifacts. However, existing approaches mine information for pre-defined tasks and from a restricted set of artifacts. I hypothesize that it is possible to design a more generalizable approach that can identify, for a particular task, relevant text across different artifact types, establishing relationships between them and making it easier for developers to search for and locate task-relevant information. To investigate this hypothesis, I propose to match a developer's task to text fragments in natural language artifacts according to their semantics. By semantically matching textual pieces to a developer's task, I aim to identify fragments relevant to the task more precisely. To help developers navigate the identified fragments thoroughly, I also propose to synthesize and group them. Ultimately, this research aims to help developers make more informed decisions regarding their software development tasks. Dr. Gail C. Murphy supervises this work.
Index Terms
- Helping developers search and locate task-relevant information in natural language documents