short-paper

Helping developers search and locate task-relevant information in natural language documents

Author:
Arthur Marques

University of British Columbia, Canada


ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, August 2019, Pages 1168–1171, https://doi.org/10.1145/3338906.3341459

Published: 12 August 2019


ABSTRACT

While performing a task, software developers interact with a myriad of natural language documents. Not all information in these documents is relevant to a developer's task, which forces developers to filter relevant information from large amounts of irrelevant information. If a developer misses some of the information necessary for her task, she will have an incomplete or incorrect basis from which to complete it. Many approaches mine relevant text fragments from natural language artifacts. However, existing approaches mine information for pre-defined tasks and from a restricted set of artifacts. I hypothesize that it is possible to design a more generalizable approach that can identify, for a particular task, relevant text across different artifact types, establishing relationships between them and facilitating how developers search for and locate task-relevant information. To investigate this hypothesis, I propose to match a developer's task to text fragments in natural language artifacts according to their semantics. By semantically matching textual pieces to a developer's task, I aim to identify task-relevant fragments more precisely. To help developers thoroughly navigate the identified fragments, I also propose to synthesize and group them. Ultimately, this research aims to help developers make more informed decisions about their software development tasks. Dr. Gail C. Murphy supervises this work.
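The matching step described above can be illustrated with a minimal sketch. The abstract proposes matching by semantics; as a simpler, purely lexical stand-in, the sketch below ranks text fragments by TF-IDF cosine similarity to a task description. The functions `tokenize`, `tfidf_vectors`, `cosine`, and `rank_fragments`, and the sample task and fragments, are hypothetical illustrations and not the author's method; a semantic approach would replace the bag-of-words vectors with a representation that captures meaning rather than shared words.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors with a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # empty for an empty document, so no division by zero
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_fragments(task: str, fragments: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k fragments most similar to the task, highest score first.

    Lexical stand-in only: fragments score higher when they share (rare) words
    with the task, not when they share its meaning.
    """
    docs = [tokenize(task)] + [tokenize(f) for f in fragments]
    vectors = tfidf_vectors(docs)
    task_vec, fragment_vecs = vectors[0], vectors[1:]
    scored = [(cosine(task_vec, vec), frag) for vec, frag in zip(fragment_vecs, fragments)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:top_k]


# Hypothetical sample task and fragments for illustration.
task = "Configure the HTTP client timeout for requests"
fragments = [
    "To set a timeout, pass the timeout argument when creating the HTTP client.",
    "This library is released under the MIT license.",
    "Requests that exceed the configured timeout raise an exception.",
]

for score, fragment in rank_fragments(task, fragments):
    print(f"{score:.3f}  {fragment}")
```

With this sample input, the license sentence shares only the word "the" with the task and ranks last. The same example also shows the limitation of a lexical model: "configured" does not match "configure", and a fragment that paraphrases the task without reusing its words would score zero, which is the kind of gap the proposed semantic matching targets.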


Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation

Published in
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN:9781450355728
DOI:10.1145/3338906
General Chairs:
Marlon Dumas (University of Tartu, Estonia)
Dietmar Pfahl (University of Tartu, Estonia)
Program Chairs:
Sven Apel (Saarland University, Germany)
Alessandra Russo (Imperial College, UK)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Information Overload
Natural Language Artifacts
Relevance
Semantics

Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics
- Total Citations: 0
- Total Downloads: 155
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 0
Cited By
This publication has not been cited yet
