An approach to generate the bug report summaries using two-level feature extraction
Introduction
Changing requirements in this agile world demand continuous maintenance and evolution of software. During software development, coding accounts for only about 20 percent of the effort, while other activities account for about 80 percent. Testing, maintenance and evolution are the most important of these activities and require a lot of effort. During every phase of software development, artifacts are created that serve as the information management components of the software, such as the requirement document during the requirement analysis phase, design documents during the design phase, test cases, bug reports, etc. Online software repositories are used to manage and maintain this information.
A bug report is one of the artifacts created during the testing and maintenance phases. Bug reports are usually stored in software bug repositories. A bug report is a very versatile artifact: it may contain highly structured data or many technical dumps, stack traces, code, etc. It also contains opinions and ideas. Bug reports help developers understand how a similar bug was fixed and how changes were made to the system in the past. Open software bug repositories include Bugzilla. A bug report consists of information such as the title, description, comments, authorship information and timestamp-related information. It is a conversational artifact and resembles emails and discussion threads. Comments are usually written in the context of previous ones. Rastkar, Murphy, & Murray (2014) considered bug reports similar to email conversations, applied the classifiers used for email conversations, and noticed that the classifiers work equally well on both artifacts. Bug reports contain not only bug-resolving information but sometimes also insights about enhancements and improvements for the system. In the informal conversation section of the bug report, developers also give suggestions or alternatives to existing approaches. A comment often agrees with the previous comment, sometimes disagrees with it, and sometimes proposes an alternative to the existing approach. Thus, bug reports are among the most valuable software artifacts and among the most frequently mined for interesting insights. To resolve a bug and understand the bug report, a developer has to go through all the comments, which is a very time-consuming and tedious task (Kukkar & Mohana, 2019). Lotufo, Malik, & Czarnecki (2015) mentioned how bug reports also refer to other bug reports. Bug reports need to be consulted especially often during regression.
Comments help improve the knowledge about the bug report. But at the same time, comments increase the difficulty in understanding the bug report as different contributors may discuss the bug in different contexts which makes the discussion multi-threaded.
One solution to reduce the effort and time needed for this task is automatic bug report summarization: producing the summary of a bug report automatically to help developers understand it quickly. Summarizing a bug report is not an easy task. Yang, Cheng-Min, & Chung (2018) have also mentioned that the presence of technical terms makes it hard for a developer to understand the bug report, which makes the process tedious and time-consuming. Even searching for the right bug report takes a lot of time, as bug repositories usually contain a number of duplicate bug reports. Although bug reports can be very valuable for developers and the maintenance team, they are usually written without the intention of easy follow-up (Lotufo et al., 2015). Automatic bug report summarization therefore faces the challenge of a huge data space: comments contain many words and sentences, and selecting the appropriate sentences is not easy. Mani, Sankaran, & Aralikatte (2019) mentioned that a bug report contains on average more than 60 and sometimes more than 300 sentences. Sentence selection and scoring pose the further challenge of balancing the accuracy and speed of the summarization process. Including the semantic context in sentence selection is another challenge. Data sparsity and data reduction are also challenges for summarization, and resolving the ambiguity and noisiness of the data in bug reports is a major one. Various studies have found that one out of five bug reports is a duplicate. Automatic summarization of bug reports not only increases the comprehensibility of the document but also helps in many related activities, such as identifying duplicate bug reports, classifying bug reports, etc.
Analyzing bug reports also helps improve the quality of software.
The research community has made the following main observations about bug reports:
- Lotufo et al. (2015) observed that most bug reports refer to previous bug reports, or are referred to in other bug reports that are generated because of the existing bug report.
- In open source projects, the collaborators are random, which leads to quality issues in bug reports, and usually no proper structure is followed in the bug report.
- Anybody can comment on a bug report, so the bug report becomes more like an informal chat and resembles a meeting conversation, an email thread or a chat.
Our approach is based on the following hypotheses:
- The timestamp is a key factor in deciding which comments of the bug report are important.
- Sentences that are more similar to the title and the description of the bug report are important.
- Noise filtering is very important for removing irrelevant comments from bug reports.
- Authorship plays a significant role in determining the relevance of a comment.
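The second hypothesis, that sentences similar to the title and description matter more, can be illustrated with a small sketch. This is not the scoring function of the paper; it assumes a plain bag-of-words cosine similarity, and the helper names `tokenize`, `cosine_similarity` and `rank_by_title_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two texts, using raw term counts as vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty text has no direction, so its similarity is defined as 0 here.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_title_similarity(title, description, sentences):
    """Return (score, sentence) pairs, most similar to title + description first."""
    reference = title + " " + description  # assumption: both are weighted equally
    scored = [(cosine_similarity(s, reference), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_by_title_similarity` with a bug title, its description and the comment sentences returns the sentences ordered by how much vocabulary they share with the report header. A real system would likely add stop-word removal and term weighting.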
To create a summary that includes information from the title, description and comments together, we propose a solution that uses features and the unsupervised PageRank approach, along with natural language generation, to create a flexible and informative summary. We have not used machine learning techniques in our approach, because Mani et al. (2019) showed that unsupervised approaches outperformed the BRC classifier. Moreover, the availability of a large dataset for training is still an open issue in the field of summarization.
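As a rough illustration of how PageRank can rank sentences without training data, the sketch below builds a sentence graph and runs power iteration over it. It is a generic example, not the configuration used in the paper: the Jaccard word-overlap edge weight, the damping factor of 0.85, the fixed iteration count and the function names are all assumptions.

```python
import re


def tokens(text):
    """Set of lowercase alphanumeric word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity, used here as the (assumed) edge weight."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pagerank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over their similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # weights[i][j] is the edge weight from sentence i to sentence j (no self-loops).
    weights = [
        [jaccard(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no outgoing weight spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0.0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_scores.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        scores = new_scores
    return scores
```

Sentences that share vocabulary with many other sentences accumulate score, while an isolated remark such as a bare acknowledgement stays near the baseline, which is why this kind of ranking needs no labelled summaries.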
The main contributions of the paper are:
- An understanding of the bug report and its structure.
- A discussion and classification of the features that describe the bug report. We have classified the features into comment-specific and sentence-specific features.
- Rather than applying all the features together to all the sentences of the bug report, finding the important comments first and then applying a few selected sentence-specific features to find the important sentences in those comments.
- Incorporating domain-specific and semantic information into bug report summarization.
- Combining the description summary and the comments summary to enrich the summaries.
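The two-level idea in the contributions above, selecting important comments first and then important sentences within them, might be sketched as follows. The features here are placeholders chosen for brevity (vocabulary overlap with the title and description, plus a small bonus for comments by the reporter); the `Comment` class, the function names and the bonus value of 0.1 are hypothetical and do not reproduce the feature set of the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


def words(text):
    """Set of lowercase alphanumeric word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(text, reference):
    """Fraction of the reference vocabulary that also appears in the text."""
    ref = words(reference)
    return len(words(text) & ref) / len(ref) if ref else 0.0


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def two_level_summary(title, description, comments, reporter,
                      top_comments=2, top_sentences=2):
    reference = title + " " + description

    # Level 1: score whole comments with comment-specific features
    # (assumed here: overlap with the report header plus an authorship bonus).
    def comment_score(c):
        return overlap(c.text, reference) + (0.1 if c.author == reporter else 0.0)

    order = sorted(range(len(comments)),
                   key=lambda i: comment_score(comments[i]), reverse=True)
    kept = [comments[i] for i in sorted(order[:top_comments])]  # keep thread order

    # Level 2: score only the sentences of the kept comments.
    candidates = [s for c in kept for s in split_sentences(c.text)]
    best = sorted(range(len(candidates)),
                  key=lambda i: overlap(candidates[i], reference), reverse=True)
    return [candidates[i] for i in sorted(best[:top_sentences])]
```

Filtering at the comment level first means the sentence-level features are computed on far fewer sentences, which is the practical motivation for splitting the features into two groups.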
The paper is organised as follows: Section 2 reviews related work in this field. Section 3 describes our approach. Section 4 presents and discusses the results. Finally, we give the conclusion and the future work that needs to be done.
Section snippets
Related work
Summarization techniques are divided into single-document and multi-document summarization according to the number of documents considered. They are divided into extractive and abstractive summarization according to how the important sentences are selected and represented. In extractive summarization, the important sentences are extracted from the text and arranged to form the summary. Abstractive summarization is when the sentences
Proposed approach
Extractive summarization approaches are based on extracting the important sentences. We have used the extractive approach for our work because abstractive summarization is a costly and complex process.
The proposed approach extends the concept of Murray & Carenini (2008), which Rastkar et al. (2010), He et al. (2017) and Yang et al. (2018) have used in their papers to implement bug report summarization. (Rastkar et al., 2010) used the machine
Dataset
For the experiments, we have used the modified BRC dataset, as used in (He et al., 2017; Lotufo et al., 2015; Rastkar et al., 2014). The dataset consists of reports from four open source projects: Eclipse, Mozilla, Gnome and KDE. The corpus consists of 28 bug reports: nine master bug reports and 19 duplicate bug reports. Along
Conclusion and future work
Bug reports are valuable resources for improving the quality of software and for resolving bugs during software evolution and maintenance. Reading a complete bug report and locating the right bug report are very time-consuming and tedious tasks. Automatic summarization helps developers understand a bug report quickly, locate the bug and find duplicate bugs. We have created a two-level feature extraction approach where we have used selective features for comment
CRediT authorship contribution statement
Som Gupta: Conceptualization, Methodology, Writing - original draft, Visualization, Investigation, Writing - review & editing. Sanjai Kumar Gupta: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (26)
- et al. Assessing sentence scoring techniques for extractive text summarization. Expert Systems with Applications (2013).
- Carenini, G., Ng, R. T., & Zhou, X. (2008). Summarizing emails with conversational cohesion and subjectivity. In...
- New methods in automatic extracting. Journal of the ACM (JACM) (1969).
- et al. Recent automatic text summarization techniques: A survey (2017).
- Gupta, S., & Gupta, S. (2018). Abstractive summarization: An overview of the state of the art. Expert Systems with...
- et al. Prst: A pagerank-based summarization technique for summarizing bug reports with duplicates. International Journal of Software Engineering and Knowledge Engineering (2017).
- Huai, B., Li, W., Wu, Q., & Wang, M. (2018). Mining intentions to improve bug report summarization (pp....
- Jafari, M., Wang, J., Qin, Y., Gheisari, M., Shahabi, A., & Tao, X. (2016). Automatic text summarization using fuzzy...
- Jagadeesh, J., Varma, V., & Pingali, P. (2005). Sentence extraction based single document summarization. Workshop on...
- Kukkar, A., & Mohana, R. (2019). Bug report summarization by using swarm intelligence approaches. Recent Patents on...
- An optimization technique for unsupervised automatic extractive bug report summarization
- Text summarization using enhanced mmr technique
- 1
ORCID ID: https://orcid.org/0000-0002-1476-084X