Abstract
Bug triaging, the task of deciding which developer should fix a bug, has been studied actively. However, existing work does not consider that the cost of the same bug varies across developers with diverse backgrounds and experience. In clear contrast, we argue that the “cost” of one bug can be low for one developer but high for another. Based on this view, we study an automatic triaging system that considers both accuracy and cost. Our preliminary solution, CosTriage, models user-specific experience and estimated cost for each bug category, obtained from topic modeling, and assigns the bug to a developer who not only can fix it but is also expected to fix it fast. For user-specific cost modeling, we are inspired by work on recommender systems, which estimates user-specific ratings of items, e.g., movies. From this viewpoint, existing triaging work, which categorizes bugs and assigns them to developers with experience in the category, amounts to content-based recommendation (CBR). However, CBR is well known to cause overspecialization because it recommends only the types of bugs that each developer has solved before. This problem is critical because experienced developers can become overloaded with bugs they dislike fixing, even though there are other categories they could fix faster. CosTriage adopts content-boosted collaborative filtering (CBCF), which considers not only similar bugs (content-based) but also similar developers (collaborative) when estimating user-specific cost. In this paper, we extend CosTriage to cover special scenarios. First, a bug may have no textual report (e.g., a crash report), or its textual report may lack any topic word (e.g., 1957 of 48,424 Mozilla reports). Second, in some scenarios, developer profiles may change over time. For these scenarios, we extend CosTriage to support non-textual descriptions and dynamic profiles; we denote the extension as CosTriage+.
Our experimental evaluation shows that our solution reduces cost by 30 % without seriously compromising accuracy, compared with a baseline that considers accuracy only.
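To make the idea of cost-aware ranking concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes each developer has a profile of average fix times per bug type, with `None` for types never fixed. Missing entries are estimated from similar developers (the collaborative step), and developers are then ranked by a blend of an accuracy score and inverse cost. The function names, the cosine similarity, the weight `alpha`, and the blending formula are all hypothetical choices made for illustration; the accuracy scores are assumed to come from some external classifier.

```python
from math import sqrt

def similarity(p, q):
    """Cosine similarity over the bug types both developers have fixed."""
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not shared:
        return 0.0
    dot = sum(a * b for a, b in shared)
    norm = sqrt(sum(a * a for a, _ in shared)) * sqrt(sum(b * b for _, b in shared))
    return dot / norm if norm else 0.0

def fill_missing_costs(profiles):
    """Estimate unobserved costs from similar developers (collaborative step)."""
    filled = {}
    for dev, costs in profiles.items():
        row = list(costs)
        for t, cost in enumerate(costs):
            if cost is not None:
                continue
            num = den = 0.0
            for other, other_costs in profiles.items():
                if other == dev or other_costs[t] is None:
                    continue
                w = similarity(costs, other_costs)
                num += w * other_costs[t]
                den += w
            # The entry stays unknown when no similar developer has fixed this type.
            row[t] = num / den if den else None
        filled[dev] = row
    return filled

def rank_developers(accuracy, costs, bug_type, alpha=0.5):
    """Blend accuracy with inverse cost (assumed positive); best candidate first."""
    known = {d: c[bug_type] for d, c in costs.items() if c[bug_type] is not None}
    if not known:
        return []
    fastest = min(known.values())
    scores = {d: alpha * accuracy[d] + (1 - alpha) * fastest / known[d] for d in known}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): average fix time in days for three bug types.
profiles = {
    "alice": [2.0, 10.0, None],
    "bob": [2.0, 10.0, 4.0],
    "carol": [9.0, 1.0, 20.0],
}
accuracy = {"alice": 0.9, "bob": 0.8, "carol": 0.3}
costs = fill_missing_costs(profiles)
ranking = rank_developers(accuracy, costs, bug_type=2, alpha=0.5)
```

With this toy data, alice's missing cost is pulled toward bob's value because their observed profiles are nearly identical. Setting `alpha=1.0` reduces the ranking to accuracy only, which puts alice first, whereas `alpha=0.5` moves bob ahead because he is expected to fix this bug type faster.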






Notes
Tai’s algorithm has \(O(V_1\times V_2\times D_1^2\times D_2^2)\) time complexity, where \(V_i\) is the number of nodes in \(T(\mathcal {I}_{\mathcal {B}_i})\), and \(D_i\) is the depth of \(T(\mathcal {I}_{\mathcal {B}_i})\).
Zhang’s algorithm has \(O(V_1\times V_2\times \min (L_1,D_1)\times \min (L_2,D_2))\) time complexity, where \(L_i\) denotes the number of leaves in \(T_i\). An implementation is available at http://web.science.mq.edu.au/~swan/howtos/treedistance/.
Apache, https://issues.apache.org/bugzilla/.
Eclipse, https://bugs.eclipse.org/bugs/.
Linux kernel, https://bugzilla.kernel.org/.
Mozilla, https://bugzilla.mozilla.org/.
Because the Apache test set has only 131 bugs, several bug types have few or no bugs. In Fig. 6, we show only the bug types whose share exceeds 2.5 %.
References
Park J, Lee M-W, Kim J, Hwang S, Kim S (2011) CosTriage: a cost-aware triage algorithm for bug reporting systems. In: AAAI
Anvik J (2007) Assisting bug report triage through recommendation. PhD thesis, University of British Columbia
Jeong G, Kim S, Zimmermann T (2009) Improving bug triage with bug tossing graphs. In: ESEC/FSE
Guo PJ, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows. In: ICSE
Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: ICSE
Anvik J, Murphy GC (2011) Reducing the effort of bug report triage: recommenders for development-oriented decisions. ACM Trans Softw Eng Methodol 20(3):10
Čubranić D (2004) Automatic bug triage using text categorization. In: SEKE
Canfora G, Cerulo L (2006) Supporting change request assignment in open source development. In: Proceedings of the 2006 ACM symposium on applied computing
Canfora G, Cerulo L (2005) How software repositories can help in resolving a new change request. In: Workshop on empirical studies in reverse engineering
di Lucca G (2002) An approach to classify software maintenance requests. In: ICSM
Matter D, Kuhn A, Nierstrasz O (2009) Assigning bug reports using a vocabulary-based expertise model of developers. In: MSR
Tamrawi A, Nguyen TT, Al-Kofahi JM, Nguyen TN (2011) Fuzzy set and cache-based approach for bug triaging. In: ESEC/FSE
Kim S, Whitehead EJ Jr (2006) How long did it take to fix bugs? In: MSR
Weiss C, Premraj R, Zimmermann T, Zeller A (2007) How long will it take to fix this bug? In: MSR
Rahman MM, Ruhe G, Zimmermann T (2009) Optimized assignment of developers for fixing bugs: an initial evaluation for Eclipse projects. In: ESEM
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Duplicate bug reports considered harmful... really? In: ICSM
Chen L, Wang X, Liu C (2011) An approach to improving bug assignment with bug tossing graphs and bug similarities. J Softw 6(3):421–427
Xuan J, Jiang H, Ren Z, Yan J, Luo Z (2010) Automatic bug triage using semi-supervised text classification. In: SEKE
Bhattacharya P, Neamtiu I (2010) Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: ICSM
Lin Z, Shu F, Yang Y, Hu C, Wang Q (2009) An empirical study on bug assignment automation using Chinese bug data. In: ESEM
Kim J, Lee S, Hwang S, Kim S (2009) Adding examples into Java documents. In: ASE
Kim J, Lee S, Hwang S, Kim S (2010) Towards an intelligent code search engine. In: AAAI
Kim J, Lee S, Hwang S, Kim S (2013) Enriching documents with examples: a corpus mining approach. ACM Trans Inf Syst 31(1):1
Lee M-W, Roh J-W, Hwang S, Kim S (2010) Instant code clone search. In: FSE
Park J, Lee M-W, Roh J-W, Hwang S, Kim S (2014) Surfacing code in the dark: an instant clone search approach. Knowl Inf Syst 41(3):727–759
Melville P, Mooney RJ, Nagarajan R (2002) Content-boosted collaborative filtering for improved recommendations. In: AAAI
Arun R, Suresh V, Veni Madhavan CE, Narasimha Murthy MN (2010) On finding the natural number of topics with latent Dirichlet allocation: some observations. In: PAKDD
Cao J, Xia T, Li J, Zhang Y, Tang S (2009) A density-based method for adaptive LDA model selection. Neurocomputing 72(7–9):1775–1781
Zavitsanos E, Petridis S, Paliouras G, Vouros GA (2008) Determining automatically the size of learned ontologies. In: ECAI
Herlocker J, Konstan JA, Riedl J (2002) An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf Retr 5(4):287–310
Ma H, King I, Lyu MR (2007) Effective missing data prediction for collaborative filtering. In: SIGIR
Allan J (1996) Incremental relevance feedback for information filtering. In: SIGIR
Chen Z, Jiang Y, Zhao Y (2010) A collaborative filtering recommendation algorithm based on user interest change and trust evaluation. In: JDCTA
Lathia N, Hailes S, Capra L, Amatriain X (2010) Temporal diversity in recommender systems. In: SIGIR
Cavnar WB, Trenkle JM (1994) N-gram-based text categorization. In: SDAIR
Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Extracting structural information from bug reports. In: MSR
Microsoft (2010) Windows error reporting: getting started. http://www.microsoft.com/whdc/winlogo/maintain/StartWER.mspx
Mozilla (2010) Crash stats. http://crash-stats.mozilla.com
Apple (2010) Technical note TN2123: CrashReporter
Tai K-C (1979) The tree-to-tree correction problem. J Assoc Comput Mach
Chen W (2001) New algorithm for ordered tree-to-tree correction problem. J Algorithms 40(2):135–158
Demaine ED, Mozes S, Rossman B, Weimann O (2009) An optimal decomposition algorithm for tree edit distance. ACM Trans Algorithms 6(1):2
Dulucq S, Touzet H (2003) Analysis of tree edit distance algorithms. In: CPM
Klein PN (1998) Computing the edit-distance between unrooted ordered trees. In: Proceedings of the 6th annual European symposium on algorithms
Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262
Bremner D, Demaine E, Erickson J, Iacono J, Langerman S, Morin P, Toussaint G (2005) Output-sensitive algorithms for computing nearest-neighbour decision boundaries. In: Algorithms and data structures. Proceedings of 8th international workshop, WADS 2003, Ottawa, Ontario, Canada, July 30-August 1, 2003. Springer, Heidelberg, pp 451–461
Coomans D, Massart DL (1982) Alternative k-nearest neighbour rules in supervised pattern recognition: Part 1. k-Nearest neighbour classification by using alternative voting rules. Anal Chim Acta
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Brown RG (1963) Smoothing, forecasting and prediction of discrete time series. Prentice-Hall, Englewood Cliffs
Weron R, Weron K, Weron A (1999) A conditionally exponential decay approach to scaling in finance. Phys A Stat Theor Phys 264(3–4):551–561
Han J, Kamber M (2006) Data mining: concepts and techniques
Jiang L, Misherghi G, Su Z, Glondu S (2007) Deckard: scalable and accurate tree-based detection of code clones. In: ICSE
Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008) What makes a good bug report? In: SIGSOFT FSE
Hooimeijer P, Weimer W (2007) Modeling bug report quality. In: ASE
Aranda J, Venolia G (2009) The secret life of bugs: going past the errors and omissions in software repositories. In: ICSE
Giger E, Pinzger M, Gall H (2010) Predicting the fix time of bugs. In: RSSE
Acknowledgments
This work was supported by Institute for Information & communications Technology Promotion (IITP) Grant funded by the Korea government (MSIP) (No. 10041244, SmartTV 2.0 Software Platform).
Additional information
This work builds on and extends our preliminary work [1].
Cite this article
Park, Jw., Lee, MW., Kim, J. et al. Cost-aware triage ranking algorithms for bug reporting systems. Knowl Inf Syst 48, 679–705 (2016). https://doi.org/10.1007/s10115-015-0893-9
DOI: https://doi.org/10.1007/s10115-015-0893-9