research-article

Tell them apart: distilling technology differences from crowd-scale comparison discussions

Authors:

Zhenchang Xing,

Yang Liu

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

Pages 214 - 224

https://doi.org/10.1145/3238147.3238208

Published: 03 September 2018

Abstract

Developers can choose among different technologies for many software development tasks. However, when several technologies offer comparable functionality, it is not easy to select the most appropriate one, because comparing technologies by trial and error is time-consuming. Developers can instead consult expert articles, read official documentation, or ask questions on Q&A sites, but obtaining a comprehensive comparison this way is hit-or-miss, as online information is often fragmented or contradictory. To overcome these limitations, we propose the diffTech system, which exploits crowdsourced discussions from Stack Overflow and assists technology comparison with an informative summary of the different comparison aspects. We first build a large database of comparable technologies in software engineering by mining tags in Stack Overflow, and then locate comparative sentences about comparable technologies using natural language processing methods. We further mine prominent comparison aspects by clustering similar comparative sentences and representing each cluster with its keywords. The evaluation demonstrates both the accuracy and the usefulness of our model, and we have implemented our approach as a practical website for public use.
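As a rough illustration of the last two steps described in the abstract (locating comparative sentences about a pair of technologies, then grouping them by comparison aspect), the following is a minimal sketch and not the diffTech implementation. The regular expression, the function names `comparative_sentences` and `group_by_cue`, and the sample sentences are all assumptions made for illustration; grouping by the comparative cue word is a crude stand-in for the sentence clustering the abstract mentions.

```python
import re
from collections import defaultdict

# Hypothetical cue pattern (an assumption, not the paper's rules):
# matches "faster than", "more flexible than", "worse than", etc.
COMPARATIVE = re.compile(r"\b((?:more|less)\s+\w+|\w+er|worse)\s+than\b")


def comparative_sentences(sentences, tech_a, tech_b):
    """Keep sentences that mention both technologies and contain a comparative cue."""
    hits = []
    for sentence in sentences:
        low = sentence.lower()
        if tech_a in low and tech_b in low and COMPARATIVE.search(low):
            hits.append(sentence)
    return hits


def group_by_cue(hits):
    """Group comparative sentences by their cue word (e.g. 'faster')."""
    groups = defaultdict(list)
    for sentence in hits:
        cue = COMPARATIVE.search(sentence.lower()).group(1)
        groups[cue].append(sentence)
    return dict(groups)


# Made-up sample sentences, for illustration only.
sample = [
    "Postgres is faster than MySQL for complex queries.",
    "In my tests MySQL was faster than Postgres on simple reads.",
    "Postgres is more flexible than MySQL.",
    "I use MySQL and Postgres at work.",
    "Python is slower than C.",
]

hits = comparative_sentences(sample, "postgres", "mysql")
groups = group_by_cue(hits)
for cue, members in groups.items():
    print(cue, len(members))
```

The pattern only fires when the cue is immediately followed by "than", so a sentence such as "X is easier to set up than Y" is missed; a real system would need richer linguistic analysis than this sketch provides.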

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining
2. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories

Published In

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

September 2018

955 pages

ISBN:9781450359375

DOI:10.1145/3238147

General Chair: Marianne Huchard (University of Montpellier, France)

Program Chairs: Christian Kästner (Carnegie Mellon University, USA), Gordon Fraser (University of Passau, Germany)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence
CNRS: Centre National de la Recherche Scientifique
SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASE '18: 33rd ACM/IEEE International Conference on Automated Software Engineering

September 3 - 7, 2018

Montpellier, France

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Cited By

Huang Q, Sun Y, Xing Z, Cao Y, Chen J, Xu X, Jin H, Lu J (2024). Let’s Discover More API Relations: A Large Language Model-Based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology 33:8 (1-34). Online publication date: 23-Jul-2024.
https://dl.acm.org/doi/10.1145/3680469
Tanzil M, Uddin G, Barcomb A, Graziotin D, Nolte A (2024). "How do people decide?": A Model for Software Library Selection. Proceedings of the 2024 IEEE/ACM 17th International Conference on Cooperative and Human Aspects of Software Engineering (1-12). Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3641822.3641865
Tanzil M, Khan J, Uddin G, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). ChatGPT Incorrectness Detection in Software Reviews. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639194
Gu H, He H, Zhou M (2023). Self-Admitted Library Migrations in Java, JavaScript, and Python Packaging Ecosystems: A Comparative Study. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (627-638). Online publication date: Mar-2023.
https://doi.org/10.1109/SANER56733.2023.00064
Nam D, Myers B, Vasilescu B, Hellendoorn V (2023). Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (1890-1906). Online publication date: May-2023.
https://doi.org/10.1109/ICSE48619.2023.00161
Wang Y, Wang J, Zhang H, Wang K, Wang Q (2023). What are Pros and Cons? Stance Detection and Summarization on Feature Request. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-12). Online publication date: 26-Oct-2023.
https://doi.org/10.1109/ESEM56168.2023.10304865
Conceição L, Rodrigues V, Meira J, Marreiros G, Novais P (2022). Supporting Argumentation Dialogues in Group Decision Support Systems: An Approach Based on Dynamic Clustering. Applied Sciences 12:21 (10893). Online publication date: 27-Oct-2022.
https://doi.org/10.3390/app122110893
Yan L, Kim M, Hartmann B, Zhang T, Glassman E (2022). Concept-Annotated Examples for Library Comparison. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (1-16). Online publication date: 29-Oct-2022.
https://dl.acm.org/doi/10.1145/3526113.3545647
Wang Y, Wang J, Zhang H, Ming X, Shi L, Wang Q, Dwyer M, Damian D, Zeller A (2022). Where is your app frustrating users? Proceedings of the 44th International Conference on Software Engineering (2427-2439). Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510189
Zhu S, Zhang Z, Qin B, Xiong A, Song L, Dwyer M, Damian D, Zeller A (2022). Learning and programming challenges of rust. Proceedings of the 44th International Conference on Software Engineering (1269-1281). Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510164
