DOI: 10.1145/3238147.3238208
research-article

Tell them apart: distilling technology differences from crowd-scale comparison discussions

Published: 03 September 2018 Publication History

Abstract

Developers can choose among different technologies for many software development tasks. However, when several technologies offer comparable functionality, selecting the most appropriate one is not easy, because comparing them by trial and error is time-consuming. Developers can instead consult expert articles, read official documentation, or ask questions on Q&A sites, but obtaining a comprehensive comparison this way is a matter of chance, as online information is often fragmented or contradictory. To overcome these limitations, we propose the diffTech system, which exploits crowdsourced discussions from Stack Overflow and assists technology comparison with an informative summary of different comparison aspects. We first build a large database of comparable technologies in software engineering by mining tags in Stack Overflow, and then locate comparative sentences about comparable technologies using natural language processing methods. We further mine prominent comparison aspects by clustering similar comparative sentences and representing each cluster with its keywords. The evaluation demonstrates both the accuracy and usefulness of our model, and we implement our approach as a practical website for public use.
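
The last two steps of the pipeline (locating comparative sentences, then clustering them and labelling each cluster with keywords) can be illustrated with a minimal sketch. It is not the diffTech implementation: the abstract does not name the concrete methods, so this sketch substitutes simple stand-ins, namely a regular-expression cue for comparatives, Jaccard word overlap for sentence similarity, connected components for clustering, and raw word counts for keywords. The technology pair, the sentences, the stopword list, and the threshold are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical inputs: one pair of comparable technologies and a few sentences.
PAIR = ("mysql", "postgresql")
SENTENCES = [
    "postgresql is faster than mysql for complex queries",
    "mysql is slower than postgresql on complex joins",
    "mysql is easier to install than postgresql",
    "postgresql has a steeper install process than mysql",
    "I use mysql at work",
]

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"is", "than", "for", "on", "to", "a", "has", "i", "at", "the"}
# Crude comparative cue (a stand-in): "more", "less", or a word ending
# in "er", followed somewhere later by "than".
COMPARATIVE = re.compile(r"\b(more|less|\w+er)\b.*\bthan\b")


def is_comparative(sentence, pair):
    """True if the sentence mentions both technologies and has a comparative cue."""
    text = sentence.lower()
    tokens = set(text.split())
    return all(t in tokens for t in pair) and bool(COMPARATIVE.search(text))


def content_words(sentence, pair):
    """Words left after dropping stopwords and the technology names themselves."""
    return {w for w in sentence.lower().split() if w not in STOPWORDS and w not in pair}


def cluster(sentences, pair, threshold=0.15):
    """Group sentences linked by Jaccard word overlap >= threshold (connected components)."""
    words = [content_words(s, pair) for s in sentences]
    parent = list(range(len(sentences)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sentences)), 2):
        union = words[i] | words[j]
        if union and len(words[i] & words[j]) / len(union) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(sentences):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def keywords(group, pair, top=2):
    """Most frequent content words in a cluster; ties are broken alphabetically."""
    counts = Counter(w for s in group for w in content_words(s, pair))
    return sorted(counts, key=lambda w: (-counts[w], w))[:top]


if __name__ == "__main__":
    comparative = [s for s in SENTENCES if is_comparative(s, PAIR)]
    for group in cluster(comparative, PAIR):
        print(keywords(group, PAIR), group)
```

In this toy input, the sentences about query speed and the sentences about installation share a content word within each group, so they end up in separate clusters whose most frequent words serve as rough aspect labels. A realistic system would need a proper comparative-sentence detector and a semantic similarity measure rather than exact word overlap.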

      Published In

      ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
      September 2018
      955 pages
      ISBN: 9781450359375
      DOI: 10.1145/3238147
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. NLP
      2. Stack Overflow
      3. differencing similar technology

      Qualifiers

      • Research-article

      Conference

      ASE '18
      Acceptance Rates

      Overall Acceptance Rate 82 of 337 submissions, 24%

      Cited By

      • (2024) Let’s Discover More API Relations: A Large Language Model-Based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology 33:8 (1-34). DOI: 10.1145/3680469. Online publication date: 23-Jul-2024
      • (2024) "How do people decide?": A Model for Software Library Selection. Proceedings of the 2024 IEEE/ACM 17th International Conference on Cooperative and Human Aspects of Software Engineering (1-12). DOI: 10.1145/3641822.3641865. Online publication date: 14-Apr-2024
      • (2024) ChatGPT Incorrectness Detection in Software Reviews. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3639194. Online publication date: 20-May-2024
      • (2023) Self-Admitted Library Migrations in Java, JavaScript, and Python Packaging Ecosystems: A Comparative Study. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (627-638). DOI: 10.1109/SANER56733.2023.00064. Online publication date: Mar-2023
      • (2023) Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (1890-1906). DOI: 10.1109/ICSE48619.2023.00161. Online publication date: May-2023
      • (2023) What are Pros and Cons? Stance Detection and Summarization on Feature Request. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-12). DOI: 10.1109/ESEM56168.2023.10304865. Online publication date: 26-Oct-2023
      • (2022) Supporting Argumentation Dialogues in Group Decision Support Systems: An Approach Based on Dynamic Clustering. Applied Sciences 12:21 (10893). DOI: 10.3390/app122110893. Online publication date: 27-Oct-2022
      • (2022) Concept-Annotated Examples for Library Comparison. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (1-16). DOI: 10.1145/3526113.3545647. Online publication date: 29-Oct-2022
      • (2022) Where is your app frustrating users? Proceedings of the 44th International Conference on Software Engineering (2427-2439). DOI: 10.1145/3510003.3510189. Online publication date: 21-May-2022
      • (2022) Learning and programming challenges of rust. Proceedings of the 44th International Conference on Software Engineering (1269-1281). DOI: 10.1145/3510003.3510164. Online publication date: 21-May-2022
