DOI: 10.1145/2554850.2555117
research-article

Automatic categorization of questions from Q&A sites

Published: 24 March 2014

Abstract

Q&A sites are attracting growing interest from software developers. Categorizing questions by user concern would open new opportunities to extract valuable information from millions of posts.
This paper compares classification algorithms to find the one that best classifies questions from Q&A sites such as Stack Overflow. The algorithms compared are Naive Bayes, Multilayer Perceptron, Support Vector Machine, K-Nearest Neighbors, J4.8 Decision Tree, and Random Forests.
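As a rough illustration of what comparing classifiers on labelled questions can look like, the sketch below scores two toy classifiers with k-fold cross-validation and picks the better one. Everything in it is an assumption made for illustration: the abstract does not state the evaluation protocol, the two classifiers (a majority-class baseline and a 1-nearest-neighbour rule) are simple stand-ins rather than the six algorithms listed above, and the feature vectors are synthetic.

```python
import random
from collections import Counter


def majority_classifier(train):
    """Return a predictor that always answers the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_classifier(train):
    """Return a 1-nearest-neighbour predictor (squared Euclidean distance)."""
    def predict(x):
        closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return closest[1]
    return predict


def cross_val_accuracy(make_classifier, data, k=5, seed=0):
    """Mean accuracy over k folds; each example is tested exactly once."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = make_classifier(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


# Synthetic, well-separated toy data: two numeric attributes per "question".
CATEGORIES = ["How-to-do-it", "Need-to-know", "Seeking-something"]
rng = random.Random(42)
data = []
for idx, label in enumerate(CATEGORIES):
    for _ in range(20):
        data.append(((idx * 10 + rng.random(), idx * 10 + rng.random()), label))

results = {
    "majority": cross_val_accuracy(majority_classifier, data),
    "1-NN": cross_val_accuracy(nearest_neighbor_classifier, data),
}
best = max(results, key=results.get)
print(results, "best:", best)
```

Passing a factory function (`make_classifier`) keeps the evaluation loop independent of the algorithm, so further classifiers could be added to `results` without touching `cross_val_accuracy`.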
We conducted an experimental study on Stack Overflow questions, with posts divided equally among three domain categories: How-to-do-it, Need-to-know, and Seeking-something. The attributes were extracted through textual analysis of the title and body of each question; in total, we used 8 attributes to describe each question. The best classifier achieved an overall success rate of 84.16%, and 92.5% on the How-to-do-it category.
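To make the two kinds of figure reported above concrete, the following minimal sketch computes an overall success rate and a per-category success rate from true and predicted labels. The label lists are hypothetical and are not the paper's data; the function name `success_rates` is likewise an illustrative choice.

```python
from collections import Counter


def success_rates(true_labels, predicted_labels):
    """Return (overall rate, {category: rate}) for paired label lists."""
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    per_category = {c: correct[c] / n for c, n in total.items()}
    return overall, per_category


# Hypothetical example: 4 questions per category, 2 misclassified in total.
true_labels = (["How-to-do-it"] * 4
               + ["Need-to-know"] * 4
               + ["Seeking-something"] * 4)
predicted = (["How-to-do-it"] * 4
             + ["Need-to-know"] * 3 + ["Seeking-something"]
             + ["Seeking-something"] * 3 + ["How-to-do-it"])

overall, per_category = success_rates(true_labels, predicted)
print(f"overall: {overall:.2%}")
for category, rate in per_category.items():
    print(f"{category}: {rate:.2%}")
```

When the categories are the same size, as in the study's equally divided posts, the overall rate equals the mean of the per-category rates.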

    Published In

    SAC '14: Proceedings of the 29th Annual ACM Symposium on Applied Computing
    March 2014
    1890 pages
    ISBN:9781450324694
    DOI:10.1145/2554850
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Q&A sites
    2. bayes
    3. classification algorithms
    4. data mining
    5. pattern recognition

    Conference

    SAC 2014: Symposium on Applied Computing
    March 24 - 28, 2014
    Gyeongju, Republic of Korea

    Acceptance Rates

    SAC '14 Paper Acceptance Rate 218 of 939 submissions, 23%;
    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Cited By

    • (2024) Semantic Web Approaches in Stack Overflow. International Journal on Semantic Web and Information Systems 20:1 (1-61). DOI: 10.4018/IJSWIS.358617. Online publication date: 9-Nov-2024
    • (2020) A Strategy to Enhance Computer Science Teaching Material Using Topic Modelling. Proceedings of the 51st ACM Technical Symposium on Computer Science Education (366-371). DOI: 10.1145/3328778.3366858. Online publication date: 25-Feb-2020
    • (2019) Categorizing the Content of GitHub README Files. Empirical Software Engineering 24:3 (1296-1327). DOI: 10.1007/s10664-018-9660-3. Online publication date: 1-Jun-2019
    • (2019) Improving the Classification of Q&A Content for Android Fragmentation Using Named Entity Recognition. Progress in Artificial Intelligence (731-743). DOI: 10.1007/978-3-030-30244-3_60. Online publication date: 30-Aug-2019
    • (2018) A survey on mining stack overflow: question and answering (Q&A) community. Data Technologies and Applications 52:2 (190-247). DOI: 10.1108/DTA-07-2017-0054. Online publication date: 3-Apr-2018
    • (2018) Discovering Top Experts for Trending Domains on Stack Overflow. Procedia Computer Science 143 (333-340). DOI: 10.1016/j.procs.2018.10.404. Online publication date: 2018
    • (2017) On-demand Developer Documentation. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME) (479-483). DOI: 10.1109/ICSME.2017.17. Online publication date: Sep-2017
    • (2016) Redocumenting APIs with crowd knowledge: a coverage analysis based on question types. Journal of the Brazilian Computer Society 22:1. DOI: 10.1186/s13173-016-0049-0. Online publication date: 9-Dec-2016
    • (2016) Automated API Documentation with Tutorials Generated From Stack Overflow. Proceedings of the XXX Brazilian Symposium on Software Engineering (33-42). DOI: 10.1145/2973839.2973847. Online publication date: 19-Sep-2016