DOI: 10.1145/3508230.3508258
research-article

A Study of Predicting the Sincerity of a Question Asked Using Machine Learning

Published: 8 March 2022

ABSTRACT

The growth of applications in both scientific socialism and naturalism makes it increasingly difficult to assess whether a question is sincere or not. Such an assessment is essential for many marketing and financial companies. Many applications, especially those involving text and images, will be reconfigured beyond recognition, while others face potential extinction as a consequence of advances in technology, and in computer science in particular. Analyzing text and image data is therefore needed to extract valuable insights. In this paper, we analyze the Quora dataset obtained from Kaggle.com to filter insincere and spam content. We use different preprocessing algorithms and analysis models provided in PySpark. In addition, we analyze how users write their posts via the proposed prediction models. Finally, we report the most accurate of the selected algorithms for classifying questions on Quora. The Gradient Boosted Tree was the best model for questions on Quora, with an accuracy of 79.5%, followed by Long Short-Term Memory (LSTM) at 78.0%. Compared with other methods, namely the same models built in Scikit-Learn and the machine learning models GRU, BiLSTM, and BiGRU, applying models in PySpark gave better results in classifying questions on Quora.
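As a rough illustration of the kind of text preprocessing and accuracy scoring the abstract refers to, here is a minimal standard-library Python sketch. It is not the authors' PySpark pipeline: the stop-word list, the bucket count, and all function names are assumptions made for illustration, and the hashing step only mimics the general idea of hashed term-frequency features.

```python
import re
import zlib

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "to", "of", "in",
              "why", "how", "what"}


def tokenize(question):
    """Lowercase the question and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", question.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def hashed_term_frequencies(tokens, num_buckets=16):
    """Map tokens to a fixed-length count vector via the hashing trick.

    zlib.crc32 is used because it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    vector = [0] * num_buckets
    for token in tokens:
        bucket = zlib.crc32(token.encode("utf-8")) % num_buckets
        vector[bucket] += 1
    return vector


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

A question string would pass through `tokenize`, `remove_stop_words`, and `hashed_term_frequencies` to become a fixed-length feature vector for a classifier. `accuracy` is the plain fraction of correct predictions, the metric in which the 79.5% and 78.0% figures above are expressed.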


Published in

NLPIR '21: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval
December 2021, 175 pages
ISBN: 9781450387354
DOI: 10.1145/3508230

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited