A Study of Predicting the Sincerity of a Question Asked Using Machine Learning

Authors:
Tuan Nguyen
Department of Information Technology, King Mongkut's University of Technology North Bangkok, Thailand

Phayung Meesad
Department of Information Technology Management, King Mongkut's University of Technology North Bangkok, Thailand

NLPIR '21: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, December 2021, Pages 129–134, https://doi.org/10.1145/3508230.3508258

Published: 08 March 2022

ABSTRACT

The growth of applications across the social and natural sciences makes it increasingly difficult to assess whether a question is sincere. Making this assessment is essential for many marketing and financial companies. Many applications, especially those built on text and images, will be reshaped beyond recognition, while others face potential extinction as a consequence of advances in technology, and in computer science in particular. Analyzing text and image data will therefore be needed to extract valuable insights. In this paper, we analyzed the Quora dataset obtained from Kaggle.com to filter insincere and spam content. We used different preprocessing algorithms and analysis models provided in PySpark. We also used the proposed prediction models to analyze how users write their posts. Finally, we identified the most accurate of the selected algorithms for classifying questions on Quora. The Gradient Boosted Tree was the best model, with an accuracy of 79.5%, followed by Long Short-Term Memory (LSTM) at 78.0%. Compared with the same models built in Scikit-Learn and with the GRU, BiLSTM, and BiGRU networks, the models applied in PySpark achieved better results in classifying questions on Quora.
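
The two ends of the workflow described in the abstract, text preprocessing and accuracy scoring, can be illustrated with a minimal plain-Python sketch. This is not the authors' PySpark pipeline: the stop-word list, the function names `preprocess` and `accuracy`, and the sample labels are all hypothetical, and the label convention (1 = insincere, 0 = sincere) is assumed to follow the Kaggle dataset.

```python
import re

# Hypothetical stop-word list for illustration only; the abstract does not
# say which preprocessing steps or word lists the authors used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "do", "in",
              "why", "what", "how"}


def preprocess(question: str) -> list[str]:
    """Lowercase the question, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z']+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Assumed label convention: 1 = insincere, 0 = sincere.
    print(preprocess("How do I learn Python in a month?"))
    print(accuracy([0, 1, 0, 0, 1], [0, 1, 1, 0, 1]))
```

Accuracy here is simply the share of questions whose predicted label equals the true label, which is the quantity the reported 79.5% and 78.0% figures refer to.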

Published in

NLPIR '21: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval
December 2021
175 pages
ISBN: 9781450387354
DOI: 10.1145/3508230

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Quora
insincere
machine learning
question
sincere
Qualifiers
- research-article
- Research
- Refereed limited