A comparison of text weighting schemes on sentiment analysis of government policies: a case study of replacement of national examinations

Sutoyo, Edi; Rifai, Achmad Pratama; Risnumawan, Anhar; Saputra, Muhardi

doi:10.1007/s11042-022-11900-9

Published: 12 January 2022

Volume 81, pages 6413–6431, (2022)

Multimedia Tools and Applications

Edi Sutoyo ORCID: orcid.org/0000-0002-8413-5070¹,
Achmad Pratama Rifai²,
Anhar Risnumawan³ &
…
Muhardi Saputra¹

Abstract

The National Examination (UN) is a nationally administered system for evaluating education standards in elementary and secondary schools, and it is also used to equalize quality across education levels. It serves to determine graduation, to map national education quality, and to select students for higher levels of education. Over the years the UN has become a benchmark for the standardization of education in Indonesia, which makes it an important measure of both student achievement and a school’s teaching quality. The government’s plan to abolish the UN and replace it with a competency assessment and character survey has drawn public attention. To gauge public sentiment toward this policy, we analyze opinions expressed on the social media platform Twitter. In text mining tasks such as text classification and sentiment analysis, careful selection of a term weighting scheme (TWS) can have a significant impact on effectiveness. We tested the effectiveness of six classification algorithms while varying the TWS on a dataset obtained from Twitter. The experimental results showed that, overall, TF-IGM outperformed TF-IDF on four of the classification algorithms. Finally, the sentiment analysis of the discourse on the removal of the UN is expected to give the government a general picture of public opinion as reflected in social media data.
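
To make the contrast between the two term weighting schemes concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual unsupervised form of TF-IDF (raw term frequency times log(N/df)) and the TF-IGM form usually attributed to Chen et al. (2016), where the inverse gravity moment is the largest class-specific frequency divided by the rank-weighted sum of all class-specific frequencies. The toy documents, the labels, the function names, and the coefficient `lam=7.0` are illustrative assumptions.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency: log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def igm(term, docs, labels):
    """Inverse gravity moment of a term over the classes (assumed definition).

    Class-specific document frequencies are sorted in descending order and
    weighted by their rank r = 1, 2, ...; igm = f_1 / sum(f_r * r).
    """
    per_class = Counter()
    for doc, label in zip(docs, labels):
        if term in doc:
            per_class[label] += 1
    freqs = sorted((per_class.get(c, 0) for c in set(labels)), reverse=True)
    denom = sum(f * r for r, f in enumerate(freqs, start=1))
    return freqs[0] / denom if denom else 0.0

def tf_idf(term, doc, docs):
    """Unsupervised weight: uses only how many documents contain the term."""
    return doc.count(term) * idf(term, docs)

def tf_igm(term, doc, docs, labels, lam=7.0):
    """Supervised weight: uses how unevenly the term is spread across classes."""
    return doc.count(term) * (1.0 + lam * igm(term, docs, labels))

# Hypothetical, already-tokenized toy corpus with sentiment labels.
docs = [
    ["exam", "good"],
    ["exam", "bad"],
    ["exam", "good", "very"],
    ["exam", "bad", "very"],
]
labels = ["pos", "neg", "pos", "neg"]

for term in ("exam", "good"):
    print(term, tf_idf(term, docs[0], docs), tf_igm(term, docs[0], docs, labels))
```

In this toy corpus "exam" appears in every document of both classes, so it receives a low weight under either scheme, while "good" appears only in the positive class and is boosted by TF-IGM because that scheme looks at the class labels, which TF-IDF ignores.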

Acknowledgements

I would like to express my sincere thanks to Prof. Andrea Capiluppi (Department of Software Engineering, Faculty of Science and Engineering, University of Groningen, Netherlands) for his guidance and valuable advice on this research.

Author information

Authors and Affiliations

Department of Information Systems, Telkom University, Bandung, West Java, 40257, Indonesia
Edi Sutoyo & Muhardi Saputra
Department of Mechanical and Industrial Engineering, Faculty of Engineering, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
Achmad Pratama Rifai
Mechatronics Engineering Division, Politeknik Elektronika Negeri Surabaya, Surabaya, 60111, Indonesia
Anhar Risnumawan

Corresponding author

Correspondence to Edi Sutoyo.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article. The authors confirmed that the data and the paper are free of plagiarism.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sutoyo, E., Rifai, A.P., Risnumawan, A. et al. A comparison of text weighting schemes on sentiment analysis of government policies: a case study of replacement of national examinations. Multimed Tools Appl 81, 6413–6431 (2022). https://doi.org/10.1007/s11042-022-11900-9

Received: 01 April 2021
Revised: 06 December 2021
Accepted: 03 January 2022
Published: 12 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11042-022-11900-9
