A comparison of text weighting schemes on sentiment analysis of government policies: a case study of replacement of national examinations

Multimedia Tools and Applications

Abstract

The National Examination (UN) is a nationally administered system for evaluating education standards in elementary and secondary schools, and it is also used to equalize quality across education levels. It serves to determine graduation, to map national education quality, and to select students for higher levels of education. Over the years the UN has become a benchmark for standardizing education in Indonesia: it is relied on to gauge both the quality of students’ education and the quality of a school’s teaching. The government’s plan to remove the UN and replace it with a competency assessment and character survey has attracted public attention. To understand public sentiment toward this policy, we analyze opinions expressed on the social media platform Twitter. In text mining tasks such as text classification and sentiment analysis, careful selection of a term weighting scheme (TWS) can have a significant impact on effectiveness. We tested the effectiveness of six classification algorithms while varying the TWS on a dataset obtained from Twitter. The experimental results showed that, overall, TF-IGM outperformed TF-IDF on four of the classification algorithms. Finally, the sentiment analysis of the discourse on the removal of the UN is expected to give the government a general picture of public opinion as reflected in social media data.
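To make the contrast between the two weighting schemes concrete, the following is a minimal sketch of how TF-IDF and TF-IGM might be computed on a toy labelled corpus. It is not the authors' implementation. The formulas follow the form in which the two schemes are commonly stated in the literature: TF-IDF as tf * log(N / df), and TF-IGM as tf * (1 + lambda * igm), where igm is the largest per-class document frequency of a term divided by the rank-weighted sum of its per-class frequencies sorted in descending order. The toy tweets, the labels, the function names, and the choice lambda = 7 are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical tokenized tweets and sentiment labels (illustrative only).
corpus = [
    ["un", "removal", "good"],
    ["un", "removal", "bad"],
    ["good", "policy"],
    ["bad", "policy"],
]
labels = ["pos", "neg", "pos", "neg"]


def tf_idf(term, doc_tokens, corpus):
    """Unsupervised weight: raw term frequency times log(N / document frequency)."""
    tf = doc_tokens.count(term)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def igm(term, corpus, labels):
    """Inverse gravity moment: f_1 / sum(f_r * r), with per-class document
    frequencies f_r sorted in descending order and r starting at 1."""
    per_class = Counter()
    for tokens, label in zip(corpus, labels):
        if term in tokens:
            per_class[label] += 1
    freqs = sorted((per_class[c] for c in set(labels)), reverse=True)
    denom = sum(f * rank for rank, f in enumerate(freqs, start=1))
    return freqs[0] / denom if denom else 0.0


def tf_igm(term, doc_tokens, corpus, labels, lam=7.0):
    """Supervised weight: term frequency scaled by (1 + lam * igm)."""
    tf = doc_tokens.count(term)
    return tf * (1.0 + lam * igm(term, corpus, labels))


if __name__ == "__main__":
    doc = corpus[0]
    for term in ("un", "good"):
        print(term, tf_idf(term, doc, corpus), tf_igm(term, doc, corpus, labels))
```

In this toy corpus "un" and "good" each occur in two of the four tweets, so TF-IDF weights them identically. TF-IGM uses the class labels: "good" appears only in positive tweets and receives a larger weight than "un", which is split evenly between the two classes. This is the kind of class-discriminating behaviour a supervised scheme is meant to capture.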

Acknowledgements

I would like to express my sincere thanks to Prof. Andrea Capiluppi (Department of Software Engineering, Faculty of Science and Engineering, University of Groningen, Netherlands) for his guidance and valuable advice on this research.

Author information

Corresponding author

Correspondence to Edi Sutoyo.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article. The authors confirmed that the data and the paper are free of plagiarism.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sutoyo, E., Rifai, A.P., Risnumawan, A. et al. A comparison of text weighting schemes on sentiment analysis of government policies: a case study of replacement of national examinations. Multimed Tools Appl 81, 6413–6431 (2022). https://doi.org/10.1007/s11042-022-11900-9

  • DOI: https://doi.org/10.1007/s11042-022-11900-9
