A decision tree using ID3 algorithm for English semantic analysis

International Journal of Speech Technology

Abstract

Natural language processing has been studied for many years and has been applied in many research and commercial applications. This paper proposes a new model for document-level emotional classification of English text. The model uses the ID3 decision tree algorithm to classify the semantics (positive, negative, or neutral) of English documents. Classification is based on rules generated by applying the ID3 algorithm to the 115,000 English sentences of our English training data set. We test the model on an English testing data set of 25,000 documents and achieve a sentiment classification accuracy of 63.6%.
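To make the idea of rule generation with ID3 concrete, the following is a minimal sketch of the textbook algorithm (entropy, information gain, recursive splitting). It is not the authors' implementation: the binary word-presence features, the toy rows, and the names `entropy`, `information_gain`, `id3`, and `classify` are assumptions chosen purely for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a tree: a leaf is a label, a node is (attr, {value: subtree})."""
    if len(set(labels)) == 1:          # pure node: stop with that label
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in subset],
                              [l for _, l in subset],
                              remaining)
    return (best, branches)

def classify(tree, row, default="neutral"):
    """Follow the tree for one row; unseen attribute values fall back to `default`."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree

# Hypothetical toy data: column 0 = sentence contains a positive word,
# column 1 = sentence contains a negative word (assumed features).
rows = [(1, 0), (0, 1), (0, 0), (1, 0), (0, 1), (0, 0)]
labels = ["positive", "negative", "neutral",
          "positive", "negative", "neutral"]
tree = id3(rows, labels, attrs=[0, 1])
```

Each root-to-leaf path of `tree` reads as one if-then rule, which is the sense in which a decision tree yields classification rules. A new row is labeled with, for example, `classify(tree, (1, 0))`.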

References

  • Agarwal, B., & Mittal, N. (2016). Semantic orientation-based approach for sentiment analysis. Prominent Feature Extraction for Sentiment Analysis. doi:10.1007/978-3-319-25343-5_6. Print ISBN 978-3-319-25341-1.

  • Agarwal, B., & Mittal, N. (2016). Machine learning approach for sentiment analysis. Prominent Feature Extraction for Sentiment Analysis. doi:10.1007/978-3-319-25343-5_3. ISBN 978-3-319-25341-1.

  • Ahmed, S., & Danti, A. (2016). Effective sentimental analysis and opinion mining of web reviews using rule based classifiers. Computational Intelligence in Data Mining. doi:10.1007/978-81-322-2734-2_18. ISBN 978-81-322-2732-8.

  • Baldwin, J. F., Lawry, J., & Martin, T. P. (1997). A mass assignment based ID3 algorithm for decision tree induction. International Journal of Intelligent Systems. doi:10.1002/(SICI)1098-111X(199707)12:7<523::AID-INT3>3.0.CO;2-N.

  • Canuto, S., Gonçalves, M. A., & Benevenuto, F. (2016). Exploiting new sentiment-based meta-level features for effective sentiment analysis. In Proceedings of the ninth ACM international conference on web search and data mining (WSDM '16), New York, USA (pp. 53–62).

  • Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.

  • Chaovalit, P., & Zhou, L. (2005). Movie review mining: A comparison between supervised and unsupervised classification approaches. In Proceedings of the 38th annual Hawaii international conference on system sciences.

  • Cheng, J., Fayyad, U. M., Irani, K. B., & Qian, Z. (1988) Improved decision trees: A generalized version of ID3. In Proceedings of the fifth international conference on machine learning, Ann Arbor, Michigan, USA.

  • Cios, K. J., & Liu, N. (2002). A machine learning method for generation of a neural network architecture: A continuous ID3 algorithm. IEEE Transactions on Neural Networks, 3(2), 280–291.

  • Cios, K. J., & Sztandera, L. M. (1992) Continuous ID3 algorithm with fuzzy entropy measures. In IEEE international conference on fuzzy systems (pp. 469–476).

  • Dalal, M. K., & Zaveri, M. (2011). Automatic text classification: A technical review. International Journal of Computer Applications (0975 – 8887), 28(2), 37–40.

  • Ferro-Famil, L., Pottier, E., & Lee, J.-S. (2002). Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier. IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2332–2342.

  • Gllavata, J., Ewerth, R., & Freisleben, B. (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In Proceedings of the 17th International conference on pattern recognition (ICPR 2004) (Vol. 1, pp. 425–428).

  • Jin, C., De-lin, L., & Fen-xiang, M. (2009) An improved ID3 decision tree algorithm. In 4th international conference on computer science & education (ICCSE’09) (pp. 127–130).

  • Kaur, A., & Duhan, N. (2015) A survey on sentiment analysis and opinion mining. International Journal of Innovations & Advancement in Computer Science (IJIACS), 4(Special Issue). ISSN 2347–8616.

  • Large Movie Review Dataset (2017) http://ai.stanford.edu/~amaas/data/sentiment/.

  • Le Hegarat-Mascle, S., Bloch, I., & Vidal-Madjar, D. (2002). Application of Dempster–Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 35(4), 1018–1031.

  • Lee, J.-S., Grunes, M. R., Ainsworth, T. L., & Du, L.-J. (2002). Unsupervised classification using polarimetric decomposition and the complex Wishart classifier. IEEE Transactions on Geoscience and Remote Sensing, 37(5), 2249–2258.

  • Lee, T.-W., Lewicki, M. S., & Sejnowski, T. J. (2002). ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), 1078–1089.

  • Maher, P. E., & Clair, D. S. (1993) Uncertain reasoning in an ID3 machine learning framework. In Second IEEE international conference on fuzzy systems (Vol. 1, pp.7–12).

  • Mandal, A. K., & Sen, R. (2014). Supervised learning methods for Bangla web document categorization. International Journal of Artificial Intelligence & Applications (IJAIA), 5(5).

  • Manek, A. S., Shenoy, P. D., Mohan, M. C., & V. K., R. (2016). Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web. doi:10.1007/s11280-015-0381-x. ISSN 1386-145X.

  • Ming, H., Wenying, N., & Xu, L. (2009) An improved decision tree classification algorithm based on ID3 and the application in score analysis. In Chinese control and decision conference (pp. 1876–1879).

  • Nizamani, S., Memon, N., Wiil, U. K., & Karampelas, P. (2013). Modeling suspicious email detection using enhanced feature selection. IJMO, 2(4), 371–377. ISSN 2010–3697.

  • Phu, V. N., Chau, V. T. N., Dat, N. D., Tran, V. T. N., & Nguyen, T. A. (2017a). A valences-totaling model for English sentiment classification. International Journal of Knowledge and Information Systems. doi:10.1007/s10115-017-1054-0.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., & Dat, N. D. (2017b). A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics. International Journal of Artificial Intelligence Review (AIR). doi:10.1007/s10462-017-9538-6.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., & Dat, N. D. (2017c). A C4.5 algorithm for English emotional classification. International Journal of Evolving Systems. doi:10.1007/s12530-017-9180-1.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017d). Semantic lexicons of English nouns for classification. International Journal of Evolving Systems. doi:10.1007/s12530-017-9188-6.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017e). A valence-totaling model for Vietnamese sentiment classification. International Journal of Evolving Systems (EVOS). doi:10.1007/s12530-017-9187-7.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017f). SVM for English semantic classification in parallel environment. International Journal of Speech Technology (IJST). doi:10.1007/s10772-017-9421-5.

  • Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Nguyen, T. A. (2017g). STING algorithm used English sentiment classification in a parallel environment. International Journal of Pattern Recognition and Artificial Intelligence. doi:10.1142/S0218001417500215.

  • Phu, V. N., Dat, N. D., Chau, V. T. N., Tran, V. T. N., & Duy, K. L. D. (2017h). Shifting semantic values of English phrases for classification. International Journal of Speech Technology (IJST). doi:10.1007/s10772-017-9420-6.

  • Phu, V. N., Dat, N. D., Tran, V. T. N., Chau, V. T. N., & Nguyen, T. A. (2017i). Fuzzy C-means for English sentiment classification in a distributed system. International Journal of Applied Intelligence (APIN). doi:10.1007/s10489-016-0858-z.

  • Phu, V. N., & Tuoi, P. T. (2014) Sentiment classification using enhanced contextual valence shifters. In International conference on Asian language processing (IALP) (pp. 224–229).

  • Pong-Inwong, C., & Rungworawut, W. S. (2014) Teaching senti-lexicon for automated sentiment polarity definition in teaching evaluation. In 10th international conference on semantics, knowledge and grids (SKG) (pp. 84–91).

  • Prasad, S. S., Kumar, J., Prabhakar, D. K., & Pal, S. (2016) Sentiment classification: An approach for Indian language tweets using decision tree. Mining Intelligence and Knowledge Exploration, Volume 9468 of the series Lecture Notes in Computer Science (pp. 656–663).

  • Psomakelis, E., Tserpes, K., Anagnostopoulos, D., & Varvarigou, T. (2015) Comparing methods for Twitter sentiment analysis. arXiv:1505.02973 [cs.CL].

  • Shao, X., Zhang, G., Li, P., & Chen, Y. (2001). Application of ID3 algorithm in knowledge acquisition for tolerance design. Journal of Materials Processing Technology, 117(1–2), 66–74.

  • Sharma, M. (2014) Z-CRIME: A data mining tool for the detection of suspicious criminal activities based on decision tree. In International conference on data mining and intelligent computing (ICDMIC) (pp. 1–6).

  • Shrivastava, S., & Nair, P. S. (2015). Mood prediction on tweets using classification algorithm. International Journal of Science and Research (IJSR), 14(1), 295–299.

  • Taboada, M., Voll, K., & Brooke, J. (2008) Extracting sentiment as a function of discourse structure and topicality. Technical Report 2008-20, School of Computing Science, Simon Fraser University.

  • Tani, T., Sakoda, M., & Tanaka, K. (1992) Fuzzy modeling by ID3 algorithm and its application to prediction of heater outlet temperature. In IEEE international conference on fuzzy systems (pp. 923–930).

  • Tran, V. T. N., Phu, V. N., & Tuoi, P. T. (2014) Learning more chi square feature selection to improve the fastest and most accurate sentiment classification. In The third Asian conference on information systems, ACIS 2014.

  • Turney, P. D. (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In ACL ‘02 Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424), USA.

  • Umanol, M., Okamoto, H., Hatono, I., & Tamura, H. (1994) Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems. In Proceedings of the third IEEE conference on fuzzy systems, 1994. IEEE world congress on computational intelligence (pp. 2113–2118).

  • van Zyl, J. J. (2002). Unsupervised classification of scattering behavior using radar polarimetry data. IEEE Transactions on Geoscience and Remote Sensing, 27(1), 36–45.

  • Vinodhini, G., & Chandrasekaran, R. M. (2013). Performance evaluation of sentiment mining classifiers on balanced and imbalanced dataset. International Journal of Computer Science and Business Informatics, 6(1), 1–8.

  • Voll, K., & Taboada, M. (2007) Not all words are created equal: Extracting semantic orientation as a function of adjective relevance. AI 2007: Advances in Artificial Intelligence, Volume 4830 of the series Lecture Notes in Computer Science (pp. 337–346).

  • Wan, Y., & Gao, Q. (2015). An ensemble sentiment classification system of Twitter data for airline services analysis. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 1318–1325).

  • Wang, X., Chen, B., Qian, G., & Ye, F. (2000). On the optimization of fuzzy decision trees. Fuzzy Sets and Systems, 112(1), 117–125.

  • Winkler, S., Schaller, S., Dorfer, V., Affenzeller, M., Petz, G., & Karpowicz, M. (2015). Data-based prediction of sentiments using heterogeneous model ensembles. Soft Computing, 19(12), 3401–3412.

  • Xiao, M.-J., Huang, L.-S., Luo, Y.-L., & Shen, H. (2005) Privacy preserving ID3 algorithm over horizontally partitioned data. In Sixth international conference on parallel and distributed computing applications and technologies (PDCAT’05) (pp. 239–243).

  • Yuxun, L., & Niuniu, X. (2010) Improved ID3 algorithm. In 3rd IEEE international conference on computer science and information technology (ICCSIT) (Vol. 8, pp. 465–468).

Author information

Corresponding author

Correspondence to Vo Ngoc Phu.

Appendices

See Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.

Table 1 Training data set for a decision tree
Table 2 The results of the 25,000 English documents in the testing data set t1
Table 3 The results of the 25,000 English documents in the testing data set t2
Table 4 The accuracy of our new model for the 25,000 English documents in the testing data set t1
Table 5 The accuracy of our new model for the 25,000 English documents in the testing data set t2
Table 6 Comparing our model’s results with the studies related to the ID3 algorithm in Umanol et al. (1994), Cendrowska (1987), Cios and Liu (2002), Cios and Sztandera (1992), Jin et al. (2009), Wang et al. (2000), Baldwin et al. (1997), Tani et al. (1992), Xiao et al. (2005), Cheng et al. (1988), Shao et al. (2001), Ming et al. (2009), Yuxun and Niuniu (2010), Maher and Clair (1993)
Table 7 Comparing our model’s advantages and disadvantages with the studies related to the ID3 algorithm in Umanol et al. (1994), Cendrowska (1987), Cios and Liu (2002), Cios and Sztandera (1992), Jin et al. (2009), Wang et al. (2000), Baldwin et al. (1997), Tani et al. (1992), Xiao et al. (2005), Cheng et al. (1988), Shao et al. (2001), Ming et al. (2009), Yuxun and Niuniu (2010), Maher and Clair (1993)
Table 8 Comparisons of our model’s results with the surveys related to the decision tree for sentiment classification (or sentiment analysis) in Dalal and Zaveri (2011), Taboada et al. (2008), Nizamani et al. (2013), Wan and Gao (2015), Winkler et al. (2015), Psomakelis et al. (2015), Shrivastava (2015), Vinodhini and Chandrasekaran (2013), Voll and Taboada (2007), Mandal and Sen (2014), Kaur and Duhan (2015), Prasad et al. (2016), Pong-Inwong and Rungworawut (2014), Sharma (2014)
Table 9 Comparisons of our model’s merits and demerits with the surveys related to the decision tree for sentiment classification (or sentiment analysis) in Dalal and Zaveri (2011), Taboada et al. (2008), Nizamani et al. (2013), Wan and Gao (2015), Winkler et al. (2015), Psomakelis et al. (2015), Shrivastava (2015), Vinodhini and Chandrasekaran (2013), Voll and Taboada (2007), Mandal and Sen (2014), Kaur and Duhan (2015), Prasad et al. (2016), Pong-Inwong and Rungworawut (2014), Sharma (2014)
Table 10 Comparisons of our model with the latest sentiment classification models (or the latest sentiment classification methods) in Manek et al. (2016), Agarwal and Mittal (2016), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014)
Table 11 Comparisons of our model’s benefits and drawbacks with the latest sentiment classification models (or the latest sentiment classification methods) in Manek et al. (2016), Agarwal and Mittal (2016), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014)
Table 12 Comparisons of our model’s results with the latest works of the unsupervised classification in Turney (2002), Lee et al. (2002), van Zyl (2002), Le Hegarat-Mascle et al. (2002), Ferro-Famil et al. (2002), Chaovalit and Zhou (2005), Gllavata et al. (2004)
Table 13 Comparisons of our model’s positives and negatives with the latest works of the unsupervised classification in Turney (2002), Lee et al. (2002), van Zyl (2002), Le Hegarat-Mascle et al. (2002), Ferro-Famil et al. (2002), Chaovalit and Zhou (2005), Gllavata et al. (2004)
Table 14 Comparisons of our model’s results with the normal ID3 algorithm

1.1 Appendices of all codes

About this article

Cite this article

Phu, V.N., Tran, V.T.N., Chau, V.T.N. et al. A decision tree using ID3 algorithm for English semantic analysis. Int J Speech Technol 20, 593–613 (2017). https://doi.org/10.1007/s10772-017-9429-x

  • DOI: https://doi.org/10.1007/s10772-017-9429-x
