A decision tree using ID3 algorithm for English semantic analysis

Phu, Vo Ngoc; Tran, Vo Thi Ngoc; Chau, Vo Thi Ngoc; Dat, Nguyen Duy; Duy, Khanh Ly Doan

doi:10.1007/s10772-017-9429-x

Published: 15 June 2017

Volume 20, pages 593–613 (2017)
International Journal of Speech Technology

Vo Ngoc Phu ORCID: orcid.org/0000-0001-6047-9066¹,
Vo Thi Ngoc Tran²,
Vo Thi Ngoc Chau³,
Nguyen Duy Dat⁴ &
Khanh Ly Doan Duy⁵

Abstract

Natural language processing has been studied for many years and has been applied in many research projects and commercial applications. This paper proposes a new model for document-level sentiment classification of English text. The model uses the ID3 decision tree algorithm to classify the semantics (positive, negative, or neutral) of English documents. Classification is based on a set of rules generated by applying the ID3 algorithm to the 115,000 English sentences of our training data set. We tested the model on a testing data set of 25,000 English documents and achieved an accuracy of 63.6% in sentiment classification.
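
The abstract does not spell out the features or the rules that the model learns, so the following is only a minimal sketch of the generic ID3 procedure (choose the attribute with the highest information gain, split, and recurse), not the authors' implementation. The attribute names `has_positive_term` and `has_negative_term`, the toy training rows, and the helper names `entropy`, `information_gain`, `id3`, and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a nested-dict decision tree; leaves are plain class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # ID3 greedily picks the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {"attr": best, "branches": {}, "default": majority}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        tree["branches"][value] = id3([r for r, _ in pairs],
                                      [l for _, l in pairs],
                                      remaining)
    return tree


def classify(tree, row):
    """Follow branches until a leaf; unseen values fall back to the majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy training set (not the paper's data).
train_rows = [
    {"has_positive_term": "yes", "has_negative_term": "no"},
    {"has_positive_term": "no", "has_negative_term": "yes"},
    {"has_positive_term": "no", "has_negative_term": "no"},
    {"has_positive_term": "yes", "has_negative_term": "no"},
    {"has_positive_term": "no", "has_negative_term": "yes"},
]
train_labels = ["positive", "negative", "neutral", "positive", "negative"]

tree = id3(train_rows, train_labels, ["has_positive_term", "has_negative_term"])
print(classify(tree, {"has_positive_term": "no", "has_negative_term": "no"}))
```

Each root-to-leaf path of the resulting tree can be read as one if-then rule, which is one way to picture the "rules generated by applying the ID3 algorithm" mentioned in the abstract. Classic ID3 works on categorical attributes only and does no pruning, which is why the sketch uses yes/no features.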

References

Agarwal, B., & Mittal, N. (2016). Semantic orientation-based approach for sentiment analysis. Prominent Feature Extraction for Sentiment Analysis. doi:10.1007/978-3-319-25343-5_6. Print ISBN 978-3-319-25341-1.
Agarwal, B., & Mittal, N. (2016). Machine learning approach for sentiment analysis. Prominent Feature Extraction for Sentiment Analysis. doi:10.1007/978-3-319-25343-5_3. ISBN 978-3-319-25341-1.
Ahmed, S., & Danti, A. (2016). Effective sentimental analysis and opinion mining of web reviews using rule based classifiers. Computational Intelligence in Data Mining. doi:10.1007/978-81-322-2734-2_18. ISBN 978-81-322-2732-8.
Baldwin, J. F., Lawry, J., & Martin, T. P. (1997). A mass assignment based ID3 algorithm for decision tree induction. International Journal of Intelligent Systems. doi:10.1002/(SICI)1098-111X(199707)12:7<523::AID-INT3>3.0.CO;2-N.
Canuto, S., Gonçalves, M. A., & Benevenuto, F. (2016) Exploiting new sentiment-based meta-level features for effective sentiment analysis. In Proceedings of the ninth ACM International conference on web search and data mining (WSDM ‘16), New York, USA (pp. 53–62).
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.
Chaovalit, P., & Zhou, L. (2005) Movie review mining: A comparison between supervised and unsupervised classification approaches. In Proceedings of the 38th annual Hawaii international conference on system sciences.
Cheng, J., Fayyad, U. M., Irani, K. B., & Qian, Z. (1988) Improved decision trees: A generalized version of ID3. In Proceedings of the fifth international conference on machine learning, Ann Arbor, Michigan, USA.
Cios, K. J., & Liu, N. (2002). A machine learning method for generation of a neural network architecture: A continuous ID3 algorithm. IEEE Transactions on Neural Networks, 3(2), 280–291.
Cios, K. J., & Sztandera, L. M. (1992) Continuous ID3 algorithm with fuzzy entropy measures. In IEEE international conference on fuzzy systems (pp. 469–476).
Dalal, M. K., & Zaveri, M. (2011). Automatic text classification: A technical review. International Journal of Computer Applications (0975 – 8887), 28(2), 37–40.
Ferro-Famil, L., Pottier, E., & Lee, J.-S. (2002). Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier. IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2332–2342.
Gllavata, J., Ewerth, R., & Freisleben, B. (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In Proceedings of the 17th International conference on pattern recognition (ICPR 2004) (Vol. 1, pp. 425–428).
Jin, C., De-lin, L., & Fen-xiang, M. (2009) An improved ID3 decision tree algorithm. In 4th international conference on computer science & education (ICCSE’09) (pp. 127–130).
Kaur, A., & Duhan, N. (2015) A survey on sentiment analysis and opinion mining. International Journal of Innovations & Advancement in Computer Science (IJIACS), 4(Special Issue). ISSN 2347–8616.
Large Movie Review Dataset (2017) http://ai.stanford.edu/~amaas/data/sentiment/.
Le Hegarat-Mascle, S., Bloch, I., & Vidal-Madjar, D. (2002). Application of Dempster–Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 35(4), 1018–1031.
Lee, J.-S., Grunes, M. R., Ainsworth, T. L., & Du, L.-J. (2002). Unsupervised classification using polarimetric decomposition and the complex Wishart classifier. IEEE Transactions on Geoscience and Remote Sensing, 37(5), 2249–2258.
Lee, T.-W., Lewicki, M. S., & Sejnowski, T. J. (2002). ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), 1078–1089.
Maher, P. E., & Clair, D. S. (1993) Uncertain reasoning in an ID3 machine learning framework. In Second IEEE international conference on fuzzy systems (Vol. 1, pp.7–12).
Mandal, A. K., & Sen, R. (2014) Supervised learning methods for bangla web document categorization. International Journal of Artificial Intelligence & Applications (IJAIA), 5(5).
Manek, A. S., Shenoy, P. D., Mohan, M. C., & V. K., R. (2016) Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web. doi:10.1007/s11280-015-0381-x. ISSN 1386-145X.
Ming, H., Wenying, N., & Xu, L. (2009) An improved decision tree classification algorithm based on ID3 and the application in score analysis. In Chinese control and decision conference (pp. 1876–1879).
Nizamani, S., Memon, N., Wiil, U. K., & Karampelas, P. (2013). Modeling suspicious email detection using enhanced feature selection. IJMO, 2(4), 371–377. ISSN 2010–3697.
Phu, V. N., Chau, V. T. N., Dat, N. D., Tran, V. T. N., & Nguyen, T. A. (2017a). A valences-totaling model for English sentiment classification. International Journal of Knowledge and Information Systems. doi:10.1007/s10115-017-1054-0.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., & Dat, N. D. (2017b). A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics. International Journal of Artificial Intelligence Review (AIR). doi:10.1007/s10462-017-9538-6.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., & Dat, N. D. (2017c). A C4.5 algorithm for English emotional classification. International Journal of Evolving Systems. doi:10.1007/s12530-017-9180-1.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017d). Semantic lexicons of English nouns for classification. International Journal of Evolving Systems. doi:10.1007/s12530-017-9188-6.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017e). A valence-totaling model for Vietnamese sentiment classification. International Journal of Evolving Systems (EVOS). doi:10.1007/s12530-017-9187-7.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Duy, K. L. D. (2017f). SVM for English semantic classification in parallel environment. International Journal of Speech Technology (IJST). doi:10.1007/s10772-017-9421-5.
Phu, V. N., Chau, V. T. N., Tran, V. T. N., Dat, N. D., & Nguyen, T. A. (2017g). STING algorithm used English sentiment classification in a parallel environment. International Journal of Pattern Recognition and Artificial Intelligence. doi:10.1142/S0218001417500215.
Phu, V. N., Dat, N. D., Chau, V. T. N., Tran, V. T. N., & Duy, K. L. D. (2017h). Shifting semantic values of English phrases for classification. International Journal of Speech Technology (IJST). doi:10.1007/s10772-017-9420-6.
Phu, V. N., Dat, N. D., Tran, V. T. N., Chau, V. T. N., & Nguyen, T. A. (2017i). Fuzzy C-means for English sentiment classification in a distributed system. International Journal of Applied Intelligence (APIN). doi:10.1007/s10489-016-0858-z.
Phu, V. N., & Tuoi, P. T. (2014) Sentiment classification using enhanced contextual valence shifters. In International conference on Asian language processing (IALP) (pp. 224–229).
Pong-Inwong, C., & Rungworawut, W. S. (2014) Teaching senti-lexicon for automated sentiment polarity definition in teaching evaluation. In 10th international conference on semantics, knowledge and grids (SKG) (pp. 84–91).
Prasad, S. S., Kumar, J., Prabhakar, D. K., & Pal, S. (2016) Sentiment classification: An approach for Indian language tweets using decision tree. Mining Intelligence and Knowledge Exploration, Volume 9468 of the series Lecture Notes in Computer Science (pp. 656–663).
Psomakelis, E., Tserpes, K., Anagnostopoulos, D., & Varvarigou, T. (2015) Comparing methods for Twitter sentiment analysis. arXiv:1505.02973 [cs.CL].
Shao, X., Zhang, G., Li, P., & Chen, Y. (2001). Application of ID3 algorithm in knowledge acquisition for tolerance design. Journal of Materials Processing Technology, 117(1–2), 66–74.
Sharma, M. (2014) Z-CRIME: A data mining tool for the detection of suspicious criminal activities based on decision tree. In International conference on data mining and intelligent computing (ICDMIC) (pp. 1–6).
Shrivastava, S., & Nair, P. S. (2015). Mood prediction on tweets using classification algorithm. International Journal of Science and Research (IJSR), 14(1), 295–299.
Taboada, M., Voll, K., & Brooke, J. (2008) Extracting sentiment as a function of discourse structure and topicality. Technical Report 2008-20, School of Computing Science, Simon Fraser University.
Tani, T., Sakoda, M., & Tanaka, K. (1992) Fuzzy modeling by ID3 algorithm and its application to prediction of heater outlet temperature. In IEEE international conference on fuzzy systems (pp. 923–930).
Tran, V. T. N., Phu, V. N., & Tuoi, P. T. (2014) Learning more chi square feature selection to improve the fastest and most accurate sentiment classification. In The third Asian conference on information systems, ACIS 2014.
Turney, P. D. (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In ACL ‘02 Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424), USA.
Umanol, M., Okamoto, H., Hatono, I., & Tamura, H. (1994) Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems. In Proceedings of the third IEEE conference on fuzzy systems, 1994. IEEE world congress on computational intelligence (pp. 2113–2118).
van Zyl, J. J. (2002). Unsupervised classification of scattering behavior using radar polarimetry data. IEEE Transactions on Geoscience and Remote Sensing, 27(1), 36–45.
Vinodhini, G., & Chandrasekaran, R. M. (2013). Performance evaluation of sentiment mining classifiers on balanced and imbalanced dataset. International Journal of Computer Science and Business Informatics, 6(1), 1–8.
Voll, K., & Taboada, M. (2007) Not all words are created equal: Extracting semantic orientation as a function of adjective relevance. AI 2007: Advances in Artificial Intelligence, Volume 4830 of the series Lecture Notes in Computer Science (pp. 337–346).
Wan, Y., & Gao, Q. (2015) An ensemble sentiment classification system of twitter data for airline services analysis. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 1318–1325).
Wang, X., Chen, B., Qian, G., & Ye, F. (2000). On the optimization of fuzzy decision trees. Fuzzy Sets and Systems, 112(1), 117–125.
Winkler, S., Schaller, S., Dorfer, V., Affenzeller, M., Petz, G., & Karpowicz, M. (2015). Data-based prediction of sentiments using heterogeneous model ensembles. Soft Computing, 19(12), 3401–3412.
Xiao, M.-J., Huang, L.-S., Luo, Y.-L., & Shen, H. (2005) Privacy preserving ID3 algorithm over horizontally partitioned data. In Sixth international conference on parallel and distributed computing applications and technologies (PDCAT’05) (pp. 239–243).
Yuxun, L., & Niuniu, X. (2010) Improved ID3 algorithm. In 3rd IEEE international conference on computer science and information technology (ICCSIT) (Vol. 8, pp. 465–468).

Author information

Authors and Affiliations

Institute of Research and Development, Duy Tan University - DTU, Da Nang, Vietnam
Vo Ngoc Phu
School of Industrial Management (SIM), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam
Vo Thi Ngoc Tran
Computer Science & Engineering (CSE), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam
Vo Thi Ngoc Chau
Faculty of Information Technology, Ly Tu Trong Technical College, Ho Chi Minh City, Vietnam
Nguyen Duy Dat
Faculty of Information Technology, Ho Chi Minh City University of Foreign Languages, Ho Chi Minh City, Vietnam
Khanh Ly Doan Duy

Corresponding author

Correspondence to Vo Ngoc Phu.

Appendices

See Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.

Table 1 Training data set for a decision tree

Table 2 The results of the 25,000 English documents in the testing data set t1

Table 3 The results of the 25,000 English documents in the testing data set t2

Table 4 The accuracy of our new model for the 25,000 English documents in the testing data set t1

Table 5 The accuracy of our new model for the 25,000 English documents in the testing data set t2

Table 6 Comparing our model’s results with the studies related to the ID3 algorithm in Umanol et al. (1994), Cendrowska (1987), Cios and Liu (2002), Cios and Sztandera (1992), Jin et al. (2009), Wang et al. (2000), Baldwin et al. (1997), Tani et al. (1992), Xiao et al. (2005), Cheng et al. (1988), Shao et al. (2001), Ming et al. (2009), Yuxun and Niuniu (2010), Maher and Clair (1993)

Table 7 Comparing our model’s advantages and disadvantages with the studies related to the ID3 algorithm in Umanol et al. (1994), Cendrowska (1987), Cios and Liu (2002), Cios and Sztandera (1992), Jin et al. (2009), Wang et al. (2000), Baldwin et al. (1997), Tani et al. (1992), Xiao et al. (2005), Cheng et al. (1988), Shao et al. (2001), Ming et al. (2009), Yuxun and Niuniu (2010), Maher and Clair (1993)

Table 8 Comparisons of our model’s results with the surveys related to the decision tree for sentiment classification (or sentiment analysis) in Dalal and Zaveri (2011), Taboada et al. (2008), Nizamani et al. (2013), Wan and Gao (2015), Winkler et al. (2015), Psomakelis et al. (2015), Shrivastava (2015), Vinodhini and Chandrasekaran (2013), Voll and Taboada (2007), Mandal and Sen (2014), Kaur and Duhan (2015), Prasad et al. (2016), Pong-Inwong and Rungworawut (2014), Sharma (2014)

Table 9 Comparisons of our model’s merits and demerits with the surveys related to the decision tree for sentiment classification (or sentiment analysis) in Dalal and Zaveri (2011), Taboada et al. (2008), Nizamani et al. (2013), Wan and Gao (2015), Winkler et al. (2015), Psomakelis et al. (2015), Shrivastava (2015), Vinodhini and Chandrasekaran (2013), Voll and Taboada (2007), Mandal and Sen (2014), Kaur and Duhan (2015), Prasad et al. (2016), Pong-Inwong and Rungworawut (2014), Sharma (2014)

Table 10 Comparisons of our model with the latest sentiment classification models (or the latest sentiment classification methods) in Manek et al. (2016), Agarwal and Mittal (2016), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014)

Table 11 Comparisons of our model’s benefits and drawbacks with the latest sentiment classification models (or the latest sentiment classification methods) in Manek et al. (2016), Agarwal and Mittal (2016), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014)

Table 12 Comparisons of our model’s results with the latest works of the unsupervised classification in Turney (2002), Lee et al. (2002), Zyl (2002), Le Hegarat-Mascle et al. (2002), Ferro-Famil et al. (2002), Chaovalit and Zhou (2005), Gllavata et al. (2004)

Table 13 Comparisons of our model’s positives and negatives with the latest works of the unsupervised classification in Turney (2002), Lee et al. (2002), Zyl (2002), Le Hegarat-Mascle et al. (2002), Ferro-Famil et al. (2002), Chaovalit and Zhou (2005), Gllavata et al. (2004)

Table 14 Comparisons of our model’s results with the normal ID3 algorithm

1.1 Appendices of all codes

About this article

Cite this article

Phu, V.N., Tran, V.T.N., Chau, V.T.N. et al. A decision tree using ID3 algorithm for English semantic analysis. Int J Speech Technol 20, 593–613 (2017). https://doi.org/10.1007/s10772-017-9429-x

Received: 06 April 2017
Accepted: 06 June 2017
Published: 15 June 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10772-017-9429-x

Keywords
