DOI: 10.1145/3448734.3450777

Research on Multi-granularity Ensemble Learning Based on Korean

Published: 17 May 2021

ABSTRACT

Ensemble learning trains and combines multiple classifiers, using their predictions as new features to train a meta-classifier, which improves model accuracy. This paper proposes a multi-granularity model based on stacking ensemble learning for Korean text classification. First, eojeol and subeojeol granularities are defined according to the composition of the Korean language. Since different feature granularities carry different semantic information, six granularities (phoneme, syllable, subword, word, subeojeol, and eojeol) are compared on the Korean text classification task. Second, suffix words are constructed based on Korean grammatical morphology, and the effects of the different granularities after suffix preprocessing are compared. Finally, a multi-granularity ensemble learning model for Korean, called MGEL-K, is proposed; using different granularities enriches the diversity of the ensemble by creating differences between the base learners. The results show that the proposed MGEL-K model performs best on the Korean text classification task, with an accuracy of 92.33%.
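
As context for the granularities the abstract lists, the following minimal Python sketch (our illustration, not the paper's code) shows how a single eojeol can be decomposed to the syllable and phoneme (jamo) levels using standard Unicode Hangul arithmetic; the subword, word, and subeojeol levels would additionally require a tokenizer or morphological analyzer.

```python
# A minimal sketch (not the paper's code): viewing one Korean eojeol at
# the syllable and phoneme (jamo) granularities via Unicode arithmetic.
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and decompose as
# code_point = 0xAC00 + (lead * 21 + vowel) * 28 + tail.

LEADS = [chr(0x1100 + i) for i in range(19)]         # 19 initial consonants
VOWELS = [chr(0x1161 + i) for i in range(21)]        # 21 medial vowels
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]  # 27 final consonants + none

def syllables(eojeol):
    """Syllable granularity: each precomposed Hangul character."""
    return list(eojeol)

def phonemes(eojeol):
    """Phoneme granularity: decompose each syllable into jamo."""
    out = []
    for ch in eojeol:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 11172:                  # inside the Hangul syllable block
            lead, rest = divmod(idx, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out += [LEADS[lead], VOWELS[vowel]]
            if tail:
                out.append(TAILS[tail])
        else:                                 # non-Hangul characters pass through
            out.append(ch)
    return out

print(syllables("학교에"))  # ['학', '교', '에']
print(phonemes("학교에"))   # jamo sequence: ㅎ ㅏ ㄱ / ㄱ ㅛ / ㅇ ㅔ
```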
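
The stacking scheme described above can likewise be sketched with scikit-learn. In the hypothetical example below, two granularity views (character n-grams and whitespace tokens) stand in for the paper's six granularities, and logistic regression stands in for both the base learners and the meta-classifier, neither of which is specified in this abstract.

```python
# A minimal stacking sketch in the spirit of MGEL-K (not the authors'
# implementation): base learners see different granularity views of the
# same text, and a meta-classifier is trained on their predictions.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

# Two stand-in granularity views: character n-grams (a rough proxy for
# syllable/phoneme features) and whitespace tokens (a rough proxy for eojeol).
char_view = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
word_view = make_pipeline(
    CountVectorizer(analyzer="word"),
    LogisticRegression(max_iter=1000),
)

model = StackingClassifier(
    estimators=[("char", char_view), ("word", word_view)],
    final_estimator=LogisticRegression(),  # meta-classifier over base outputs
    stack_method="predict_proba",          # stack class probabilities as features
    cv=2,                                  # out-of-fold predictions avoid leakage
)

# Toy data for illustration only; labels are arbitrary.
texts = ["학교에 갑니다", "날씨가 좋습니다", "시험이 어렵습니다", "경치가 아름답습니다"]
labels = [0, 1, 0, 1]
model.fit(texts, labels)
print(model.predict(["학교 시험이 있습니다"]))
```

The out-of-fold prediction step (the cv argument) matters here: training the meta-classifier on in-sample base predictions would leak labels and overstate accuracy.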


Published in

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science
January 2021, 1142 pages
ISBN: 9781450389570
DOI: 10.1145/3448734

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited
