ABSTRACT
Ensemble learning trains and combines multiple classifiers; in stacking, the predictions of the base classifiers are used as new features to train a meta-classifier, which improves the accuracy of the overall model. This paper proposes a multi-granularity model based on stacking ensemble learning for Korean text classification. First, eojeol and subeojeol granularities are defined according to the compositional structure of Korean. Since different feature granularities carry different semantic information, six granularities (phoneme, syllable, subword, word, subeojeol, and eojeol) are compared on a Korean text classification task. Second, suffix tokens are constructed based on Korean grammatical morphology, and the effects of the different granularities after suffix preprocessing are compared. Finally, a multi-granularity ensemble learning model for Korean, called MGEL-K, is proposed; it uses different granularities to enrich the diversity of the ensemble and increase the differences between base learners. Experimental results show that the proposed MGEL-K model performs best on the Korean text classification task, with an accuracy of 92.33%.
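As a rough illustration of the stacking idea described above, the sketch below trains base classifiers on different granularities of the same Korean text and feeds their out-of-fold predictions to a meta-classifier. It is a minimal sketch only: the abstract does not specify MGEL-K's tokenizers, base learners, or data, so character n-grams stand in for syllable-level features, whitespace tokens stand in for eojeol-level features, and scikit-learn's StackingClassifier stands in for the ensemble; the toy corpus and all names are illustrative.

```python
# Minimal sketch of multi-granularity stacking for Korean text
# classification, loosely following the MGEL-K idea. Granularities are
# approximated with TF-IDF analyzers: char n-grams for syllables,
# whitespace tokens for eojeol. The paper's actual tokenizers,
# classifiers, and dataset are not given in the abstract.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import StackingClassifier

# Toy corpus; a real experiment would use a labeled Korean dataset.
texts = ["오늘 날씨가 정말 좋다", "주가가 크게 하락했다",
         "영화가 너무 재미있었다", "경제 성장률이 둔화되었다"]
labels = [0, 1, 0, 1]  # e.g. 0 = life, 1 = economy

# One base learner per granularity: the predictions of these become
# the meta-classifier's input features (stacked generalization).
base_learners = [
    ("syllable", make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000))),
    ("eojeol", make_pipeline(
        TfidfVectorizer(analyzer="word", token_pattern=r"\S+"),
        LinearSVC())),
]

model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # meta-classifier
    cv=2,  # meta-features come from out-of-fold predictions
)
model.fit(texts, labels)
print(model.predict(["환율이 급등했다"]))
```

Note the cv argument: the meta-classifier is trained on cross-validated base predictions, so the base learners' training labels do not leak into the meta-features, which is the standard safeguard in stacked generalization.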