The Comprehensive Analysis of the Effect of Chinese Word Segmentation on Fuzzy-Based Classification Algorithms for Agricultural Questions

Zhao, Xinyue; Huang, Jianing; Zhang, Jing; Song, Yunsheng

doi:10.1007/s40815-024-01724-0

Published: 20 May 2024

Volume 26, pages 2726–2749, (2024)
International Journal of Fuzzy Systems

Xinyue Zhao¹, Jianing Huang¹, Jing Zhang¹ & Yunsheng Song¹,²

Abstract

Fuzzy logic is a core method for handling the uncertainty and vagueness of information in agricultural natural language processing, and it also plays a crucial role in neural-network-based word segmentation and text classification algorithms. Word segmentation is often the first step in Chinese text classification tasks and has a profound effect on the generalization ability of classification algorithms based on fuzzy logic. However, the high structural complexity of text classification models and the specificity of agricultural data make it very challenging to study the effect of word segmentation. Although there have been several attempts to address this issue, they mainly focus on word segmentation precision or on the generalization performance of multiple word segmentation methods under a single classification algorithm, and they do not involve agricultural text. To address this problem from both theoretical and empirical perspectives, this paper presents a comprehensive analysis of the effect of Chinese word segmentation on fuzzy-based classification algorithms for agricultural questions. It first discusses the characteristics of agricultural questions as a basis for analyzing the domain adaptability of word segmentation and classification algorithms, employs fuzzy logic to convert the Chinese word segmentation task into a sequence labeling problem, and then analyzes the characteristics, techniques, and performance differences of seven mainstream open-source Chinese word segmentation toolkits. It then explores the impact of Chinese word segmentation on the generalization performance of classification algorithms under a proposed unified fuzzy-logic-based framework for text classification. Finally, extensive experiments on real data crawled from typical agricultural websites empirically examine the differences in, and robustness of, the effect of different word segmentation tools on classification performance, as well as the contribution of an external dictionary. The comparative results show that word segmentation tools have a substantial effect on classification performance, that this effect is robust across the typical text feature extraction layers used for classification, and that the external dictionary has no significant effect on classification performance. These results provide a useful reference for selecting appropriate word segmentation tools for future Chinese natural language processing tasks.
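The abstract mentions recasting Chinese word segmentation as a sequence labeling problem. The sketch below illustrates that idea only, under the assumption that the common B/M/E/S character tag scheme (begin, middle, end, single) is used; the abstract does not state which tag set the authors adopt, and the fuzzy-logic component of their formulation is not modeled here. The function names `words_to_bmes` and `bmes_to_words` are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to one tag per character.

    Assumed tag scheme (not confirmed by the source):
    B = first character of a multi-character word, M = middle,
    E = last character, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend("M" * (len(word) - 2))
            tags.append("E")
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the word sequence from characters and their tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # keep any unfinished trailing word
        words.append(current)
    return words


# Illustrative agricultural question fragment (made-up example data).
segmented = ["玉米", "大斑病", "的", "防治"]
tags = words_to_bmes(segmented)
restored = bmes_to_words("".join(segmented), tags)
```

Under this framing, a segmenter predicts one tag per character, and the word boundaries follow from the decoded tag sequence, which is why segmentation quality can be discussed with the same tools as other labeling tasks.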

Acknowledgements

This research was supported by Shandong Provincial Natural Science Foundation, China, grant number ZR2020MF146.

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Agricultural University, Daizong Street, Taian, 271018, Shandong, China
Xinyue Zhao, Jianing Huang, Jing Zhang & Yunsheng Song
Key Laboratory of Huang-Huai-Hai Smart Agricultural Technology of Ministry of Agriculture and Rural Affairs, Shandong Agricultural University, Daizong Street, Taian, 271018, Shandong, China
Yunsheng Song

Corresponding author

Correspondence to Yunsheng Song.

Ethics declarations

Funding

Shandong Provincial Natural Science Foundation, China: ZR2020MF146.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, X., Huang, J., Zhang, J. et al. The Comprehensive Analysis of the Effect of Chinese Word Segmentation on Fuzzy-Based Classification Algorithms for Agricultural Questions. Int. J. Fuzzy Syst. 26, 2726–2749 (2024). https://doi.org/10.1007/s40815-024-01724-0

Received: 11 November 2023
Revised: 16 February 2024
Accepted: 01 March 2024
Published: 20 May 2024
Issue Date: November 2024
DOI: https://doi.org/10.1007/s40815-024-01724-0
