Abstract
Fuzzy logic is a core method for handling uncertain and vague information in agricultural natural language processing, and it also plays a crucial role in neural-network-based word segmentation and text classification algorithms. Word segmentation is often the first step in Chinese text classification tasks and has a profound effect on the generalization ability of fuzzy-logic-based classification algorithms. However, the structural complexity of text classification models and the specificity of agricultural data make it challenging to study the effect of word segmentation. Although there have been several attempts to address this issue, they focus mainly on word segmentation precision or on the generalization performance of multiple word segmentation methods under the same classification algorithm, and they do not involve agricultural text. To address this problem from the perspectives of both rational and empirical analysis, this paper presents a comprehensive analysis of the effect of Chinese word segmentation on fuzzy-based classification algorithms for agricultural questions. It first discusses the characteristics of agricultural questions as a basis for the subsequent analysis of the domain adaptability of word segmentation and classification algorithms, employs fuzzy logic to convert the Chinese word segmentation task into a sequence labeling problem, and then analyzes the characteristics, techniques, and performance differences of seven current mainstream open-source Chinese word segmentation tools. It then explores the impact of Chinese word segmentation on the generalization performance of classification algorithms under the proposed unified fuzzy-logic-based model framework for text classification.
Finally, extensive experiments are performed on real data crawled from typical agricultural websites to empirically study the differences in, and robustness of, the effect of different word segmentation tools on classification performance, as well as the contribution of the external dictionary. Comparative experimental results show that word segmentation tools have a substantial effect on classification performance, that this effect is strongly robust across the typical text feature extraction layers used for classification tasks, and that the external dictionary has no significant effect on classification performance. These results provide an important reference for selecting appropriate word segmentation tools for Chinese natural language processing tasks in the future.
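To make the idea of treating word segmentation as sequence labeling concrete, the following minimal sketch uses the common B/M/E/S character tagging scheme (Begin, Middle, End, Single). It is an illustration only: the abstract does not state which tag set the authors use or how fuzzy logic enters the formulation, and the function names and the example question are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, any middle characters M, last character E
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover words from characters and their B/M/E/S tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical agricultural question, already segmented into words.
segmented = ["小麦", "锈病", "如何", "防治"]
tags = words_to_bmes(segmented)
restored = bmes_to_words("".join(segmented), tags)
```

Under this view, a segmentation tool is a model that predicts one tag per character, and decoding the tags yields the word boundaries that the downstream classifier receives.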
Acknowledgements
This research was supported by Shandong Provincial Natural Science Foundation, China, grant number ZR2020MF146.
Funding
Shandong Provincial Natural Science Foundation, China: ZR2020MF146.
About this article
Cite this article
Zhao, X., Huang, J., Zhang, J. et al. The Comprehensive Analysis of the Effect of Chinese Word Segmentation on Fuzzy-Based Classification Algorithms for Agricultural Questions. Int. J. Fuzzy Syst. 26, 2726–2749 (2024). https://doi.org/10.1007/s40815-024-01724-0
DOI: https://doi.org/10.1007/s40815-024-01724-0