Abstract
Calculation of text similarity is an essential task for text analysis and classification. It can be based, e.g., on Jaccard, cosine, or other similar measures. Such measures treat a text as a bag of words and therefore lose some syntactic and semantic features of its sentences. This article presents a different measure based on the so-called artificial sentence pattern (ASP) method. This method was developed to analyze texts in Polish, a language with very rich inflection, and therefore uses syntactic and semantic rules of the Polish language. Nevertheless, we argue that it admits extensions to other languages. As a result of the analysis, we obtained several hypernodes which contain the most important words. Each hypernode corresponds to one of the examined documents, which are published papers from the agriculture domain written in Polish. Experimental results obtained from that set of papers are described and discussed. The results are illustrated visually using graphs of hypernodes and compared with the Jaccard and cosine measures.
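To make the bag-of-words limitation concrete, the following minimal Python sketch computes Jaccard and cosine similarity for two short sentences. It is an illustration only and is not taken from the paper: the function names, the whitespace tokenizer, and the sample sentences are all assumptions, and no lemmatization is applied, although an inflected language such as Polish would need it.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: naive lowercasing and whitespace splitting.
    # No stemming or lemmatization is performed.
    return text.lower().split()


def jaccard_similarity(a, b):
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    # Cosine of the angle between raw term-frequency vectors.
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Same words, different word order and therefore different meaning.
s1 = "the dog bites the man"
s2 = "the man bites the dog"

print(jaccard_similarity(s1, s2))
print(cosine_similarity(s1, s2))
```

Because both sentences contain exactly the same words with the same counts, both measures report maximal similarity (up to floating-point rounding), even though the sentences describe opposite events. This is the kind of syntactic information that a bag-of-words measure discards.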
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wrzeciono, P., Karwowski, W. (2018). Pattern Recognition Method for Classification of Agricultural Scientific Papers in Polish. In: Chmielewski, L., Kozera, R., Orłowski, A., Wojciechowski, K., Bruckstein, A., Petkov, N. (eds) Computer Vision and Graphics. ICCVG 2018. Lecture Notes in Computer Science, vol 11114. Springer, Cham. https://doi.org/10.1007/978-3-030-00692-1_43
DOI: https://doi.org/10.1007/978-3-030-00692-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00691-4
Online ISBN: 978-3-030-00692-1
eBook Packages: Computer Science, Computer Science (R0)