Abstract
This paper proposes an approach to improving the classification performance of Persian texts. The proposed method uses a thesaurus as auxiliary knowledge to obtain more representative word frequencies in the corpus. The thesaurus encodes two types of word relationships. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results indicate that text classification performance improves significantly when the Persian thesaurus is employed, compared with when it is ignored.
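The idea of thesaurus-adjusted word frequencies can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two relationship types or say how they are applied, so the sketch assumes a single hypothetical mapping from each term to a representative term, and it uses English placeholder words rather than Persian data. The names `THESAURUS` and `thesaurus_frequencies` are invented for illustration.

```python
from collections import Counter

# Hypothetical toy thesaurus: maps a term to a representative term.
# Entries are illustrative assumptions, not data from the paper.
THESAURUS = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "physician": "doctor",
    "doctor": "doctor",
}


def thesaurus_frequencies(tokens, thesaurus):
    """Count tokens after mapping each one to its thesaurus representative.

    Tokens that are missing from the thesaurus are counted unchanged.
    """
    counts = Counter()
    for token in tokens:
        counts[thesaurus.get(token, token)] += 1
    return counts


tokens = ["car", "auto", "automobile", "doctor", "physician", "road"]
raw = Counter(tokens)                               # plain word frequencies
merged = thesaurus_frequencies(tokens, THESAURUS)   # thesaurus-adjusted frequencies
```

In this toy input, the three related terms are counted as one feature instead of three separate low-frequency ones, which is the sense in which the merged counts may be more representative for a downstream classifier.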
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Parvin, H., Dahbashi, A., Parvin, S., Minaei-Bidgoli, B. (2012). Improving Persian Text Classification and Clustering Using Persian Thesaurus. In: Omatu, S., De Paz Santana, J., González, S., Molina, J., Bernardos, A., Rodríguez, J. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28765-7_59
DOI: https://doi.org/10.1007/978-3-642-28765-7_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28764-0
Online ISBN: 978-3-642-28765-7
eBook Packages: Engineering (R0)