Improving Persian Text Classification and Clustering Using Persian Thesaurus

Parvin, Hamid; Dahbashi, Atousa; Parvin, Sajad; Minaei-Bidgoli, Behrouz

doi:10.1007/978-3-642-28765-7_59

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 151)

Abstract

This paper proposes an approach to improving the classification performance of Persian texts. The method uses a thesaurus as auxiliary knowledge to obtain more representative word frequencies in the corpus. The thesaurus used encodes two types of word relationships. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results indicate that text classification performance improves significantly when the Persian thesaurus is employed, compared with when it is ignored.
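The idea of thesaurus-adjusted word frequencies can be illustrated with a minimal sketch. The abstract does not name the two relationship types or say how they are combined, so everything below is an assumption made for illustration: the relation names `synonym` and `related`, the weights, the toy English vocabulary, and the function `thesaurus_frequencies` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy thesaurus: relation type -> {word: [linked words]}.
# The relation names and entries are illustrative assumptions only.
THESAURUS = {
    "synonym": {"car": ["automobile"], "automobile": ["car"]},
    "related": {"car": ["vehicle"], "automobile": ["vehicle"]},
}

# Assumed weights: how much of a word's count is shared with linked words.
WEIGHTS = {"synonym": 1.0, "related": 0.5}


def thesaurus_frequencies(tokens, thesaurus=THESAURUS, weights=WEIGHTS):
    """Return word frequencies after spreading counts along thesaurus links."""
    raw = Counter(tokens)  # plain term frequencies
    adjusted = {word: float(count) for word, count in raw.items()}
    for relation, entries in thesaurus.items():
        weight = weights[relation]
        for word, count in raw.items():
            # Each linked word receives a weighted share of the raw count.
            for neighbour in entries.get(word, []):
                adjusted[neighbour] = adjusted.get(neighbour, 0.0) + weight * count
    return adjusted


doc = ["car", "automobile", "car", "road"]
freqs = thesaurus_frequencies(doc)
```

In this sketch, words linked through the thesaurus reinforce one another's counts, so two documents that use different but related vocabulary end up with more similar frequency vectors. The resulting dictionary could then be fed to any standard classifier or clustering routine in place of raw term counts.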

Author information

Authors and Affiliations

Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran
Hamid Parvin, Atousa Dahbashi, Sajad Parvin & Behrouz Minaei-Bidgoli

Corresponding author

Correspondence to Hamid Parvin.

Editor information

Editors and Affiliations

Dept. of Computer Science & Intelligent Systems, Osaka Prefecture University, Gakuen-chu 1-1, Sakai, Osaka, 599-8531, Japan
Sigeru Omatu
Department of Computing Science, University of Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Juan F. De Paz Santana
Department of Computing Science, University of Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Sara Rodríguez González
Escuela Politécnica Superior (EPS), Depto. Informática, Universidad Carlos III de Madrid, Avenida de la Universidad Carlos III 22, Madrid, 28270, Spain
Jose M. Molina
Data Processing and Simulation Group, Universidad Politécnica de Madrid, Calle de Ramiro de Maeztu, 7, Madrid, 28040, Spain
Ana M. Bernardos
Faculty of Science, Department of Computing Science, University of Salamanca, Plaza de la Merced S/N, Salamanca, 37008, Spain
Juan M. Corchado Rodríguez

About this paper

Cite this paper

Parvin, H., Dahbashi, A., Parvin, S., Minaei-Bidgoli, B. (2012). Improving Persian Text Classification and Clustering Using Persian Thesaurus. In: Omatu, S., De Paz Santana, J., González, S., Molina, J., Bernardos, A., Rodríguez, J. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28765-7_59

DOI: https://doi.org/10.1007/978-3-642-28765-7_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28764-0
Online ISBN: 978-3-642-28765-7
eBook Packages: Engineering, Engineering (R0)
