Improving Persian Text Classification and Clustering Using Persian Thesaurus

  • Conference paper
Distributed Computing and Artificial Intelligence

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 151)

Abstract

This paper proposes a novel approach to improving the classification performance for Persian texts. The proposed method uses a thesaurus as a source of background knowledge to obtain more representative word frequencies in the corpus. The thesaurus we use captures two types of word relationships. This is the first attempt to use a Persian thesaurus in Persian information retrieval. Experimental results indicate that text classification performance improves significantly when the Persian thesaurus is employed, compared with when it is ignored.
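
The abstract does not spell out how the thesaurus alters the word frequencies or which two relationship types it uses. As a rough illustration only, the following Python sketch assumes a thesaurus that maps each word to a preferred term plus a relation label, and folds the counts of thesaurus-linked words onto that preferred term. The relation names `equivalent` and `related`, the weight of 0.5 for `related`, the function `thesaurus_frequencies`, and the sample entries are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical thesaurus format: surface word -> (preferred term, relation label).
# The relation labels below are illustrative assumptions, not the paper's two types.
Thesaurus = Dict[str, Tuple[str, str]]

# Assumed contribution of each relation type toward the preferred term's count.
RELATION_WEIGHTS = {"equivalent": 1.0, "related": 0.5}


def thesaurus_frequencies(tokens: List[str], thesaurus: Thesaurus) -> Dict[str, float]:
    """Fold raw word counts onto preferred thesaurus terms.

    Words that do not appear as thesaurus keys keep their own raw count.
    """
    raw = Counter(tokens)
    folded: Dict[str, float] = {}
    for word, count in raw.items():
        if word in thesaurus:
            preferred, relation = thesaurus[word]
            weight = RELATION_WEIGHTS[relation]
        else:
            preferred, weight = word, 1.0
        folded[preferred] = folded.get(preferred, 0.0) + weight * count
    return folded


# Hypothetical sample entries: "ماشین" and "اتومبیل" (car, automobile) are treated
# as equivalents of "خودرو" (vehicle); "راننده" (driver) as a related term.
EXAMPLE_THESAURUS: Thesaurus = {
    "ماشین": ("خودرو", "equivalent"),
    "اتومبیل": ("خودرو", "equivalent"),
    "راننده": ("خودرو", "related"),
}

tokens = ["خودرو", "ماشین", "اتومبیل", "راننده", "جاده"]  # "جاده" = road
print(thesaurus_frequencies(tokens, EXAMPLE_THESAURUS))
# → {'خودرو': 3.5, 'جاده': 1.0}
```

Here the three words for "car" and the associated word "driver" collapse into a single feature, so a classifier sees one stronger signal instead of several sparse ones; the folded counts could then be weighted (for example with TF-IDF) like ordinary term frequencies.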

Author information

Corresponding author

Correspondence to Hamid Parvin.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Parvin, H., Dahbashi, A., Parvin, S., Minaei-Bidgoli, B. (2012). Improving Persian Text Classification and Clustering Using Persian Thesaurus. In: Omatu, S., De Paz Santana, J., González, S., Molina, J., Bernardos, A., Rodríguez, J. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28765-7_59

  • DOI: https://doi.org/10.1007/978-3-642-28765-7_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28764-0

  • Online ISBN: 978-3-642-28765-7

  • eBook Packages: Engineering, Engineering (R0)