Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

This paper provides a novel model for English/Arabic query translation to search Arabic text, and then expands the Arabic query to handle OCR-degraded Arabic text. The model covers detecting and translating word collocations, translating single words, transliterating names, and disambiguating translations and transliterations through several approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model proposed in the paper. The query translation and expansion model is supported by several resources proposed in the paper, such as a word collocations dictionary, single-word dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic while resolving translation and transliteration ambiguities; with orthographic query expansion, it also handles OCR errors with a high degree of accuracy.
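The orthographic expansion step can be pictured with a minimal sketch. This is not the paper's OCR-error simulation model: the confusion table, the function name `expand_query_term`, and the `max_errors` cap are all assumptions made for illustration. The sketch only shows the general idea of adding variants of a query term in which some characters are replaced by characters an OCR engine might plausibly output instead.

```python
from itertools import product

# Hypothetical confusion table (an assumption, not taken from the paper):
# each character maps to characters an OCR engine might output in its place.
# The pairs below are letters that differ mainly in their dots.
CONFUSIONS = {
    "ب": ["ت", "ث"],
    "ح": ["ج", "خ"],
    "ر": ["ز"],
}


def expand_query_term(term, confusions=CONFUSIONS, max_errors=1):
    """Return the term followed by variants with at most max_errors substitutions."""
    # For every position, list the original character and its possible confusions.
    options = [[ch] + list(confusions.get(ch, [])) for ch in term]
    variants = []
    for combo in product(*options):
        # Count how many positions differ from the original term.
        errors = sum(1 for a, b in zip(term, combo) if a != b)
        if errors <= max_errors:
            variants.append("".join(combo))
    return variants


if __name__ == "__main__":
    # The original term comes first, followed by its single-substitution variants.
    for variant in expand_query_term("بحر"):
        print(variant)
```

Capping the number of substitutions keeps the expanded query small; a real system would more likely weight variants by how probable each confusion is rather than treat them all equally.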




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elghazaly, T., Fahmy, A. (2009). Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_39

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
