Abstract
This paper addresses the retrieval of archaeological textual information, covering a range of field-related topics and investigating issues that arise from the special characteristics of Arabic.
The proposed hybrid retrieval approach employs several clustering and classification methods that enhance both retrieval and presentation, and that infer further information from the results returned by a primary retrieval engine. That engine, in turn, uses Latent Semantic Analysis (LSA) as its primary retrieval method. In addition, a stemmer for Arabic words was designed and implemented to facilitate indexing and to improve retrieval quality.
The performance of the system was measured in experiments on standard datasets, where it showed promising results and opened many possibilities for future research and further development.
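To make the role of LSA as a primary retrieval method more concrete, the following is a minimal sketch of LSA-based ranking with a truncated singular-value decomposition. It is not the authors' implementation: the function names, the whitespace tokenizer, the raw term counts, and the choice of k are all illustrative assumptions, and the sketch omits the Arabic stemming and the clustering and classification stages described above.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Build a raw-count term-document matrix (terms x documents).

    Assumption: tokens are whitespace-separated; a real system would
    stem and filter stop words before counting.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            matrix[index[tok], j] += 1.0
    return matrix, index


def lsa_fit(matrix, k):
    """Keep the k largest singular triplets of the term-document matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def lsa_rank(query, index, u_k, s_k, vt_k):
    """Rank documents by cosine similarity to the query in latent space.

    Returns a list of (document_index, score) pairs, best first.
    """
    q = np.zeros(u_k.shape[0])
    for tok in query.split():
        if tok in index:  # terms outside the vocabulary are ignored
            q[index[tok]] += 1.0
    # Fold the query into the latent space: q_hat = S_k^{-1} U_k^T q
    q_hat = (u_k.T @ q) / s_k
    doc_vecs = vt_k.T  # one row per document
    q_norm = np.linalg.norm(q_hat)
    d_norms = np.linalg.norm(doc_vecs, axis=1)
    denom = q_norm * d_norms
    # Guard against zero-length vectors (e.g. a query with no known terms)
    scores = np.divide(doc_vecs @ q_hat, denom,
                       out=np.zeros_like(denom), where=denom > 0)
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]
```

A call such as `lsa_rank("bronze tomb", index, *lsa_fit(matrix, 2))` on a small toy collection returns the documents sharing the query's latent topic first. In a pipeline like the one described, the stemmer would run before the matrix is built, and the ranked list would then be passed to the clustering and classification stages.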
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Halabi, A., Islim, AD., Kurdi, MZ. (2010). A Hybrid Approach for Indexing and Retrieval of Archaeological Textual Information. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_56
DOI: https://doi.org/10.1007/978-3-642-15384-6_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15383-9
Online ISBN: 978-3-642-15384-6
eBook Packages: Computer Science (R0)