Abstract
The current abundance of electronic documents requires automatic techniques that support the users in understanding their content and extracting useful information. To this aim, improving the retrieval performance must necessarily go beyond simple lexical interpretation of the user queries, and pass through an understanding of their semantic content and aims. It goes without saying that any digital library would take enormous advantage from the availability of effective Information Retrieval techniques to provide to their users. This paper proposes an approach to Information Retrieval based on a correspondence of the domain of discourse between the query and the documents in the repository. Such an association is based on standard general-purpose linguistic resources (WordNet and WordNet Domains) and on a novel similarity assessment technique. Although the work is at a preliminary stage, interesting initial results suggest to go on extending and improving the approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Angioni, M., Demontis, R., Tuveri, F.: A semantic approach for resource cataloguing and query resolution. Communications of SIWN. Special Issue on Distributed Agent-based Retrieval Tools 5, 62–66 (2008)
Bradford, R.B.: An empirical study of required dimensionality for large-scale latent semantic indexing applications. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, pp. 153–162. ACM, New York (2008)
Deerwester, S.: Improving Information Retrieval with Latent Semantic Indexing. In: Borgman, C.L., Pai, E.Y.H. (eds.) Proceedings of the 51st ASIS Annual Meeting (ASIS 1988), Atlanta, Georgia, vol. 25. American Society for Information Science (October 1988)
Dhillon, I.S., Modha, D.S.: Concept decompositions for large sparse text data using clustering. In: Machine Learning, pp. 143–175 (2001)
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Ferilli, S.: Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques, 1st edn. Springer Publishing Company, Incorporated (2011)
Ferilli, S., Biba, M., Di Mauro, N., Basile, T.M.A., Esposito, F.: Plugging Taxonomic Similarity in First-Order Logic Horn Clauses Comparison. In: Serra, R., Cucchiara, R. (eds.) AI*IA 2009. LNCS (LNAI), vol. 5883, pp. 131–140. Springer, Heidelberg (2009)
Jones, W.P., Furnas, G.W.: Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science 38(6), 420–442 (1987)
Karypis, G., Han, E.-H.(S.): Concept indexing: A fast dimensionality reduction algorithm with applications to document retrieval and categorization. Technical report, In CIKM 2000 (2000)
Klein, D., Manning, C.D.: Fast exact inference with a factored model for natural language parsing. In: Advances in Neural Information Processing Systems, vol. 15. MIT Press (2003)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
Magnini, B., Cavaglià, G.: Integrating subject field codes into wordnet, pp. 1413–1418 (2000)
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at trec-3. In: TREC (1994)
Salton, G.: The SMART Retrieval System–Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River (1971)
Salton, G.: Automatic term class construction using relevance–a summary of work in automatic pseudoclassification. Inf. Process. Manage. 16(1), 1–15 (1980)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill Book Company (1984)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975)
Singhal, A., Buckley, C., Mitra, M., Mitra, A.: Pivoted document length normalization, pp. 21–29. ACM Press (1996)
Zesch, T., Müller, C., Gurevych, I.: Extracting lexical semantic knowledge from wikipedia and wiktionary. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, electronic proceedings (May 2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rotella, F., Ferilli, S., Leuzzi, F. (2013). A Domain Based Approach to Information Retrieval in Digital Libraries. In: Agosti, M., Esposito, F., Ferilli, S., Ferro, N. (eds) Digital Libraries and Archives. IRCDL 2012. Communications in Computer and Information Science, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35834-0_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-35834-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35833-3
Online ISBN: 978-3-642-35834-0
eBook Packages: Computer ScienceComputer Science (R0)