Abstract
The growing amount of data available in digital libraries calls for search engines that let users find what they are looking for quickly and effectively. XML tagging makes it possible to add structural information to digitized content, and this metadata opens up opportunities for a wide variety of new services. This paper describes the requirements that a search engine inside a digital library should fulfill and presents a specific XML search engine architecture. The architecture is designed to index large amounts of structurally tagged text and to be available over the web. It has been developed and successfully tested at the Miguel de Cervantes Digital Library.
Work partially funded by the Spanish Government through grant TIC2003-08681-C02-01.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez-Villamil, E., Muñoz, C.G., Carrasco, R.C. (2005). XMLibrary Search: An XML Search Engine Oriented to Digital Libraries. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_8
DOI: https://doi.org/10.1007/11551362_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28767-4
Online ISBN: 978-3-540-31931-3
eBook Packages: Computer Science (R0)