Abstract
This paper proposes a novel approach for retrieving large-scale XML data using the vector space model, which is commonly used in the information retrieval community. Last year, for the INitiative for the Evaluation of XML Retrieval (INEX) 2006 Adhoc Track, we developed a system using fragment elements. That system could search over XML elements for queries with varying constraints on which XML elements to include in the search, without reindexing the collection, and so supported more flexible queries. However, the system took significant time to unitize the fragment elements. To solve this problem, our new system is composed of an inverted-file list and a relative inverted-path list for the INEX 2007 Adhoc Track corpus.
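The abstract does not describe the layout of the inverted-file list or the relative inverted-path list, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a term-to-element inverted file, a separate list mapping each element to its path, and TF-IDF weights with cosine-style normalization; the names `index_elements`, `search`, and `path_suffix` are hypothetical. The point it illustrates is that a structural constraint can be applied at query time, so changing the constraint does not require reindexing.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def index_elements(xml_text):
    """Build an inverted file (term -> {element id: tf}) and a path list."""
    root = ET.fromstring(xml_text)
    inverted = defaultdict(dict)  # term -> {element id: term frequency}
    paths = []                    # element id -> path such as "/article/sec/p"

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        elem_id = len(paths)
        paths.append(path)
        # Index all text below this element, so ancestors match as well.
        tokens = " ".join(node.itertext()).lower().split()
        for term, tf in Counter(tokens).items():
            inverted[term][elem_id] = tf
        for child in node:
            walk(child, path)

    walk(root, "")
    return inverted, paths


def search(query, inverted, paths, path_suffix=""):
    """Rank elements by TF-IDF similarity, filtering on the path at query time."""
    n = len(paths)
    # Assumed IDF variant; it is always positive, so element norms are nonzero.
    idf = {t: math.log(1 + n / len(p)) for t, p in inverted.items()}

    # Length of each element's TF-IDF vector.
    norms = defaultdict(float)
    for term, postings in inverted.items():
        for elem_id, tf in postings.items():
            norms[elem_id] += (tf * idf[term]) ** 2

    scores = defaultdict(float)
    for term in query.lower().split():
        for elem_id, tf in inverted.get(term, {}).items():
            # Structural constraint checked against the path list, not the index.
            if paths[elem_id].endswith(path_suffix):
                scores[elem_id] += tf * idf[term] * idf[term]

    # The query norm is the same for every element, so it is omitted.
    return sorted(
        ((s / math.sqrt(norms[e]), paths[e]) for e, s in scores.items()),
        reverse=True,
    )


doc = (
    "<article><title>xml retrieval</title>"
    "<sec><p>vector space model</p><p>xml index</p></sec></article>"
)
inverted, paths = index_elements(doc)
for score, path in search("vector", inverted, paths, path_suffix="/p"):
    print(round(score, 3), path)
```

Calling `search` again with a different `path_suffix` reuses the same `inverted` and `paths` structures, which is the property the abstract highlights.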
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tanioka, H. (2008). A Fast Retrieval Algorithm for Large-Scale XML Data. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_12
DOI: https://doi.org/10.1007/978-3-540-85902-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85901-7
Online ISBN: 978-3-540-85902-4
eBook Packages: Computer Science, Computer Science (R0)