Encoding XML in Vector Spaces

Kakade, Vinay; Raghavan, Prabhakar

doi:10.1007/978-3-540-31865-1_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

We develop a framework for representing XML documents and queries in vector spaces, and build indexes for processing text-centric, semi-structured queries that support a proximity measure between XML documents. The idea of using vector spaces for XML retrieval is not new. In this paper we (i) unify prior approaches into a single framework; (ii) develop techniques to eliminate the special-purpose auxiliary computations (outside the vector space) used previously; (iii) give experimental evidence on benchmark queries that our approach is competitive in retrieval quality; and (iv) as an immediate consequence of the framework, are able to classify and cluster XML documents.
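To make the idea of a vector-space encoding concrete, the following is a minimal sketch and not the paper's actual construction. It assumes, purely for illustration, that each axis of the space is an (element path, term) pair, that weights are raw term counts, and that proximity is cosine similarity. The function names `xml_to_vector` and `cosine` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def xml_to_vector(xml_text):
    """Map an XML string to a sparse vector keyed by (path, term).

    Assumption for illustration: one axis per (element path, term) pair,
    weighted by raw count. Text after a closing child tag (node.tail)
    is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    vec = Counter()

    def walk(node, path):
        here = path + "/" + node.tag
        # Count each whitespace-separated term under the current path.
        for term in (node.text or "").lower().split():
            vec[(here, term)] += 1
        for child in node:
            walk(child, here)

    walk(root, "")
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc = xml_to_vector("<book><title>vector spaces</title><author>salton</author></book>")
query = xml_to_vector("<book><title>vector</title></book>")
other = xml_to_vector("<book><author>vector</author></book>")

score_same_path = cosine(doc, query)     # term matches under the same path
score_other_path = cosine(other, query)  # same term, different path
```

Because documents and queries live in the same space under this assumption, the same vectors could be passed to any standard classifier or clustering routine. In the sketch, a term scores against a query only when it occurs under the same element path.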

Author information

Authors and Affiliations

A9.com, Inc., USA
Vinay Kakade
Verity, Inc., USA
Prabhakar Raghavan

Editor information

Editors and Affiliations

Departamento de Electrónica y Computación, Universidad de Santiago de Compostela, Spain
David E. Losada
Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, 18071, Granada, Spain
Juan M. Fernández-Luna

About this paper

Cite this paper

Kakade, V., Raghavan, P. (2005). Encoding XML in Vector Spaces. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_8

DOI: https://doi.org/10.1007/978-3-540-31865-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science (R0)