ABSTRACT
In this work, we study similarity measures for text-centric XML documents based on an extended vector space model that considers both document content and structure. Experiments on a benchmark show that the proposed measure outperforms a baseline that ignores the structural knowledge of XML documents.
Index Terms
- Measuring similarity of semi-structured documents with context weights