research-article

Improving XML search by generating and utilizing informative result snippets

Authors:

Yi Chen

ACM Transactions on Database Systems (TODS), Volume 35, Issue 3

Article No.: 19, Pages 1 - 45

https://doi.org/10.1145/1806907.1806911

Published: 30 July 2010

Abstract

Almost every text search engine uses snippets to complement its ranking scheme and so handle user searches effectively, since such searches are inherently ambiguous and their relevance semantics are difficult to assess. Although XML is a standard representation format for Web data, research on generating result snippets for XML search remains limited.

To tackle this important yet open problem, this article presents eXtract, a system that generates snippets for XML search results. We identify the requirements for a good XML result snippet: it should be a small, meaningful information unit that summarizes its query result and differentiates it from the other results, so that users can quickly assess the result's relevance. We have designed and implemented a novel algorithm that satisfies these requirements. Furthermore, we propose clustering the query results based on their snippets. Since XML results can only be clustered at query time, clustering on the small snippets rather than on the full results significantly improves efficiency while sacrificing little clustering accuracy. Experiments verify the efficiency and effectiveness of our approach.
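The abstract does not say how snippets are compared during clustering. As a minimal sketch, assuming each snippet is a small set of hypothetical (label, value) items and using Jaccard similarity with greedy leader clustering (a generic technique swapped in for illustration, not eXtract's algorithm), the Python below shows why snippet-based clustering is cheap: each comparison touches only the few items in two snippets rather than two full results. The names `jaccard`, `cluster_by_snippet`, `threshold`, and the sample data are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a snippet is a small set of (label, value) items
# drawn from an XML query result. This is an illustrative assumption, not the
# data model used by eXtract.
Item = Tuple[str, str]
Snippet = FrozenSet[Item]


def jaccard(a: Snippet, b: Snippet) -> float:
    """Size of the intersection over size of the union (1.0 when both are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def cluster_by_snippet(snippets: Dict[str, Snippet], threshold: float) -> List[List[str]]:
    """Greedy leader clustering over snippets (a swapped-in generic technique).

    Each result joins the first existing cluster whose leader snippet has
    Jaccard similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    leaders: List[Snippet] = []
    for rid in sorted(snippets):  # sorted so the output is deterministic
        s = snippets[rid]
        for i, leader in enumerate(leaders):
            if jaccard(s, leader) >= threshold:
                clusters[i].append(rid)
                break
        else:
            # No sufficiently similar leader: this result founds a new cluster.
            clusters.append([rid])
            leaders.append(s)
    return clusters


if __name__ == "__main__":
    # Hypothetical snippets for three query results.
    snippets = {
        "r1": frozenset({("store", "Galleria"), ("city", "Houston"), ("category", "apparel")}),
        "r2": frozenset({("store", "Galleria"), ("city", "Houston"), ("category", "outdoor")}),
        "r3": frozenset({("store", "Scottsdale"), ("city", "Phoenix"), ("category", "apparel")}),
    }
    print(cluster_by_snippet(snippets, threshold=0.5))  # prints [['r1', 'r2'], ['r3']]
```

With `threshold=0.5`, r1 and r2 share two of four distinct items (similarity 0.5) and form one cluster, while r3 shares one of five items with r1 (similarity 0.2) and starts its own. Leader clustering is chosen here only because it needs a single pass; any similarity-based clustering method could be run over the snippets instead.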

Index Terms

Information systems → Information retrieval → Information retrieval query processing

Published In

ACM Transactions on Database Systems Volume 35, Issue 3

July 2010

311 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/1806907

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2010

Accepted: 01 February 2010

Revised: 01 October 2009

Received: 01 March 2009

Published in TODS Volume 35, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Information and Intelligent Systems

Bibliometrics & Citations

Bibliometrics

Total citations: 9

Total downloads: 592

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Naseriparsa, M., Islam, M., Liu, C., and Chen, L. 2019. XSnippets: Exploring semi-structured data via snippets. Data & Knowledge Engineering. Online publication date: Oct. 2019. https://doi.org/10.1016/j.datak.2019.101758

Liu, Z. and Chen, Y. 2018. Processing keyword search on XML. World Wide Web 14, 5-6, 671-707. Online publication date: 25 Dec. 2018. https://dl.acm.org/doi/10.1007/s11280-011-0128-2

Liu, X., Wan, C., and Liu, D. 2016. Keyword query with structure. Information Technology and Management 17, 2, 151-163. Online publication date: 1 Jun. 2016. https://dl.acm.org/doi/10.1007/s10799-015-0247-z

Aksoy, C., Dimitriou, A., and Theodoratos, D. 2015. Reasoning with patterns to effectively answer XML keyword queries. The VLDB Journal 24, 3, 441-465. Online publication date: 1 Jun. 2015. https://dl.acm.org/doi/10.1007/s00778-015-0384-3

Shalabi, R. and Elfatatry, A. 2014. Towards improving XML search by using structure clustering technique. Journal of Information Science 41, 2, 146-166. Online publication date: 12 Dec. 2014. https://doi.org/10.1177/0165551514560523

Liu, Z. and Chen, Y. 2012. Exploiting and Maintaining Materialized Views for XML Keyword Queries. ACM Transactions on Internet Technology (TOIT) 12, 2, 1-27. Online publication date: 1 Dec. 2012. https://dl.acm.org/doi/10.1145/2390209.2390212

Liu, Z. and Chen, Y. 2012. Differentiating search results on structured data. ACM Transactions on Database Systems (TODS) 37, 1, 1-30. Online publication date: 6 Mar. 2012. https://dl.acm.org/doi/10.1145/2109196.2109200

Deng, Z., Xiang, Y., and Gao, N. 2012. LAF: A new XML encoding and indexing strategy for keyword-based XML search. Concurrency and Computation: Practice and Experience 25, 11, 1604-1621. Online publication date: 24 Jul. 2012. https://doi.org/10.1002/cpe.2906

Chen, Y., Wang, W., and Liu, Z. 2011. Keyword-based search and exploration on databases. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, 1380-1383. Online publication date: 11 Apr. 2011. https://dl.acm.org/doi/10.1109/ICDE.2011.5767958
