Search and Aggregation in XML Documents

Habi, Abdelmalek; Effantin, Brice; Kheddouci, Hamamache

doi:10.1007/978-3-319-64468-4_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Information retrieval is shifting from the traditional paradigm (returning a ranked list of responses) to the aggregated search paradigm (grouping the most comprehensive and relevant answers into one final aggregated document). The Extensible Markup Language (XML) is now an important standard for information exchange and representation. Documents and queries are usually processed through their tree representations, which allows XML document retrieval to be treated as a tree matching problem between the document trees and the query tree. Several paradigms for retrieving XML documents have been proposed in the literature, but only a few of them try to aggregate a set of XML documents in order to provide more significant answers for a given query. In this paper, we propose and evaluate an aggregated search method to obtain the most accurate and richest answers in XML fragment search. Our search method is based on the Top-k Approximate Subtree Matching (TASM) algorithm, and a new similarity function is proposed to improve the returned fragments. An aggregation process is then presented to generate a single aggregated response containing the most relevant, exhaustive and non-redundant information given by the fragments. The method is evaluated on two real-world datasets. Experiments show that it generates good results in terms of relevance and quality.
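
The pipeline outlined in the abstract (treat documents and queries as trees, rank document subtrees against the query, then merge the best fragments without redundancy) can be illustrated with a minimal Python sketch. It makes several assumptions that are not from the paper: the score is a simple multiset overlap of labels, standing in for both TASM's tree edit distance and the authors' similarity function, and the function names `labels`, `similarity`, `top_k_subtrees` and `aggregate`, as well as the sample data, are hypothetical.

```python
import heapq
import xml.etree.ElementTree as ET
from collections import Counter


def labels(node):
    """Multiset of labels in a subtree: element tags and non-empty text values."""
    bag = Counter()
    for el in node.iter():
        bag[("tag", el.tag)] += 1
        text = (el.text or "").strip().lower()
        if text:
            bag[("text", text)] += 1
    return bag


def similarity(query, candidate):
    """Stand-in score in [0, 1]: multiset Jaccard overlap of labels.

    Assumption: this replaces tree edit distance and the paper's
    similarity function, neither of which is reproduced here.
    """
    q, c = labels(query), labels(candidate)
    union = sum((q | c).values())
    return sum((q & c).values()) / union if union else 0.0


def top_k_subtrees(query, document, k):
    """Score every subtree of the document and return the k best as (score, subtree)."""
    scored = [(similarity(query, sub), i, sub)
              for i, sub in enumerate(document.iter())]
    # Ties are broken in favour of the subtree that appears first in the document.
    best = heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))
    return [(score, sub) for score, _, sub in best]


def aggregate(fragments, root_tag="aggregate"):
    """Merge the children of all fragments under one root, skipping duplicates."""
    result = ET.Element(root_tag)
    seen = set()
    for frag in fragments:
        for child in frag:
            key = frozenset(labels(child).items())
            if key not in seen:
                seen.add(key)
                result.append(child)
    return result


# Hypothetical sample data.
DOC = """<library>
  <book><title>XML Retrieval</title><author>Smith</author><year>2010</year></book>
  <book><title>XML Retrieval</title><author>Smith</author><publisher>ACM</publisher></book>
  <book><title>Graph Theory</title><author>Jones</author></book>
</library>"""
QUERY = "<book><title>XML Retrieval</title><author>Smith</author></book>"

document = ET.fromstring(DOC)
query = ET.fromstring(QUERY)
best = top_k_subtrees(query, document, k=2)
merged = aggregate([sub for _, sub in best])
print([child.tag for child in merged])
```

In this toy example the two matching `book` fragments each carry one piece of information the other lacks (`year` and `publisher`), so the merged element holds both while the shared `title` and `author` appear only once. A real system would score candidates with a tree edit distance and would avoid enumerating every subtree, which is the cost that TASM is designed to reduce.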

This work is partially funded by the French National Agency of Research project: Contextual and Aggregated Information Retrieval (ANR-14-CE23-0006).

Author information

Authors and Affiliations

Université de Lyon, Université Lyon 1, CNRS LIRIS, UMR 5205, 69622, Lyon, France
Abdelmalek Habi, Brice Effantin & Hamamache Kheddouci

Corresponding author

Correspondence to Abdelmalek Habi.

Editor information

Editors and Affiliations

University of Lyon, Villeurbanne, France
Djamal Benslimane
University of Milan, Milan, Italy
Ernesto Damiani
University of Michigan, Dearborn, Michigan, USA
William I. Grosky
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Wright State University, Dayton, Ohio, USA
Amit Sheth
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Habi, A., Effantin, B., Kheddouci, H. (2017). Search and Aggregation in XML Documents. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_22

DOI: https://doi.org/10.1007/978-3-319-64468-4_22
Published: 01 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science (R0)
