
Search and Aggregation in XML Documents

  • Conference paper
Database and Expert Systems Applications (DEXA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Abstract

Information retrieval is migrating from the traditional paradigm (returning an ordered list of responses) to the aggregated search paradigm (grouping the most comprehensive and relevant answers into one final aggregated document). Today, the Extensible Markup Language (XML) is an important standard for information exchange and representation. XML documents and queries are usually processed through their tree representations, which makes XML document retrieval a tree matching problem between the document trees and the query tree. Several paradigms for retrieving XML documents have been proposed in the literature, but only a few of them aggregate a set of XML documents to provide more meaningful answers to a given query. In this paper, we propose and evaluate an aggregated search method that obtains the most accurate and richest answers in XML fragment search. Our search method is based on the Top-k Approximate Subtree Matching (TASM) algorithm, and we propose a new similarity function to improve the returned fragments. We then present an aggregation process that generates a single aggregated response containing the most relevant, exhaustive and non-redundant information given by the fragments. The method is evaluated on two real-world datasets, and experiments show that it produces good results in terms of relevance and quality.
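To make the matching step more concrete, here is a minimal Python sketch of the problem TASM addresses: ranking the subtrees of a document tree by their tree edit distance to a query tree and keeping the k closest. It is a brute-force baseline built on stated assumptions, not the authors' method. It uses plain unit-cost tree edit distance instead of the new similarity function proposed in the paper, it scores every subtree rather than using TASM's pruning, and it leaves out the aggregation step entirely. The helper names (`to_tree`, `forest_dist`, `tree_dist`, `subtrees`, `top_k_subtrees`), the tuple tree encoding, and the choice to turn element text into leaf nodes are all hypothetical.

```python
import heapq
import xml.etree.ElementTree as ET
from functools import lru_cache
from typing import Iterator, List, Tuple

# Hypothetical representation: a tree is (label, children), a forest is a tuple
# of trees. Nested tuples are hashable, so lru_cache can memoize on them.
Tree = Tuple[str, tuple]
Forest = Tuple[Tree, ...]


def to_tree(elem: ET.Element) -> Tree:
    """Convert an XML element into an ordered labeled tree.

    Assumption: tags become node labels and non-empty element text becomes a
    leaf child placed before the element's sub-elements; tail text is ignored.
    """
    children = []
    text = (elem.text or "").strip()
    if text:
        children.append((text, ()))
    children.extend(to_tree(child) for child in elem)
    return (elem.tag, tuple(children))


@lru_cache(maxsize=None)
def forest_dist(f: Forest, g: Forest) -> int:
    """Unit-cost edit distance between two ordered forests, decomposing on the
    rightmost root of each forest."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    if not f:  # only insertions remain
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + f_kids, g) + 1,  # delete rightmost root of f
        forest_dist(f, g[:-1] + g_kids) + 1,  # insert rightmost root of g
        # map the two rightmost roots onto each other: align the remaining
        # forests, align their children, and pay 1 if the labels differ
        forest_dist(f[:-1], g[:-1])
        + forest_dist(f_kids, g_kids)
        + (0 if f_label == g_label else 1),
    )


def tree_dist(a: Tree, b: Tree) -> int:
    """Unit-cost tree edit distance between two trees."""
    return forest_dist((a,), (b,))


def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield every subtree of t (a node with all its descendants), in preorder."""
    yield t
    for child in t[1]:
        yield from subtrees(child)


def top_k_subtrees(query: Tree, doc: Tree, k: int) -> List[Tuple[int, int, Tree]]:
    """Return (distance, preorder index, subtree) for the k document subtrees
    closest to the query. Brute force: every subtree is scored."""
    scored = ((tree_dist(query, s), i, s) for i, s in enumerate(subtrees(doc)))
    # The unique preorder index breaks distance ties, so subtrees are never compared.
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    doc = to_tree(ET.fromstring(
        "<dblp>"
        "<article><author>Smith</author><title>Trees</title></article>"
        "<book><author>Jones</author><title>Graphs</title></book>"
        "</dblp>"
    ))
    query = to_tree(ET.fromstring("<article><author>Smith</author></article>"))
    for dist, _, sub in top_k_subtrees(query, doc, k=2):
        print(dist, sub[0])
```

Run as a script, the example prints `1 author` and then `2 article`. The `author` subtree needs one insertion (an `article` root) to match the query, while the `article` subtree needs two deletions (its `title` element and that element's text). The memoized recursion is adequate for small fragments like these, but it is far too slow to score every subtree of a full collection.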

This work is partially funded by the French National Research Agency (ANR) project Contextual and Aggregated Information Retrieval (ANR-14-CE23-0006).

Notes

  1. http://dblp.uni-trier.de/xml/.

  2. http://research.cs.wisc.edu/niagara/data/.

Author information

Corresponding author

Correspondence to Abdelmalek Habi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Habi, A., Effantin, B., Kheddouci, H. (2017). Search and Aggregation in XML Documents. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_22

  • DOI: https://doi.org/10.1007/978-3-319-64468-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64467-7

  • Online ISBN: 978-3-319-64468-4

  • eBook Packages: Computer Science (R0)
