
An approach of top-k keyword querying for fuzzy XML

Computing

Abstract

Keyword search on XML documents has received wide attention, and many search semantics and algorithms have been proposed for XML keyword queries. However, existing approaches cannot support keyword queries over fuzzy XML documents. To overcome this limitation, in this paper we discuss how to obtain and evaluate the top-k smallest lowest common ancestor (SLCA) results of keyword queries on fuzzy XML documents. We define fuzzy SLCA semantics for fuzzy XML documents and propose a novel encoding scheme that denotes the different types of nodes in a fuzzy XML document. We then propose two efficient algorithms that find the k SLCA results with the highest possibilities for a given keyword query on a fuzzy XML document. The first algorithm uses a stack-based technique to obtain the top-k SLCA results together with their possibilities. The second obtains the top-k SLCA results by exploiting a set of properties of SLCAs. Finally, we compare and evaluate the performance of the two algorithms.
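For readers unfamiliar with SLCA semantics, here is a minimal sketch of the crisp (non-fuzzy) SLCA definition followed by a top-k selection. It is not the paper's encoding scheme, its fuzzy SLCA semantics, or either of its two algorithms. Assumptions made only for illustration: nodes are identified by Dewey-style labels (tuples of ints, root `(0,)`); `degree` is a hypothetical mapping from a node label to a membership degree in [0, 1]; and a result's score is the minimum degree along its root-to-node path (the min t-norm). The helper names `slca`, `path_possibility`, and `top_k_slca` are hypothetical.

```python
"""Sketch: crisp SLCA over Dewey-style labels plus a hypothetical top-k ranking.

ASSUMPTIONS (not from the paper): Dewey tuple labels with root (0,);
per-node membership degrees; score = min degree on the root-to-node path.
"""
import heapq
from typing import Dict, List, Sequence, Set, Tuple

Dewey = Tuple[int, ...]


def is_ancestor_or_self(a: Dewey, b: Dewey) -> bool:
    """True if node a is node b itself or one of b's ancestors."""
    return b[: len(a)] == a


def slca(keyword_lists: Sequence[List[Dewey]]) -> List[Dewey]:
    """Crisp SLCA: nodes whose subtree contains every keyword and that have
    no proper descendant whose subtree also contains every keyword."""
    # Any LCA of keyword matches is an ancestor-or-self of some match.
    candidates: Set[Dewey] = set()
    for matches in keyword_lists:
        for node in matches:
            for depth in range(1, len(node) + 1):
                candidates.add(node[:depth])
    # Keep candidates whose subtree covers every keyword list.
    covering = [
        c for c in candidates
        if all(any(is_ancestor_or_self(c, m) for m in matches)
               for matches in keyword_lists)
    ]
    # Drop covering nodes that have a covering proper descendant.
    return sorted(
        c for c in covering
        if not any(d != c and is_ancestor_or_self(c, d) for d in covering)
    )


def path_possibility(node: Dewey, degree: Dict[Dewey, float]) -> float:
    """HYPOTHETICAL score: minimum membership degree on the root-to-node
    path; labels missing from `degree` are treated as crisp (1.0)."""
    return min(degree.get(node[:d], 1.0) for d in range(1, len(node) + 1))


def top_k_slca(keyword_lists: Sequence[List[Dewey]],
               degree: Dict[Dewey, float],
               k: int) -> List[Tuple[float, Dewey]]:
    """Return up to k (score, SLCA) pairs with the highest scores."""
    results = slca(keyword_lists)
    return heapq.nlargest(k, ((path_possibility(n, degree), n) for n in results))


if __name__ == "__main__":
    xml = [(0, 0, 0), (0, 1, 0)]    # nodes matching keyword "xml"
    fuzzy = [(0, 0, 1), (0, 1, 1)]  # nodes matching keyword "fuzzy"
    print(slca([xml, fuzzy]))
    print(top_k_slca([xml, fuzzy], {(0, 0): 0.8, (0, 1): 0.6}, k=1))
```

The candidate-then-filter approach is quadratic and chosen for readability; production SLCA algorithms (including the stack-based and property-based ones the abstract mentions) avoid enumerating every ancestor.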


Figs. 1–12



Acknowledgements

The work was supported in part by the National Natural Science Foundation of China (61370075 and 61772269).

Author information

Correspondence to Zongmin Ma.

About this article

Cite this article

Ma, Z., Li, T. & Yan, L. An approach of top-k keyword querying for fuzzy XML. Computing 100, 303–330 (2018). https://doi.org/10.1007/s00607-017-0577-2

  • DOI: https://doi.org/10.1007/s00607-017-0577-2
