Abstract
Keyword search on XML documents has received wide attention, and many search semantics and algorithms have been proposed for XML keyword queries. However, existing approaches fall short in supporting keyword queries over fuzzy XML documents. To overcome this limitation, this paper discusses how to obtain and evaluate the top-k smallest lowest common ancestor (SLCA) results of keyword queries on fuzzy XML documents. We first define fuzzy SLCA semantics on fuzzy XML documents and propose a novel encoding scheme that distinguishes the different types of nodes in a fuzzy XML document. We then propose two efficient algorithms that find the k SLCA results with the highest possibilities for a given keyword query on a fuzzy XML document. The first algorithm uses a stack-based technique to obtain the top-k SLCA results together with their possibilities. The second obtains the top-k SLCA results by exploiting a set of properties of SLCAs. Finally, we compare and evaluate the performance of the two algorithms.
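The fuzzy SLCA semantics builds on the ordinary (crisp) SLCA definition: an SLCA is a node whose subtree contains every query keyword and which has no descendant that also does. The minimal Python sketch below illustrates only that crisp definition. It assumes plain Dewey-style labels stored as integer tuples (root = `(1,)`) and a hypothetical `matches` dictionary mapping each keyword to the labels of the nodes that contain it; the names `slca` and `is_ancestor_or_self` are illustrative. It ignores possibilities and fuzzy node types entirely, so it is neither the paper's encoding scheme nor either of its top-k algorithms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical Dewey-style label: (1, 2, 1) denotes node 1.2.1.
Dewey = Tuple[int, ...]


def is_ancestor_or_self(a: Dewey, d: Dewey) -> bool:
    """True if node a is d itself or an ancestor of d (a is a prefix of d)."""
    return len(a) <= len(d) and d[:len(a)] == a


def slca(matches: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Return the crisp SLCA nodes for a keyword query.

    matches maps each keyword to the labels of the nodes containing it.
    """
    # If any keyword has no match, no node can contain all keywords.
    if not matches or any(not nodes for nodes in matches.values()):
        return []

    # Every non-empty prefix (ancestor-or-self) of a match is a candidate root.
    candidates: Set[Dewey] = set()
    for nodes in matches.values():
        for label in nodes:
            for i in range(1, len(label) + 1):
                candidates.add(label[:i])

    # Keep candidates whose subtree contains at least one match per keyword.
    covering = [
        c for c in candidates
        if all(any(is_ancestor_or_self(c, n) for n in nodes)
               for nodes in matches.values())
    ]

    # SLCA: covering nodes with no proper descendant that is also covering.
    result = [
        c for c in covering
        if not any(o != c and is_ancestor_or_self(c, o) for o in covering)
    ]
    return sorted(result)


if __name__ == "__main__":
    query = {"xml": [(1, 1), (1, 2, 1)], "fuzzy": [(1, 2, 2), (1, 3)]}
    print(slca(query))
```

For this query the root `(1,)` and node `(1, 2)` both contain both keywords, but only `(1, 2)` is returned, because the root has a descendant that already covers the query. The candidate enumeration is quadratic and is meant only to make the definition concrete; dedicated SLCA algorithms avoid it, and a fuzzy variant would additionally attach a possibility to each result before ranking the top k.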
Acknowledgements
The work was supported in part by the National Natural Science Foundation of China (61370075 and 61772269).
Cite this article
Ma, Z., Li, T. & Yan, L. An approach of top-k keyword querying for fuzzy XML. Computing 100, 303–330 (2018). https://doi.org/10.1007/s00607-017-0577-2