
Object-stack: An object-oriented approach for top-k keyword querying over fuzzy XML

Information Systems Frontiers

Abstract

Keyword search is the most popular technique for searching information in XML (eXtensible Markup Language) documents. It lets users access XML data without learning a structured query language or studying complex data schemas. Existing keyword query methods are mainly based on LCA (lowest common ancestor) semantics, in which the returned results match all keywords at the granularity of elements. In many practical applications, however, information is uncertain and vague, so identifying useful information in fuzzy data is becoming an important research topic. In this paper, we focus on keyword querying over fuzzy XML data at the granularity of objects. By introducing the concept of an “object tree”, we propose query semantics for keyword queries at the object level. We find the minimum whole-matching result object trees, which contain all keywords, and the partial-matching result object trees, which contain only some of the keywords, and return the root nodes of these result object trees as query results. To identify the top-k answers with the highest scores effectively and accurately, we propose a scoring mechanism that considers tf*idf document relevance, users’ preferences, and the possibilities of results. We propose a stack-based algorithm named object-stack to obtain these top-k answers. Experimental results show that the object-stack algorithm significantly outperforms traditional XML keyword query algorithms and returns high-quality query results with high search efficiency on fuzzy XML documents.
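
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the object-stack algorithm itself, which the abstract does not specify. It assumes nodes carry Dewey labels (tuples of child positions), so that the LCA of several nodes is their longest common label prefix, and it assumes the three score factors are combined as a plain product. The names `Candidate`, `lca`, `tf_idf`, `score` and `top_k`, and the product formula, are hypothetical illustrations rather than the paper's definitions.

```python
import heapq
import math
from dataclasses import dataclass


def lca(*labels):
    """LCA of Dewey-labelled nodes: the longest common prefix of their labels."""
    prefix = []
    for parts in zip(*labels):
        if any(p != parts[0] for p in parts):
            break
        prefix.append(parts[0])
    return tuple(prefix)


@dataclass
class Candidate:
    root: tuple         # Dewey label of the result object tree's root node
    tf: dict            # keyword -> occurrences inside the object tree
    preference: float   # hypothetical user-preference weight in [0, 1]
    possibility: float  # possibility (membership degree) of the result in [0, 1]


def tf_idf(tf, df, n_objects):
    """Sum of tf * log(N / df) over the keywords the candidate contains."""
    return sum(
        count * math.log(n_objects / df[kw])
        for kw, count in tf.items()
        if count and df.get(kw)
    )


def score(cand, df, n_objects):
    # Assumption: the factors are simply multiplied; the paper may weight them differently.
    return tf_idf(cand.tf, df, n_objects) * cand.preference * cand.possibility


def top_k(candidates, df, n_objects, k):
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: score(c, df, n_objects))
```

In this sketch a partial match is simply a candidate whose `tf` dictionary lacks some query keywords, so it collects less tf*idf mass than a whole match with similar preference and possibility values.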





Acknowledgements

The authors thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. The work was supported in part by the National Natural Science Foundation of China (61370075 and 61572118).

Author information

Corresponding author

Correspondence to Zongmin Ma.

About this article

Cite this article

Li, T., Ma, Z. Object-stack: An object-oriented approach for top-k keyword querying over fuzzy XML. Inf Syst Front 19, 669–697 (2017). https://doi.org/10.1007/s10796-017-9748-0

  • DOI: https://doi.org/10.1007/s10796-017-9748-0
