Abstract
In this paper, we present a novel method for automatically deriving structured XML queries from keyword-based queries and show how it was applied to the experimental tasks proposed for the INEX 2010 data-centric track. In our method, called StruX, users specify a schema-independent, unstructured keyword-based query, and StruX automatically generates a top-k ranking of schema-aware queries for a target XML database. One of the top-ranked structured queries can then be selected, automatically or by a user, and executed by an XML query engine. The generated structured queries are XPath expressions consisting of an entity path (e.g., /dblp/article) and predicates (e.g., /dblp/article[author="john" and title="xml"]). We use the concept of entity, commonly adopted in the XML keyword search literature, to define suitable root nodes for the query results. StruX also uses IR techniques to determine in which elements a term is more likely to occur.
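To make the shape of the generated queries concrete, the following is a minimal sketch of the idea, not the StruX algorithm itself. It assumes a toy in-memory database, a fixed entity name, and a plain occurrence count standing in for the IR-based scoring mentioned in the abstract. The names `element_scores` and `generate_queries`, the sample data, and the product-of-counts ranking are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product

# Toy database; the content is invented for illustration only.
XML = """
<dblp>
  <article><author>john</author><title>xml</title></article>
  <article><author>mary</author><title>john and xml</title></article>
  <article><author>john</author><title>keyword search</title></article>
  <book><author>john</author><title>databases</title></book>
</dblp>
"""

def element_scores(root, entity, term):
    """Count, per child element name, how many `entity` nodes
    contain `term` as a whole word in that child's text."""
    counts = Counter()
    for node in root.iter(entity):
        for child in node:
            if term in (child.text or "").lower().split():
                counts[child.tag] += 1
    return counts

def generate_queries(root, entity, keywords, k=3):
    """Return up to k (score, xpath) pairs, best first.

    Each keyword is bound to one candidate element; a candidate
    query's score is the product of the per-keyword counts
    (an assumed stand-in for a real IR ranking function).
    """
    per_term = [list(element_scores(root, entity, t).items()) for t in keywords]
    candidates = []
    for combo in product(*per_term):
        score = 1
        for _, count in combo:
            score *= count
        predicates = " and ".join(
            f'{tag}="{term}"' for (tag, _), term in zip(combo, keywords)
        )
        candidates.append((score, f"/{root.tag}/{entity}[{predicates}]"))
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:k]

if __name__ == "__main__":
    root = ET.fromstring(XML)
    for score, xpath in generate_queries(root, "article", ["john", "xml"]):
        print(score, xpath)
```

In this sketch the keyword query "john xml" yields candidate XPath strings built from an entity path plus predicates, ranked so that the binding of "john" to author outranks the binding to title. The strings are only generated here, not executed; running them would require an XPath engine that supports the `and` operator.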
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da C. Hummel, F., da Silva, A.S., Moro, M.M., Laender, A.H.F. (2011). Automatically Generating Structured Queries in XML Keyword Search. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_17
DOI: https://doi.org/10.1007/978-3-642-23577-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23576-4
Online ISBN: 978-3-642-23577-1
eBook Packages: Computer Science (R0)