DOI: 10.1145/2254736.2254746
research-article

A schema-driven approach for knowledge-oriented retrieval and query formulation

Published: 20 May 2012

Abstract

To search across factual knowledge and content represented in different data formats, this paper leverages a generic data model (schema) that transforms keyword-based retrieval models and queries into knowledge-oriented models and semantically expressive queries. Since each transformed retrieval model capitalises on a specific evidence space (term, classification, relationship, or attribute), we demonstrate two ways of combining these spaces: macro-based and micro-based. For bare keyword-based queries, we demonstrate how the data model can be used to augment the queries with classifications, relationships, and similar schema elements that reflect the constraints and objects found in the underlying heterogeneous knowledge bases. Results on the IMDb benchmark demonstrate the feasibility and effectiveness of the instantiated retrieval models and of the query reformulation process.
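The abstract names two ways of combining evidence spaces without defining them. The sketch below assumes one common reading from the DB+IR literature, the same distinction drawn for field-weighted BM25: a macro-based combination scores each evidence space separately and then mixes the per-space scores, while a micro-based combination mixes the raw evidence across spaces first and scores it once. The saturation function, the weights, the example movie object, and the names `macro_score` and `micro_score` are hypothetical illustrations, not the paper's retrieval models.

```python
from typing import Dict

# Evidence spaces named in the abstract; everything else here is a
# hypothetical illustration, not the paper's retrieval models.
SPACES = ("term", "classification", "relationship", "attribute")

# Per query term: frequency of that term in each evidence space of one object.
Evidence = Dict[str, Dict[str, float]]


def saturate(freq: float, k: float = 1.2) -> float:
    """BM25-style saturation freq / (freq + k); an assumed scoring function."""
    return freq / (freq + k) if freq > 0 else 0.0


def macro_score(evidence: Evidence, weights: Dict[str, float], k: float = 1.2) -> float:
    """Macro (assumed reading): score each evidence space on its own, then mix the scores."""
    return sum(
        weights[space] * saturate(per_space.get(space, 0.0), k)
        for per_space in evidence.values()
        for space in SPACES
    )


def micro_score(evidence: Evidence, weights: Dict[str, float], k: float = 1.2) -> float:
    """Micro (assumed reading): mix the raw evidence across spaces first, then score once."""
    return sum(
        saturate(sum(weights[space] * per_space.get(space, 0.0) for space in SPACES), k)
        for per_space in evidence.values()
    )


if __name__ == "__main__":
    # Hypothetical movie object: query term "hanks" occurs 3 times as text
    # and once via an actor relationship.
    evidence = {"hanks": {"term": 3.0, "relationship": 1.0}}
    weights = {"term": 0.5, "classification": 0.2, "relationship": 0.2, "attribute": 0.1}
    print(f"macro = {macro_score(evidence, weights):.4f}")  # macro = 0.4481
    print(f"micro = {micro_score(evidence, weights):.4f}")  # micro = 0.5862
```

Under this reading, the micro combination lets modest evidence from several spaces accumulate before saturation, whereas the macro combination saturates each space independently; which behaves better on a given collection is an empirical question that the sketch does not settle.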

Cited By

  • (2016) Scalable DB+IR Technology: Processing Probabilistic Datalog with HySpirit. Datenbank-Spektrum 16(1), pp. 39-48. DOI: 10.1007/s13222-015-0208-z. Online publication date: 26-Jan-2016
  • (2015) IR meets NLP. Proceedings of the 2015 International Conference on The Theory of Information Retrieval, pp. 231-240. DOI: 10.1145/2808194.2809448. Online publication date: 27-Sep-2015
  • (2013) On the modelling of ranking algorithms in probabilistic datalog. Proceedings of the 7th International Workshop on Ranking in Databases, pp. 1-6. DOI: 10.1145/2524828.2524832. Online publication date: 30-Aug-2013

      Information

      Published In

      ACM Conferences
      KEYS '12: Proceedings of the Third International Workshop on Keyword Search on Structured Data
      May 2012
      78 pages
      ISBN:9781450311984
      DOI:10.1145/2254736
      • General Chairs: Ling Tok Wang, Ge Yu, Jiaheng Lu, Wei Wang
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. DB+IR
      2. knowledge representation
      3. semantic retrieval

      Conference

      SIGMOD/PODS '12

      Article Metrics

      • Downloads (last 12 months): 1
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 25 Feb 2025
