Beyond search: Retrieving complete tuples from a text-database

Löser, Alexander; Nagel, Christoph; Pieper, Stephan; Boden, Christoph

doi:10.1007/s10796-012-9403-8


Published: 23 January 2013

Volume 15, pages 311–329, (2013)

Information Systems Frontiers


Abstract

A common task of Web users is querying structured information from Web pages. To support this scenario, we propose a novel query processor that systematically discovers instances of semantic relations in Web search results and joins these relation instances into complex result tuples using conjunctive queries. The query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards the search results to a relation extractor, and then combines the extracted relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. It thereby leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user-defined data sources may not return at least k complete result tuples. We therefore propose an adaptive routing model based on information theory for retrieving the missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during query execution. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.
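The flow described in the abstract (structured query, keyword queries, search, relation extraction, joining into tuples) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the authors' implementation: the in-memory `CORPUS`, the hand-written `KEYWORDS`, the regex-based `toy_extract`, and all function names are assumptions made for illustration. In particular, the keywords are fixed by hand instead of learned, and incomplete tuples are simply dropped instead of being re-routed by the adaptive, information-theoretic model.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (relation, subject entity, attribute value)

# Hypothetical stand-ins for a Web index and for learned keywords.
CORPUS = [
    "Acme is led by CEO Jane Roe.",
    "Acme is headquartered in Berlin.",
    "Globex is led by CEO John Doe.",
]
KEYWORDS = {"ceo": ["CEO"], "headquarters": ["headquartered"]}
PATTERNS = {
    "ceo": re.compile(r"(\w+) is led by CEO (\w+ \w+)"),
    "headquarters": re.compile(r"(\w+) is headquartered in (\w+)"),
}

def toy_search(query: str) -> List[str]:
    """Toy search engine: pages that contain every query term."""
    terms = [t.strip('"').lower() for t in query.split()]
    return [p for p in CORPUS if all(t in p.lower() for t in terms)]

def toy_extract(page: str) -> List[Fact]:
    """Toy relation extractor: one regex per relation type."""
    facts = []
    for relation, pattern in PATTERNS.items():
        match = pattern.search(page)
        if match:
            facts.append((relation, match.group(1), match.group(2)))
    return facts

def keyword_query(entity: str, relation: str,
                  keywords: Dict[str, List[str]]) -> str:
    """Turn one (entity, relation) lookup into a keyword query."""
    return " ".join([f'"{entity}"'] + keywords.get(relation, []))

def lookup(entity: str, relation: str, keywords: Dict[str, List[str]],
           search: Callable[[str], List[str]],
           extract: Callable[[str], List[Fact]]) -> Optional[str]:
    """Search, extract, and return the first matching value, if any."""
    for page in search(keyword_query(entity, relation, keywords)):
        for rel, subject, value in extract(page):
            if rel == relation and subject == entity:
                return value
    return None  # the extractor found no relation for this attribute

def complete_tuples(entities: List[str], relations: List[str],
                    keywords: Dict[str, List[str]],
                    search: Callable[[str], List[str]],
                    extract: Callable[[str], List[Fact]],
                    k: int) -> List[Dict[str, str]]:
    """Join extracted relations per entity; keep only complete tuples."""
    results: List[Dict[str, str]] = []
    for entity in entities:
        row = {"entity": entity}
        for relation in relations:
            value = lookup(entity, relation, keywords, search, extract)
            if value is not None:
                row[relation] = value
        if all(r in row for r in relations):
            results.append(row)
        if len(results) == k:
            break  # stop as soon as k complete tuples are available
    return results

rows = complete_tuples(["Acme", "Globex"], ["ceo", "headquarters"],
                       KEYWORDS, toy_search, toy_extract, k=5)
```

In this toy corpus, no page states where Globex is headquartered, so its tuple stays incomplete and is not returned. That is the situation the adaptive routing model in the abstract is meant to address by choosing which incomplete tuple and attribute to pursue next.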


Notes

For instance, Yahoo Boss (Clarke et al. 2008) currently charges $0.30 per 1,000 requests for the top-10 search results, and OpenCalais (Croft et al. 2009) charges $2,000 per 3,000,000 pages extracted with their service.
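To make the prices in this note concrete, the following small Python sketch estimates what a batch of keyword queries would cost. It assumes that every one of the top-10 results of each request is forwarded to the extraction service; that assumption, the function name `estimated_cost`, and its parameter names are illustrative and not taken from the article.

```python
def estimated_cost(num_queries: int,
                   pages_per_query: int = 10,
                   search_usd_per_1000_requests: float = 0.30,
                   extraction_usd_per_3m_pages: float = 2000.0) -> dict:
    """Estimate search and extraction cost in USD for a batch of queries.

    Assumption: each search request returns `pages_per_query` pages and
    all of them are sent to the extraction service.
    """
    search = num_queries / 1000 * search_usd_per_1000_requests
    pages = num_queries * pages_per_query
    extraction = pages / 3_000_000 * extraction_usd_per_3m_pages
    return {
        "search_usd": search,
        "extraction_usd": extraction,
        "total_usd": search + extraction,
    }

cost = estimated_cost(1000)
```

Under these assumptions, 1,000 requests cost $0.30 for search and about $6.67 for extracting the 10,000 returned pages, so extraction is the larger share of the cost per query.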

References

Agichtein, E., & Gravano, L. (2003). Qxtract: a building block for efficient information extraction from plain-text databases. In SIGMOD conference (p. 663).
Avnur, R., & Hellerstein, J.M. (2000). Eddies: continuously adaptive query processing. In SIGMOD conference (pp. 261–272).
Banko, M., & Etzioni, O. (2008). The tradeoffs between open and traditional relation extraction. In ACL (pp. 28–36).
Boden, C., Hafele, T., Löser, A. (2011). Classification algorithms for relation prediction. In ICDE workshops (pp. 46–52).
Boden, C., Löser, A., Nagel, C., Pieper, S. (2011). Factcrawl: a fact retrieval framework for full-text indices. In 14th WebDB workshop with ACM SIGMOD.
Boden, C., Löser, A., Nagel, C., Pieper, S. (2012). Fact-aware document retrieval for information extraction. Datenbank-Spektrum, 12, 89–100.
Castellanos, M., Wang, S., Dayal, U., Gupta, C. (2010). Sie-obi: a streaming information extraction platform for operational business intelligence. In SIGMOD conference (pp. 1105–1110).
Chakrabarti, S., Sarawagi, S., Sudarshan, S. (2010). Enhancing search with structure. IEEE Data Engineering Bulletin, 33(1), 3–24.
Clarke, C.L.A., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I. (2008). Novelty and diversity in information retrieval evaluation. In SIGIR (pp. 659–666).
Croft, B., Metzler, D., Strohman, T. (2009). Search engines: Information retrieval in practice (1st ed.). USA: Addison-Wesley Publishing Company.
Crow, D. (2010). Google Squared: Web scale, open domain information extraction and presentation. In ECIR, industrial track.
DeRose, P., Shen, W., Chen, F., Doan, A., Ramakrishnan, R. (2007a). Building structured web community portals: A top-down, compositional, and incremental approach. In VLDB (pp. 399–410).
DeRose, P., Shen, W., Chen, F., Lee, Y., Burdick, D., Doan, A., Ramakrishnan, R. (2007b). Dblife: A community information management platform for the database research community (demo). In CIDR (pp. 169–172).
Dong, X., Halevy, A., Madhavan, J. (2005). Reference reconciliation in complex information spaces. In ACM SIGMOD (pp. 85–96).
Etzioni, O., Banko, M., Soderland, S., Weld, D.S. (2008). Open information extraction from the web. Communications of the ACM, 51(12), 68–74.
Feldman, R., Regev, Y., Gorodetsky, M. (2008). A modular information extraction system. Intelligent Data Analysis, 12(1), 51–71.
Fortune 500 companies (2010). http://money.cnn.com/magazines/fortune (Last visited 01/06/10).
Fung, G.P.C., Yu, J.X., Lu, H. (2002). Discriminative category matching: Efficient text classification for huge document collections. In ICDM (pp. 187–194).
Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.A. (2001). Declarative data cleaning: language, model, and algorithms. In VLDB (pp. 371–380).
Grishman, R., Huttunen, S., Yangarber, R. (2002). Information extraction for enhanced access to disease outbreak reports. Journal of Biomedical Informatics, 35(4), 236–246.
Halevy, A.Y. (2001). Answering queries using views: a survey. The VLDB Journal, 10, 270–294.
HSQLDB (2011). http://hsqldb.org/ (Last visited 06/14/11).
Ilyas, I.F., Beskales, G., Soliman, M.A. (2008). A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys, 40(4).
Ipeirotis, P.G., Agichtein, E., Jain, P., Gravano, L. (2006). To search or to crawl?: towards a query optimizer for text-centric tasks. In SIGMOD conference (pp. 265–276).
Jain, A., Doan, A., Gravano, L. (2008). Optimizing SQL queries over text databases. In ICDE (pp. 636–645).
Jain, A., Ipeirotis, P.G., Doan, A., Gravano, L. (2009). Join optimization of information extraction output: quality matters! In ICDE (pp. 186–197).
Jain, A., & Pantel, P. (2010). Factrank: random walks on a web of facts. In COLING (pp. 501–509).
Jain, A., & Srivastava, D. (2009). Exploring a few good tuples from text databases. In ICDE (pp. 616–627).
Kasneci, G., Suchanek, F.M., Ramanath, M., Weikum, G. (2008). The YAGO-NAGA approach to knowledge discovery. SIGMOD Record, 37, 4.
Liu, J., Dong, X., Halevy, A.Y. (2006). Answering structured queries on unstructured data. In WebDB.
Löser, A., Hüske, F., Markl, V. (2008). Situational business intelligence. In BIRTE.
Löser, A., Lutter, S., Düssel, P., Markl, V. (2009). Ad-hoc queries over document collections—a case study. In BIRTE (pp. 50–65).
Löser, A., Nagel, C., Pieper, S. (2010). Augmenting tables by self-supervised web search. In BIRTE.
Markl, V., Raman, V., Simmen, D.E., Lohman, G.M., Pirahesh, H. (2004). Robust query processing through progressive optimization. In SIGMOD conference (pp. 659–670).
Naumann, F. (2002). Quality-driven query answering for integrated information systems. Lecture notes in computer science Vol. 2261: Springer.
OpenCalais (2011). www.opencalais.com (Last visited 06/14/11).
Pérez-Martínez, J.M., Llavori, R.B., Cabo, M.J.A., Pedersen, T.B. (2008). Contextualizing data warehouses with documents. Decision Support Systems, 45(1), 77–94.
Riloff, E. (1996). Automatically generating extraction patterns from untagged text. AAAI/IAAI, 2, 1044–1049.
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G. (1988). Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD international conference on management of data, 30 May–1 June 1979 (pp. 23–34). Boston, Massachusetts.
Wu, F., & Weld, D.S. (2010). Open information extraction using Wikipedia. In ACL (pp. 118–127).
Yu, C., Lakshmanan, L.V.S., Amer-Yahia, S. (2009). It takes variety to make a world: diversification in recommender systems. In EDBT (pp. 368–378).


Acknowledgments

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. FP7-ICT-2009-5-257859, ‘Risk and Opportunity management of huge-scale BUSiness community cooperation’ (ROBUST). Alexander Löser also received funding from the Federal Ministry of Economics and Technology (BMWi) under grant agreement no. 01MD11014A, ‘MIA-Marktplatz für Informationen und Analysen’ (MIA).

Author information

Authors and Affiliations

Database Systems and Information Management Group (DIMA), Technische Universität Berlin (TUB), Einsteinufer 17, 10587, Berlin, Germany
Alexander Löser, Christoph Nagel, Stephan Pieper & Christoph Boden


Corresponding author

Correspondence to Alexander Löser.


About this article

Cite this article

Löser, A., Nagel, C., Pieper, S. et al. Beyond search: Retrieving complete tuples from a text-database. Inf Syst Front 15, 311–329 (2013). https://doi.org/10.1007/s10796-012-9403-8


Published: 23 January 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s10796-012-9403-8
