DOI: 10.1145/2811222.2811224
research-article

Resolving Common Analytical Tasks in Text Databases

Published: 22 October 2015

Abstract

With the convergence of data warehousing, online analytical processing, and the Semantic Web, analytical tasks are no longer designed and executed only by experts. Instead, a wide range of users expect to pose keyword queries with analytical intent to search engines. One efficient approach to answering these tasks is to leverage the factual information stored in large-scale text databases. These systems enable analysts to access unstructured text sources from the Web with structured query languages. The challenge of mapping keyword queries to structured queries has been approached in various forms. However, these approaches cannot detect the underlying intent of a task. Thus, they cannot infer the user's expectations regarding the specificity and form of the results. Moreover, a large fraction of the queries for retrieving analytical results are rare. As a result, services for intent-aware task recognition perform poorly, or are not triggered at all, on these long-tail queries. We report on a study of 102,360 query and click patterns from a factual search engine. Our analysis reveals six common analytical tasks: explore, relate, resolve, list, compare, and answer. To distinguish among these, we study the effects of syntactic structures in the query, methods for interactive entity detection, and query segmentation techniques. We evaluate these features with language models and Naive Bayes classifiers. Our evaluation yields a combined F1 score of 90% for predicting task intent from keyword queries.
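
To make the classification setting concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier that maps a keyword query to one of the six task labels named in the abstract. It is not the authors' implementation: the training queries, the whitespace tokenizer, the Laplace smoothing constant, and the function names (`train`, `predict`, `f1_score`) are all assumptions made for illustration, and the paper's actual features (syntactic structures, entity detection, query segmentation) are not modeled here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for this sketch (not from the study's query log).
TRAIN = [
    ("apple", "explore"),
    ("tesla motors", "explore"),
    ("apple and samsung lawsuit", "relate"),
    ("google and nest acquisition", "relate"),
    ("jaguar company", "resolve"),
    ("amazon company", "resolve"),
    ("list of german car makers", "list"),
    ("list of nobel prize winners", "list"),
    ("bmw vs audi revenue", "compare"),
    ("intel vs amd market share", "compare"),
    ("who is ceo of siemens", "answer"),
    ("who founded microsoft", "answer"),
]


def train(examples, alpha=1.0):
    """Collect class and per-class token counts for multinomial Naive Bayes."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for query, label in examples:
        class_counts[label] += 1
        for token in query.lower().split():  # assumption: whitespace tokens
            token_counts[label][token] += 1
            vocab.add(token)
    return {"classes": class_counts, "tokens": token_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, query):
    """Return the label with the highest log posterior for the query."""
    total = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n in model["classes"].items():
        score = math.log(n / total)  # log prior
        denom = sum(model["tokens"][label].values()) + alpha * vocab_size
        for token in query.lower().split():
            if token not in model["vocab"]:
                continue  # tokens never seen in training are ignored
            # Laplace-smoothed log likelihood of the token given the label
            score += math.log((model["tokens"][label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, from raw counts."""
    return 2 * tp / (2 * tp + fp + fn)


model = train(TRAIN)
print(predict(model, "list of french banks"))
print(predict(model, "bmw vs toyota profit"))
print(predict(model, "who owns volvo"))
```

With so few examples the model mostly keys on cue words such as "list", "vs", and "who"; the long-tail problem described in the abstract is exactly the case where such cues are missing from the query.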



    Published In

    DOLAP '15: Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP
    October 2015
    108 pages
    ISBN:9781450337854
    DOI:10.1145/2811222

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. informational search
    2. keywords
    3. query intent
    4. user goals

    Qualifiers

    • Research-article

    Conference

    CIKM'15

    Acceptance Rates

    DOLAP '15 Paper Acceptance Rate 8 of 31 submissions, 26%;
    Overall Acceptance Rate 29 of 79 submissions, 37%
