
Text-to-query: dynamically building structured analytics to illustrate textual content

Authors:
Raphaël Thollot, Ecole Centrale Paris
Falk Brauer, SAP Research CEC Dresden
Wojciech M. Barczynski, SAP Research CEC Dresden
Marie-Aude Aufaure, Ecole Centrale Paris

EDBT '10: Proceedings of the 2010 EDBT/ICDT Workshops
March 2010
Article No.: 14
Pages 1–8
https://doi.org/10.1145/1754239.1754255

Published: 22 March 2010


ABSTRACT

Structuring information in databases, OLAP (Online Analytical Processing) cubes, and XML is a crucial element of data management today. However, this process has brought new usability challenges. Users, especially occasional ones, find it difficult to switch from natural language, their usual means of communication, to data models (e.g., database schemas) that are hard to understand and work with. This issue is under intense scrutiny in the database community (e.g., keyword search over databases and query relaxation techniques) and in the information extraction community (e.g., linking structured and unstructured data). Nevertheless, there is still no comprehensive solution that, with high precision, automatically generates an OLAP query and chooses a visualization based on textual content. We present such a method. We discuss how to dynamically generate interpretations of textual content as OLAP queries, select the best visualization, and retrieve the corresponding data from a data warehouse on the fly. To provide the most relevant aggregation results, we consider the user's actual context, as described by a document's content. Moreover, we provide a prototypical implementation of our method, the Text-To-Query system (T2Q), and show how T2Q can be successfully applied to an enterprise scenario as an extension for an office application.
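To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the T2Q implementation. It assumes a hypothetical term dictionary, a hypothetical fact table named sales_fact, and a naive chart heuristic; none of these names or rules come from the paper. It shows the three steps in order: interpreting text against a dictionary of warehouse objects, building an aggregation query, and choosing a visualization.

```python
import re

# Hypothetical dictionary mapping surface terms to warehouse objects.
# The terms and the schema are illustrative assumptions, not taken from the paper.
DICTIONARY = {
    "revenue": ("measure", "revenue"),
    "sales": ("measure", "revenue"),
    "country": ("dimension", "country"),
    "quarter": ("dimension", "quarter"),
    "year": ("dimension", "year"),
}

# Assumed set of time-like dimensions, used only by the chart heuristic.
TIME_DIMENSIONS = {"quarter", "year"}


def interpret(text):
    """Return (measures, dimensions) found in the text, in order of first mention."""
    measures, dimensions = [], []
    for token in re.findall(r"[a-z]+", text.lower()):
        entry = DICTIONARY.get(token)
        if entry is None:
            continue  # token is not a known warehouse object
        kind, name = entry
        target = measures if kind == "measure" else dimensions
        if name not in target:
            target.append(name)
    return measures, dimensions


def build_query(measures, dimensions, fact_table="sales_fact"):
    """Build a SQL aggregation string; return None if no measure was recognized."""
    if not measures:
        return None
    select = dimensions + [f"SUM({m}) AS total_{m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {fact_table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


def choose_chart(dimensions):
    """Naive heuristic: time dimensions suggest a line chart, others a bar chart."""
    if not dimensions:
        return "single value"
    if any(d in TIME_DIMENSIONS for d in dimensions):
        return "line chart"
    return "bar chart"


if __name__ == "__main__":
    measures, dimensions = interpret("Revenue per country dropped last quarter.")
    print(build_query(measures, dimensions))
    print(choose_chart(dimensions))
```

A real system would need entity disambiguation, ranking of competing interpretations, and a multidimensional query language rather than plain SQL; the sketch only illustrates the overall flow from text to query to chart.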


Index Terms

Text-to-query: dynamically building structured analytics to illustrate textual content
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
      2. Dictionaries
    2. Search engine architectures and scalability
      1. Search engine indexing


Published in

EDBT '10: Proceedings of the 2010 EDBT/ICDT Workshops
March 2010
290 pages
ISBN: 9781605589909
DOI: 10.1145/1754239

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
application
data analysis context
dictionaries
query
recommendation
unstructured
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 7 of 10 submissions, 70%