Continuous querying in database-centric Web applications

https://doi.org/10.1016/S1389-1286(00)00080-3

Abstract

Web applications are becoming increasingly database-centric. Unfortunately, the support provided by most Web sites to explore such databases is rather primitive and is based on the traditional database metaphor of submitting an SQL query and packaging the response as an HTML page. Very often, the result set is empty or contains too many records. It is up to the user to refine the query by guessing how the query constraints must be tightened or relaxed and then go through another submit/response cycle. Furthermore, once results are displayed, typically no further exploration capabilities are offered. Web applications requiring interactive exploration of databases (e.g. e-commerce) need the submit/response metaphor to be replaced with a continuous querying metaphor that seamlessly integrates querying with result browsing. In addition to queries based on predicates on attribute values, queries based on example records should also be supported. We present techniques for supporting this metaphor and discuss their implementation in a Web-based database exploration engine.

Introduction

Web applications are becoming increasingly database-centric. In a 1997 Forrester survey [5], respondent companies indicated that nearly 40% of the content at their Web sites originated from databases, and that this fraction was expected to rise as high as 65% by 1998. Many new Web applications require that a user be able to interactively explore these databases over the Internet or an internal network.

A common example of such interactive exploration is the task of finding products or services matching a user’s requirements. While this is a widely performed task [9], the support provided by current Web sites for implementing this functionality is rather primitive. Typically, a server-side database is relied upon for all query processing. The user is presented with a form for specifying the desired product in terms of bounds on the values of the product attributes (e.g. a 3.3 V zero delay clock buffer in 16-pin 150-mil SOIC or TSSOP package with output skew less than 250 ps and device skew less than 750 ps having an operating range of 25–100 MHz). On submission, this information is used to construct an SQL query that is in turn submitted to a server-side database. The result is returned to the browser formatted as an HTML page. Very often, the result set is empty or contains too many records. It is up to the user to refine the query by guessing how the query constraints must be tightened or relaxed and then go through another submit/response cycle. Furthermore, once results are displayed, typically no further exploration capabilities are offered. As a result, the user needs knowledge of not only the domain of interest but also the particular dataset. Aggravating the problem further, the round-trip time between the browser and the database server for each submit/response cycle is often frustratingly long.
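The submit/response cycle described above can be illustrated with a minimal sketch. It assumes a hypothetical `parts` table and hypothetical attribute names, and uses Python's standard `sqlite3` module as a stand-in for the server-side database; none of these names come from the paper. Each form submission turns the attribute bounds into one parameterized SQL query, and a slightly too tight bound silently yields an empty result.

```python
import sqlite3


def build_query(table, bounds):
    """Build a parameterized SELECT from {attribute: (low, high)} bounds.

    A bound of None means "unbounded on that side". Table and attribute
    names are assumed to come from a trusted whitelist (hypothetical here),
    since identifiers cannot be passed as SQL parameters.
    """
    clauses, params = [], []
    for attr, (low, high) in bounds.items():
        if low is not None:
            clauses.append(f'"{attr}" >= ?')
            params.append(low)
        if high is not None:
            clauses.append(f'"{attr}" <= ?')
            params.append(high)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f'SELECT * FROM "{table}" WHERE {where}', params


def run_query(conn, table, bounds):
    """One submit/response cycle: build the query, run it, return all rows."""
    sql, params = build_query(table, bounds)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parts (name TEXT, output_skew_ps REAL, max_freq_mhz REAL)"
    )
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?, ?)",
        [("A", 200, 100), ("B", 300, 133), ("C", 240, 66)],
    )
    # First cycle: a single bound on output skew.
    print(run_query(conn, "parts", {"output_skew_ps": (None, 250)}))
    # Second cycle: adding one more bound empties the result set.
    print(run_query(conn, "parts",
                    {"output_skew_ps": (None, 250), "max_freq_mhz": (120, None)}))
```

Nothing in the second response tells the user which bound to relax, which is the guessing problem the paragraph above describes.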

The problem is that database query technology is targeted at reporting rather than user exploration. In traditional database applications, queries are rigid in that they are intended for asking very specific questions. The query results are interesting regardless of whether they contain zero records or ten thousand. In a sense, an individual query itself is the goal. In user exploration, the goal is not simply an individual query or its results, but rather locating particular records of interest. Rarely can this be achieved with a single query. As a result, users typically issue many related queries before they are finally satisfied.

What is needed is to replace this 'submit a query and wait for a response' metaphor with a new continuous querying metaphor. The user should be able to combine searching with result browsing so that the current query and the qualifying records appear together in a single view. As the user changes the query constraints, the user should immediately see the impact on the qualifying records in that view.
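As a rough illustration of the continuous querying metaphor, the sketch below keeps the records in client memory and recomputes the qualifying set every time a constraint changes, so the query and its results stay in one view. The class name, method names and sample attributes are hypothetical, and the linear scan is only a stand-in: it is not Eureka's data structure, which this text does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class ContinuousQuery:
    """Holds locally cached records plus the current range constraints."""

    records: list                                # list of dict records (cached)
    ranges: dict = field(default_factory=dict)   # attribute -> (low, high)

    def set_range(self, attr, low, high):
        """Change one constraint and return the updated qualifying records."""
        self.ranges[attr] = (low, high)
        return self.matches()

    def clear_range(self, attr):
        """Drop one constraint and return the updated qualifying records."""
        self.ranges.pop(attr, None)
        return self.matches()

    def matches(self):
        """Linear scan over the cache; a real engine would need an index."""
        return [
            r for r in self.records
            if all(lo <= r[a] <= hi for a, (lo, hi) in self.ranges.items())
        ]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    cars = [
        {"model": "X", "price": 18000, "mpg": 32},
        {"model": "Y", "price": 24000, "mpg": 27},
        {"model": "Z", "price": 31000, "mpg": 22},
    ]
    q = ContinuousQuery(cars)
    # Each call models one movement of a GUI control.
    print(len(q.set_range("price", 0, 25000)))
    print(len(q.set_range("mpg", 30, 100)))
    print(len(q.clear_range("price")))
```

Because no network round trip is involved, each control movement can be answered from memory; the cost that remains is the scan itself, which is why tuned data structures matter for large datasets.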

We present techniques for supporting this continuous querying metaphor. These techniques have been implemented in a database exploration engine we call Eureka. In Section 2, we present the user interface that facilitates database exploration using the continuous querying metaphor. For this metaphor to be successful, it is imperative that as soon as a user manipulates a GUI control, the user sees its effect instantaneously, which in turn requires well-tuned data structures. In Section 3, we present the design and implementation of the Eureka engine. We conclude with a summary and some possible directions for future work in Section 4.

A large number of e-commerce sites provide parametric search capabilities in which users search for desired products by providing bounds on attribute values. Stock screens at investment sites such as Charles Schwab, travel package selection at travel sites such as Travelocity, and electronic component search at semiconductor sites such as Cypress are examples of this type of search. Some e-commerce tools (e.g. Net.Commerce [7]) provide support for implementing such searches. As stated earlier, these sites typically rely entirely on a server-side database for query processing. They are thus limited to the submit/response metaphor and suffer from the problems of long response times and too many or too few answers.

Some newer sites provide a subset of the interactive exploration capability described in this paper. For example, Microsoft's Carpoint, Cars.com and Wireless Dimension combine browsing with querying; as the query is changed, the user immediately sees the effect on results. These sites currently do not support querying based on example products. The details of their implementations are not available in published literature. It is doubtful that Wireless Dimension's JavaScript implementation (which uses HTML for its output) is designed to scale to large product sets, and only Carpoint (using native ActiveX code) allows users to explore datasets with more than a few hundred products.

An interesting approach to handle the problem of too many or too few answers was taken by 64K Inc. [1]. Although still an HTML form-based approach, the query pages generated by the 64K engine contain histogram information, showing how records are distributed over each attribute's range of values, as well as a count of the total number of records. This information is meant to provide hints to the user for modifying the query before resubmitting it. If a query results in too many records, rather than showing them to the user, the engine redisplays the query page updated with new histogram and count information. In the case of too few answers, the engine uses domain-specific distance metrics to relax the query and return nearby records. However, the 64K search metaphor is still the standard submit/response metaphor (although perhaps with fewer cycles and richer features).
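The kind of histogram and count hints described above can be sketched as follows. This is an assumption-laden illustration rather than the 64K engine's method: the function names, the equal-width binning and the sample attribute are all hypothetical.

```python
def histogram(records, attr, low, high, bins):
    """Count records per equal-width bin over [low, high] for one attribute.

    Assumes high > low and bins >= 1. Values outside the range are ignored,
    and a value equal to `high` is placed in the last bin.
    """
    counts = [0] * bins
    width = (high - low) / bins
    for r in records:
        v = r[attr]
        if low <= v <= high:
            i = min(int((v - low) / width), bins - 1)
            counts[i] += 1
    return counts


def query_hints(records, attr, low, high, bins):
    """Return the total record count plus the attribute's histogram."""
    return {"total": len(records),
            "histogram": histogram(records, attr, low, high, bins)}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    parts = [{"skew_ps": v} for v in (1, 2, 3, 10)]
    print(query_hints(parts, "skew_ps", 0, 10, 5))
```

Shown next to a form field, such counts indicate where tightening a bound would remove many records and where relaxing it would add some, before the query is resubmitted.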

Another approach to handling too many answers is represented by the FOCUS application described in [10]. In this technique, the results of a query are cached and displayed in a compressed table. Further restrictions to reduce the result set are applied on the cached results. However, as the authors acknowledge, FOCUS is mainly suited for tables with up to a few hundred records and attributes. In the case of Spotfire Pro, results are presented in various graphical formats (e.g. pie charts, scatter plots, etc.) and users can manipulate sliders and list boxes to interactively search through the data. Spotfire Pro is a client-side application that must be installed locally on the user's machine. Details of its implementation are not available in the published literature.

Other related work includes the dynamic query and starfield work of [2] and [3]. As with Spotfire Pro, users can manipulate numeric sliders and other GUI controls and see the effects of these actions on result sets displayed in 2D scatter plots. This work focuses mainly on the human-interface aspect of this search metaphor, and the implementation details are rather sparse.

User interface

Fig. 1 shows Eureka's user interface. Records are displayed in a scrollable list format with a separate column for each numeric and categorical attribute. At the top of each column is a title bar showing the name of the attribute. The names of the attributes are obtained from the database catalog. Beneath each of these titles is an 'attribute control' used for specifying attribute restrictions. Categorical attributes are represented by select lists that allow users to (de)select (un)desired

System design and implementation

We now describe the details of how we implement continuous querying in Eureka. In order to obtain interactive response times, we exploit several key observations. First and foremost is that we must cache data records in the local client. There is little chance of interactive response times unless the interaction between the user and data is moved off of the server and out of the network. While this does place a non-trivial memory requirement on the client, it offers an advantage beyond that of

Conclusions

We presented the design of Eureka, a database exploration engine that implements continuous querying in data-centric Web applications. Eureka has extremely fast response time even when exploring hundreds of thousands of records containing hundreds of attributes. Besides predicates on attribute values, Eureka also supports searching using example records. Query borders in Eureka can also be made non-strict such that records that lie just outside the query region can still be included in the

Acknowledgements

We would like to thank Sunita Sarawagi who helped design and implement our first database exploration engine. We also thank Andreas Arning, Daniel Gruhl, Dimitrios Gunopulos, Howard Ho and Magnus Stensmo for their contributions to the discussions.

References (11)

  [1] 64K Inc., San Jose, DBGuide Introduction and Technology Overview,...
  [2] C. Ahlberg and B. Shneiderman, Visual information seeking: tight coupling of dynamic filters with starfield displays,...
  [3] C. Ahlberg, C. Williamson and B. Shneiderman, Dynamic queries for information exploration: an implementation and...
  [4] S. Dar, M.J. Franklin, B.T. Jonsson, D. Srivastava and M. Tan, Semantic data caching and replacement, in: Proc. of VLDB...
  [5] D.A. DePalma, J.C. McCarthy and M. Mackenzie, Interactive technology strategies: content road map, Technical Report 2...
There are more references available in the full text version of this article.

E-mail: {shafer, ragrawal}@almaden.ibm.com