A generic construct based workload model for web search

https://doi.org/10.1016/j.ipm.2009.04.004

Abstract

Benchmarks are vital tools in the performance measurement, evaluation, and comparison of computer hardware and software systems. Standard benchmarks such as TREC, TPC, SPEC, SAP, Oracle, Microsoft, IBM, Wisconsin, AS3AP, OO1, OO7, and XOO7 have been used to assess system performance. These benchmarks are domain-specific and domain-dependent in that they model typical applications and are tied to a particular problem domain. Test results from these benchmarks are estimates of possible system performance for certain pre-determined problem types. When the user domain differs from the standard problem domain, or when the application workload diverges from the standard workload, these benchmarks do not provide an accurate way to measure the system performance of the user problem domain. System performance on the actual problem domain, in terms of data and transactions, may vary significantly from what the standard benchmarks report.

In this research, we address the issues of generalization and precision in benchmark workload models for web search technology. Current performance measurement and evaluation methods yield only rough estimates of system performance, and those estimates vary widely when the problem domain changes. Performance results provided by vendors can be neither reproduced nor reused in real users’ environments. Hence, we tackle the issues of domain boundness and workload boundness, which are the root of the problem of imprecise, unrepresentative, and irreproducible performance results. We address these issues by presenting a domain-independent and workload-independent workload model benchmark method developed from the perspective of user requirements and generic constructs. We present a user-driven workload model that develops a benchmark through a process of workload requirements representation, transformation, and generation via the common carrier of generic constructs. We aim to create a more generalized and precise evaluation method that derives test suites from the actual user domain and application setting.

The workload model benchmark method comprises three main components: a high-level workload specification scheme, a translator of the scheme, and a set of generators that produce the test database and the test suite. All three are based on the generic constructs. The specification scheme is used to formalize the workload requirements, the translator is used to transform the specification, and the generators are used to produce the test database and the test workload. We determine the generic constructs through an analysis of web search methods. The generic constructs form a page model, a query model, and a control model in the workload model development. The page model describes the web page structure, the query model defines the logic used to query the web, and the control model defines the control variables that set up the experiments.

In this study, we conducted ten baseline research experiments to establish the feasibility and validity of the benchmark method. An experimental prototype was built to execute these experiments. The experimental results demonstrate that a method based on generic constructs and driven by user requirements is capable of modeling the standard benchmarks as well as more general benchmark requirements.

Introduction

A benchmark is a standard by which something can be measured or judged. A computer system benchmark is a set of executable instructions run in controlled experiments to compare two or more computer hardware and software systems. Benchmarking is thus the process of evaluating different hardware systems, or of reviewing different software systems on the same or different hardware platforms. A web search service benchmark is a standard set of executable instructions used to measure and compare the relative, quantitative performance of two or more systems through the execution of controlled experiments. Benchmark data such as throughput (jobs per unit of time), response time (time per unit of work), the price/performance ratio, and other measures serve to predict price and performance, and help us procure systems, plan capacity, uncover bottlenecks, and govern information resources for various user, developer, and management groups (Anon et al., 1985, Bitton et al., 1983, Bohme and Rahm, 2001, Can et al., 2004, David et al., 2001).

Examples are the TREC (Text REtrieval Conference), TPC (Transaction Processing Performance Council), SPEC (Standard Performance Evaluation Corporation), SAP, Oracle, Microsoft, IBM, Wisconsin, AS3AP, OO1, OO7, and XOO7 standard benchmarks that have been used to assess system performance. These benchmarks are domain-specific in that they model typical applications and are tied to one specific problem domain. Test results from these benchmarks are estimates of possible system performance for certain pre-determined problem settings. When the user domain differs from the standard problem domain, or when the application workload diverges from the standard workload, these benchmarks do not provide an accurate way to measure the system performance of the user application domain. System performance on the actual problem domain, in terms of data and transactions, may vary significantly from what the standard benchmarks report. The workload model is the core of a benchmark method: it determines the scope, scale, and significance of the performance evaluation method, and the workload design decides the representativeness and reproducibility of the performance study. Hence, workload development and improvement are vital to precision and performance in benchmarking (Cardenas, 1973, Carey et al., 1993, Cattell and Skeen, 1992, DeWitt et al., 1990).

Domain boundness means that the performance study is bound to a pre-determined problem set. Because the system performance depends on that problem domain, domain boundness is also called the domain dependency issue. Because of domain boundness and dependency, system performance results cannot be reproduced or sustained, and the test results vary as applications change and user requirements evolve. Workload boundness means that the performance study is bound to pre-fixed workload characteristics and components. Because the performance study depends on a particular application workload, workload boundness is also called the workload dependency issue. Because of workload boundness and dependency, test results are neither comparable nor portable between different user settings and application contexts, and the workload itself is neither compatible nor scalable. Domain boundness and workload boundness are the research issues we aim to address in this research (Gray, 1993, Seng, 2003, Yao et al., 1987, Yu et al., 1992).

As described above, standard benchmarks model certain application types in a pre-determined problem domain; they present a fixed problem set to the proposed system. When the user domain differs from the standard domain, or when the user workload deviates from the standard workload, the test results vary significantly in the real setting and under the actual application context. Users can neither reproduce the test results nor predict performance, because benchmark results are highly dependent on the real workload and the actual application. The standard test workload cannot represent the real workload, and the test suite cannot accommodate the application requirements. Standard benchmarks can neither measure the effects of the user problem on the target system nor generate realistic and meaningful test results (Bar-Ilan, 2005, SanJuan and Ibekwe-SanJuan, 2006).

Performance measurement, evaluation, and comparison are vital to the development and improvement of web search service technology (Phusavat et al., 2007, Seng and Lin, 2007, Wu et al., 2009). A desirable benchmark method must be domain-independent and workload-independent; that is, its core, the workload model, ought to be flexible, scalable, reproducible, portable, and representative, and it should provide a precise performance profile. Hence, in this research, a more generic, user-requirements-driven benchmark workload model is developed to tackle the research issues of domain dependency and workload dependency that arise when a benchmark method is domain-bound and workload-bound (Jansen & Spink, 2006; Kraaij et al., 2002, Vaughan, 2004).

In this paper, we present a domain-independent and workload-independent benchmark method which is developed from the perspective of the user requirements and application settings of web search. We propose to develop a more generalized and more precise performance evaluation method from the perspective of the common carriers of Internet search. We create a user-driven approach which models the benchmark development in a process of workload requirements representation, transformation, and generation.

Benchmarks can be synthetic or empirical. Synthetic benchmarks model the typical applications in a problem domain and create the synthetic workload. Empirical benchmarks utilize the real data and tests. Though real workloads are ideal tests, the costs of re-implementation of the actual systems usually outweigh the benefits to be secured. Synthetic benchmarks are therefore the common approach chosen by users, developers, and managers (Gray, 1993, Martinez-Gonzalez and Fuente, 2007, Poess and Floyd, 2000, SanJuan and Ibekwe-SanJuan, 2006, Vaughan, 2004).

Further, benchmark experiments are composed of experimental factors and performance metrics. Experimental factors are the variables that can affect the performance of the systems under test; performance metrics are the quantitative measurements collected and observed in the benchmark experiments. Together they constitute the independent and dependent variables to be modeled and formulated in the benchmark.

The workload is the crux of a benchmark method, and the workload model establishes the build-up and constructs of that method; it is hence the core of the benchmarking process. In general, a workload is the amount of work assigned to or performed by a worker, or a unit of workers, in a given time period; here, it is the amount of work assigned to or performed by a system in a given period of time. Workloads are best described by the amount of work, the rate at which the work is created, and the characteristics, distribution, and content of the system work. For instance, the amount of work is the number of transactions; the rate at which the work is created refers to the frequency and arrival rate of transactions; and the characteristics, distribution, and content of the workload cover the attributes, data types, value distributions, number of records, and number of data files. Conventionally, workload modeling and characterization start with a domain survey, observation, and data collection, and continue with a study of the main components and their characteristics. In general, the workload components are developed through a series of analysis and design steps covering data, operation, and control. That is why the concepts of the common carrier and user requirements are introduced into this research: workload model development is perceived as a process of workload requirements determination, transformation, and generation.
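To make these characterization dimensions concrete, the following sketch (our own illustration, not part of the original method; the log format and field names are assumed) computes the amount of work, the arrival rate, and a simple operation-mix distribution from a small timestamped transaction log.

    from collections import Counter

    # Hypothetical transaction log: (timestamp in seconds, operation name).
    log = [
        (0.0, "search"), (0.4, "search"), (1.1, "fetch_page"),
        (2.0, "search"), (2.2, "fetch_page"), (3.5, "search"),
    ]

    # Amount of work: the number of transactions observed.
    amount_of_work = len(log)

    # Rate at which work is created: transactions per second over the observed window.
    window = (log[-1][0] - log[0][0]) or 1.0
    arrival_rate = amount_of_work / window

    # Characteristics and distribution: relative frequency of each operation type.
    counts = Counter(op for _, op in log)
    operation_mix = {op: n / amount_of_work for op, n in counts.items()}

    print(amount_of_work, round(arrival_rate, 2), operation_mix)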

Furthermore, workload requirements analysis involves data analysis, operation analysis, and control analysis. In the data analysis, we examine the size of the data, the number of records, the length of records, the types of attributes, the value distributions and correlations, the keys and indexing, the hit ratios, and the selectivity factors. In the operation analysis, we examine the complexity of operations, the correlations among operations, the data input to each operation, the attributes and objects used by each operation, the result size, and the output mode. These are further investigated in the control analysis, which covers the duration of the test, the number of users, the order of the tests, the number of repetitions, the frequency and distribution of the tests, and the performance metrics.
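As an illustration of how these analysis dimensions might be recorded, the following Python sketch defines three requirement records for the data, operation, and control aspects. The class and field names are our own assumptions chosen to mirror the dimensions listed above, not the specification scheme defined later in the paper.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class DataRequirements:
        """Data analysis: size, structure, and distribution of the test data."""
        number_of_records: int
        record_length: int                   # assumed unit: bytes per record
        attribute_types: Dict[str, str]      # attribute name -> data type
        value_distribution: str = "uniform"  # e.g. "uniform" or "zipf"
        selectivity: float = 0.1             # fraction of records an operation touches

    @dataclass
    class OperationRequirements:
        """Operation analysis: what is executed and over which attributes."""
        name: str
        complexity: str                      # e.g. "simple", "boolean", "ranked"
        input_attributes: List[str] = field(default_factory=list)
        result_size: int = 10
        output_mode: str = "ranked-list"

    @dataclass
    class ControlRequirements:
        """Control analysis: how the experiment is driven and measured."""
        duration_seconds: int = 600
        number_of_users: int = 1
        repetitions: int = 3
        metrics: List[str] = field(default_factory=lambda: ["response_time", "throughput"])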

In the web search service context, we develop a workload model that comprises a set of common carriers in the form of generic constructs. Generic constructs are the basic units, parts, components, elements, or steps of an algorithm, method, or structure that can be decomposed and extracted in a logical manner. Generic constructs are normally used to build new and more complex algorithms, methods, and structures; they represent the foundation of a set of theories and practices that have been accepted and applied in general.

In the research method, we develop a workload requirements specification scheme, a scheme translator, and a set of benchmark generators. We use the common carrier of generic constructs to collect and capture the workload requirements, and then apply the scheme, translator, and generators to build a computer-aided web search benchmarking environment.
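The following minimal sketch outlines how such an environment could wire the specification, translator, and generators together. The toy specification format and all function names are assumptions for illustration only; they are not the authors' implementation.

    def translate(spec: dict) -> tuple:
        """Split a high-level workload specification into data, operation,
        and control specifications (the three translator outputs)."""
        return spec.get("data", {}), spec.get("operations", {}), spec.get("control", {})

    def generate_test_database(data_spec: dict) -> list:
        """Data generator stub: produce synthetic pages per the data specification."""
        return [{"id": i, "terms": []} for i in range(data_spec.get("pages", 0))]

    def generate_test_operations(operation_spec: dict) -> list:
        """Operation generator stub: produce the test workload (here, query strings)."""
        return list(operation_spec.get("queries", []))

    def generate_control_script(control_spec: dict) -> dict:
        """Control generator stub: produce the script that drives the run."""
        return {"repetitions": control_spec.get("repetitions", 1),
                "metrics": control_spec.get("metrics", ["response_time"])}

    workload_spec = {
        "data": {"pages": 1000},
        "operations": {"queries": ["web search benchmark", "generic construct"]},
        "control": {"repetitions": 3, "metrics": ["response_time", "throughput"]},
    }

    data_spec, operation_spec, control_spec = translate(workload_spec)
    test_database = generate_test_database(data_spec)
    test_operations = generate_test_operations(operation_spec)
    control_script = generate_control_script(control_spec)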

We extract the generic constructs from the main research literature: we analyze each main web search algorithm, deduce the generic constructs from the literature, and form the common carrier. The web page structure and query structure are decomposed into building blocks. The main idea is to describe the data model and the operation model of the workload model in terms of the page structure and query structure, without tying them to any particular web search scenario or usage. In this way, a more domain-independent and workload-independent approach to requirements collection and compilation can be developed.

  • Workload specification scheme

The workload specification scheme is designed to model the application requirements. It uses high-level generic constructs to delineate requirements concerning data, operation, and control. A generic construct is the basic unit of operand, and an operation is the basic unit of operator; together, a generic construct and an operation form a workload unit. Each workload unit then becomes a building block for composing larger workload units.

  • Scheme translator

The scheme translator is built from a set of lexical rules and a set of syntactical rules to translate the workload specification. It performs code generation and produces three output specifications: the data specification, the operation specification, and the control specification.

  • Data generator

The data generator is made up of a set of data generation procedures that create the test database according to the data distribution specification.

  • Operation generator

The operation generator is made up of a set of operation generation procedures that generate the search operations. These procedures select operations, determine operation precedence, schedule arrivals, prepare input data, issue tests, handle queues, and gather and report timing statistics.

  • Control generator

The control generator is made up of a set of control generation procedures that generate the control scripts used to drive and supervise the experiment execution (an illustrative sketch of the three generators follows this list).
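The sketch below illustrates, under assumed distributions and field names, what the three generators might produce: a data generator that creates test pages with a Zipf-like term distribution, an operation generator that selects queries and schedules Poisson arrivals, and a control generator that emits a simple control script. It is an illustrative approximation, not the generators implemented in the prototype.

    import random

    random.seed(7)  # keep the synthetic run reproducible

    # Data generator: test pages whose terms follow a Zipf-like skew (assumed).
    VOCABULARY = ["term%d" % i for i in range(1, 501)]

    def generate_pages(n_pages, terms_per_page=50):
        weights = [1.0 / rank for rank in range(1, len(VOCABULARY) + 1)]
        return [{"id": p,
                 "terms": random.choices(VOCABULARY, weights=weights, k=terms_per_page)}
                for p in range(n_pages)]

    # Operation generator: pick queries and schedule Poisson arrivals.
    def generate_operations(queries, n_operations, rate_per_second):
        """Return (arrival_time, query) pairs with exponential inter-arrival times."""
        schedule, t = [], 0.0
        for _ in range(n_operations):
            t += random.expovariate(rate_per_second)  # mean gap = 1 / rate
            schedule.append((t, random.choice(queries)))
        return schedule

    # Control generator: a script that drives and supervises the run.
    def generate_control_script(duration_seconds, users, repetitions):
        return {"duration_seconds": duration_seconds,
                "concurrent_users": users,
                "repetitions": repetitions,
                "collect": ["response_time", "throughput"]}

    test_database = generate_pages(100)
    test_operations = generate_operations(
        ["benchmark", "web search", "workload model"], n_operations=20, rate_per_second=2.0)
    control_script = generate_control_script(duration_seconds=60, users=4, repetitions=3)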

This paper is organized into five sections. Section one introduces this research with its motivation, issues, and approach. Section two reviews the main web search methods to establish the basis for generic constructs development. Section three presents the requirements-driven, generic-constructs-based workload model benchmark method. Section four then delineates the ten baseline research experiments that demonstrate the feasibility and validity of the method. Finally, section five discusses the contributions and limitations, and concludes the paper with a brief summary and future research work.

Section snippets

Literature review

In this section, we review the main web search methods and benchmark approaches to establish the basis of the common carrier and to extract the generic constructs of the page structure and query structure. The key set of search algorithms, consisting of PageRank, HITS, BHITS, WBHITS, VSM, Okapi, CDR, and TLS, is based on the surveys described and conducted in (Hastie et al., 2001, Martinez-Gonzalez and Fuente, 2007, Shafi and Rather, 2005, Vaughan, 2004). ACM SIGIR TREC is generally viewed and used as
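Of the surveyed algorithms, PageRank illustrates the kind of algorithmic structure from which generic constructs (pages, links, ranks, and iterative update steps) are extracted. The following minimal power-iteration sketch over a toy link graph is our own illustration and is not taken from the paper.

    # Minimal PageRank power iteration over a toy link graph (illustration only).
    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                targets = outlinks if outlinks else pages  # dangling pages spread evenly
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            rank = new_rank
        return rank

    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank(toy_graph))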

Components of research model

The research model is made up of a page model, a query model, and a control model. The page model and the query model are based on the requirements determination of the classic web search algorithms. The generic constructs, the operations on the generic constructs, and the constraints are extracted, developed, and derived for each algorithm. The research model is shown in Fig. 1. The page model describes a generic page layout structure. The query model defines a set of criteria to query
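For illustration only, the following sketch shows one plausible rendering of the page and query models as generic constructs; the actual constructs and their operations are those defined in Fig. 1, not these assumed field names.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Page:
        """Assumed generic page layout: textual content plus hyperlink structure."""
        url: str
        title: str
        body_terms: List[str] = field(default_factory=list)
        anchor_texts: List[str] = field(default_factory=list)
        outlinks: List[str] = field(default_factory=list)

    @dataclass
    class Query:
        """Assumed generic query: terms plus simple restriction criteria."""
        terms: List[str]
        boolean_operator: str = "AND"      # how the terms are combined
        site_filter: Optional[str] = None  # restrict results to one site
        result_limit: int = 10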

Experiment design

A set of baseline experiments to verify the feasibility and validity of the research method was conducted via a prototype system. Because Yahoo! Web Search offered the larger set of APIs, we chose the Yahoo! Web Search APIs to carry out the experiments and selected the corresponding Google APIs, where available, for a basic comparative study. The set of baseline experiments is composed of ten synthetic test suites, each made up of a simple case and a specific function of a web search. The
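A test suite of this kind can be driven by a small script that issues each query repeatedly and records response times. The sketch below uses a placeholder endpoint and parameter names because the Yahoo! and Google web search APIs used in the original experiments have since been retired; it only illustrates the shape of such a driver.

    import statistics
    import time
    import urllib.parse
    import urllib.request

    # Placeholder endpoint and parameter names; the original experiments used the
    # (now retired) Yahoo! and Google web search APIs.
    SEARCH_ENDPOINT = "https://search.example.com/api"

    def timed_search(query):
        """Issue one search request and return the elapsed wall-clock time in seconds."""
        url = SEARCH_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "count": 10})
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=10) as response:
            response.read()
        return time.perf_counter() - start

    def run_test_suite(queries, repetitions=3):
        """Run one synthetic test suite; report the mean response time per query."""
        return {q: statistics.mean(timed_search(q) for _ in range(repetitions))
                for q in queries}

    if __name__ == "__main__":
        print(run_test_suite(["web search benchmark", "generic construct"]))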

Research discussion and conclusions

In this research, we have accomplished four main missions. First, we developed an analysis framework of the web search and benchmark literature to lay the basis for generic construct creation. Our aim was to survey the related literature on the classic web search algorithms and the benchmark methods. We collected and compiled the key characteristics of the main web search algorithms and benchmark methods, and we created and consolidated the steps to extract and formulate the generic constructs

Acknowledgement

This research is sponsored by National Science Council research Grant No. NSC95-2416-H-004-006.

References (39)

  • Bharat, K., & Henzinger, M. R. (1998). Improved algorithms for topic distillation in hyperlinked environment. In SIGIR...
  • Bitton, D., DeWitt, D. J., & Turbyfill, C. (1983). Benchmarking data management systems – a systematic approach. In...
  • Bohme, T. & Rahm, E. (2001). Xmach-1: Benchmark for XML data management. In Proceedings of the German data management...
  • Brin, S., et al. (1998). The anatomy of a large-scale hypertextual web search engine [Electronic version]. Computer Networks and ISDN Systems.
  • Cardenas, A. F. (1973). Evaluation and selection of file organization – a model and system. Communications of the ACM.
  • Carey, M. J., DeWitt, D. J., & Naughton, J. F. (1993). The OO7 benchmark. In Proceedings of the 1993 ACM SIGMOD...
  • Cattell, R. G. G., et al. (1992). Engineering data management benchmark. ACM Operations on Data Management Systems.
  • Clarke, S., et al. (1997). Estimating the recall performance of search engines. ASLIB Proceedings.
  • David, H., et al. (2001). Measuring search engine quality [Electronic version]. Information Retrieval.