A high performance distributed database system for enhanced Internet services

https://doi.org/10.1016/S0167-739X(98)00084-3

Abstract

Using a distributed database system as part of a distributed web server architecture has obvious advantages. It is shown that a first phase distributed database system can be built by extending an existing object oriented database system with application-specific additions. A web database is implemented, as a part of the traditional HTTP-based distributed web server, using this distributed database system.

Introduction

Users of the Internet impose increasing demands on network bandwidth, web server performance and reliability of services. The number of new users and of different kinds of services is still increasing every day as well. In only a few years, the World Wide Web (WWW), based on the Hypertext Transfer Protocol (HTTP) [1], has become the main service, overtaking older services like ftp, e-mail and telnet. As the Internet is maturing, it is very likely that new services, for instance, audio and video streaming, telephony, secure payment transactions and news casting, will make the spectrum more diverse and may even challenge the current WWW dominance. Also, it is likely that the definition of the WWW services will change over time: originally, the WWW was strictly bound to HTTP, but nowadays most web browsers support the addition of components for dedicated services, like audio streaming, and are able to run Java applets. This enables web clients to use a wide range of protocols including UDP and TCP itself. Consequently, the distinction between services is fading.

The current ad hoc web server architecture is barely able to meet the increasing demands. The traditional way of accessing data on the web is through a web server machine running an HTTP daemon that reads the data from the local file system; this setup was later enhanced with the Common Gateway Interface (CGI) [2]. The CGI standard defines an interface between external gateway elements and the HTTP daemon. While the HTTP daemon retrieves static data, such as an HTML document, from the file system, a CGI element is invoked every time it is addressed by a web client, allowing for dynamic data generation and database access. This architecture does have some inherent inefficiencies:

  1. Lack of data management: As data on the web server becomes more complex and inter-dependent, data management becomes more important. Take for instance a set of HTML documents pointing to each other: if one document is removed, the referring documents must be updated as well. Storing data on the file system does not enforce this HTML link consistency.

  2. CGI is stateless: No state information is preserved between calls to CGI elements. Suppose a CGI element is used to connect to a database module and suppose that a web client makes multiple transactions on the same database module. Then it is necessary to connect, authenticate and disconnect separately for every transaction.
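
The per-transaction overhead described in item 2 can be illustrated with a minimal Python sketch. The `FakeDatabase` class and the two driver functions are hypothetical stand-ins introduced here for illustration; they are not part of CGI or of any real database API. The sketch only counts how often the connect-and-authenticate step runs.

```python
class FakeDatabase:
    """Hypothetical database module that counts connection setups."""

    def __init__(self):
        self.setups = 0
        self.connected = False

    def connect(self, user, password):
        # Connecting and authenticating are modelled as one costly step.
        self.setups += 1
        self.connected = True

    def execute(self, statement):
        if not self.connected:
            raise RuntimeError("not connected")
        return f"ok: {statement}"

    def disconnect(self):
        self.connected = False


def stateless_requests(db, statements):
    """CGI-style: every request sets up and tears down its own connection."""
    results = []
    for statement in statements:
        db.connect("user", "secret")
        results.append(db.execute(statement))
        db.disconnect()
    return results


def stateful_session(db, statements):
    """Session-style: one connection is reused for all transactions."""
    db.connect("user", "secret")
    results = [db.execute(statement) for statement in statements]
    db.disconnect()
    return results
```

With three transactions, the stateless variant performs three setups and the session variant one; the gap grows linearly with the number of transactions a client issues.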

The standard implementation of this architecture has the following drawbacks:
  1. CGI diminishes performance: Normally, CGI elements are implemented in such a way that for every request sent to them a new process is created. From the underlying operating system point of view, this is costly.

  2. No scalability: No specific provisions are made for parallelism to enable scalability.

Currently, frameworks that support the implementation of distributed object oriented Internet applications, such as the Common Object Request Broker Architecture (CORBA) [3], [4] and Java’s Remote Method Invocation (RMI) [5], are becoming more and more accepted. Such frameworks would enable a web client to conveniently connect to, for instance, database server objects, thereby bypassing the CGI mechanism. In general, however, traditional HTTP-based web servers do not fit the client-server paradigm those frameworks rely on, making them less suitable to host (server) objects for a distributed object oriented application. A sign that a framework like CORBA is gaining momentum is the inclusion of a Java-based Object Request Broker (ORB) in the latest release of the Netscape Communicator web browser.

Given the previous considerations, an object oriented, high performance, scalable and fault tolerant database system embedded in the web server architecture may remove some of the inefficiencies associated with the traditional web server, especially the lack of data management and scalability, and may enable support for distributed object frameworks.
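
As an illustration of what data management could mean for the HTML link consistency example given earlier, the following minimal Python sketch keeps track of which documents refer to which. The `DocumentStore` class is hypothetical and is not taken from the system described in this paper; it only shows the kind of bookkeeping a database can do and a plain file system does not.

```python
class DocumentStore:
    """Hypothetical object store that tracks links between documents."""

    def __init__(self):
        self.links = {}      # document name -> names it links to
        self.referrers = {}  # document name -> names that link to it

    def add(self, name, targets=()):
        if name in self.links:
            raise ValueError(f"document already exists: {name}")
        self.links[name] = set()
        self.referrers[name] = set()
        for target in targets:
            self.link(name, target)

    def link(self, source, target):
        # Refuse to create a link to a document that is not stored.
        if target not in self.links:
            raise KeyError(f"dangling link: {source} -> {target}")
        self.links[source].add(target)
        self.referrers[target].add(source)

    def remove(self, name):
        # Update every referring document so that no link is left dangling.
        for source in self.referrers.pop(name):
            self.links[source].discard(name)
        # Forget that the removed document referred to others.
        for target in self.links.pop(name):
            self.referrers[target].discard(name)
```

Removing a document here updates its referrers in the same operation, which is the consistency guarantee the file-system approach lacks.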

This paper describes the first phase realization of a distributed database system as one of a number of possible next generation web server architectures. An iterative process is followed to build the distributed database system. Every phase in this building process has a clearly defined set of objectives, spans a limited amount of time, and adds functionality; the output of every phase serves as input for the next phase. The first phase will implement a minimum level of functionality, concentrating on inter-node communication and not on general distributed transaction and query management mechanisms.

This paper starts with a brief discussion on possible parallel web server architectures in which an object oriented database can be used, followed by a description of the first phase distributed database design and implementation.

Section snippets

Distributed web server architectures

It is expected that future WWW services will not only be based on HTTP but also on a framework that supports the deployment of distributed object oriented applications in a wide area network such as the Internet. Therefore, at this point, two approaches can be used to introduce parallelism in the web server architecture: take the traditional HTTP-based server and add parallelism or make a shift to the object oriented paradigm and use the concept of distributed objects. Both approaches can

Requirements

The objectives for the distributed database system in general are to make it specifically suited to be embedded in a high-performance and scalable web server architecture. The web server may consist of a heterogeneous set of machines. Database clients must be able to connect to a single node of the distributed database system to execute all their database actions. This allows a client to go only once through the process of setting up a connection with the database and it reduces the total number

Matisse

The Matisse database system is based on the client–server paradigm: client applications connect via a network to the server, where the data is stored, in order to perform actions like object search, data retrieval, data updates, etc. The server, called MTS, supports these actions with a relatively simple set of possible operations on a limited number of server data structures, as defined by the Server Engine Services (SES) API [9]. A client application may need services at a higher level, for

Design

The main idea behind the design is to run an extended Matisse database server as a node server in the distributed database system. The extension must offer functionality to let the node servers cooperate in such a way that together they form a distributed database system. This means that the extension must offer a communication mechanism.
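
The idea of cooperating node servers can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the `NodeServer` class, the hash-based placement of objects, and the use of direct method calls in place of a real inter-node communication mechanism are not taken from the design described here. It only shows how a client could talk to a single node while the objects live on several.

```python
import zlib


class NodeServer:
    """Hypothetical node server: local storage plus a forwarding extension."""

    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster  # shared list of all node servers
        self.local = {}         # objects stored on this node
        self.forwarded = 0      # requests passed on to another node

    def _owner(self, name):
        # Assumption: placement by a stable hash of the object name.
        index = zlib.crc32(name.encode("utf-8")) % len(self.cluster)
        return self.cluster[index]

    def put(self, name, value):
        owner = self._owner(name)
        if owner is self:
            self.local[name] = value
        else:
            # Stand-in for a message sent over the communication mechanism.
            self.forwarded += 1
            owner.put(name, value)

    def get(self, name):
        owner = self._owner(name)
        if owner is self:
            return self.local[name]
        self.forwarded += 1
        return owner.get(name)


def make_cluster(size):
    """Create `size` node servers that all share one membership list."""
    cluster = []
    for node_id in range(size):
        cluster.append(NodeServer(node_id, cluster))
    return cluster
```

A client would call `put` and `get` on one entry node only, for example `make_cluster(4)[0]`, and that node forwards any request it does not own.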

The Matisse server is extensible but was not otherwise designed for a distributed environment. An object is referred to by an object identifier (oid) but each

Implementation

The server extension is application-dependent; every application using the distributed database imposes its own functional requirements on the extension. Two different extensions were implemented. A simple object store, called MPPCRUD, allowing for create, read, update and delete actions on database objects associated with a unique name, is used for testing and performance measurements. It is based on the Matisse CRUD. A simple web object store (WEBDB) is used in the first phase parallel web
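
The create, read, update and delete actions on uniquely named objects can be pictured with the following minimal single-node Python sketch. The `NamedObjectStore` class and its method names are hypothetical; this is not the Matisse CRUD interface or the extension implemented in this work, only an illustration of the semantics such a store might offer.

```python
class NamedObjectStore:
    """Hypothetical store of objects that are addressed by a unique name."""

    def __init__(self):
        self._objects = {}

    def create(self, name, data):
        # Names are unique: creating an existing name is an error.
        if name in self._objects:
            raise KeyError(f"name already in use: {name}")
        self._objects[name] = data

    def read(self, name):
        return self._objects[name]

    def update(self, name, data):
        # Only existing objects can be updated.
        if name not in self._objects:
            raise KeyError(f"no such object: {name}")
        self._objects[name] = data

    def delete(self, name):
        del self._objects[name]
```

Keeping the interface this small makes the object size and the number of nodes the main variables left in a read or write test.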

Results

Performance tests are executed for the CH, MPPCRUD, and WEBDB. The most important parameters for these tests are the number of database nodes and the object size. Tests for the traditional distributed web server configurations are ongoing.

The CH on the Parsytec CC is tested for MPI on the HSlink under EPX (single thread of control) and TCP/IP on a 10 Mbps Ethernet. The test indicates a lower throughput and a lower latency for the TCP/IP implementation.

The MPPCRUD object read performance test on the

Future work

To be able to describe the execution characteristics of the system, more measurements must be executed on faster networking hardware, i.e. Gigabit Ethernet. Additionally, performance tests will be run for the CH and server extensions based on thread-safe MPI.

While the work presented here is a first phase implementation, a future distributed database system may be based on globally unique object identifiers to allow for full object oriented database services. This next phase system would require

Conclusions

By building an initial implementation of the distributed database system, hands-on experience has been acquired. It shows that a first phase distributed database system can be built with an acceptable level of functionality, by extending the existing Matisse database server. It also indicates that the Matisse database system offers good support for server extensions. The system can be used in both a traditional and a distributed object-based parallel web server architecture. Performance tests

Acknowledgements

We would like to thank Edouard Duvillier for his support and contributions to the discussions we had on the design of the system. The work presented in this paper is a part of the JERA project funded by the Dutch HPCN foundation.

References (14)

  • T. Berners-Lee, Hypertext Transfer Protocol – HTTP/1.0, RFC 1945, May...
  • National Center for Supercomputing Applications, The Common Gateway Interface,...
  • Object Management Group, The Common Object Request Broker: Architecture and Specification 2.0, http://www.omg.org/,...
  • Steve Vinoski, CORBA: Integrating Diverse Applications Within Distributed Heterogeneous Environments, IEEE...
  • Sun Microsystems, Java Development Kit 1.1.4 Documentation,...
  • E.D. Katz, M. Butler, R. McGrath, A Scalable HTTP Server: The NCSA Prototype,...
  • T.H. Harrison, D.C. Schmidt, Evaluating the performance of O–O network programming toolkits, C++ Report, SIGS, vol. 8,...
There are more references available in the full text version of this article.
