
1 Introduction and Motivation

The number of services provided on the Internet increases continuously, and concepts such as “Big Data” and “cloud computing” play an increasingly important role. With cloud computing, resources can be used more effectively, so that more services can be operated on the same number of servers; this reduces the cost of procuring servers for new services. Furthermore, cloud computing offers the advantage that resources can be added dynamically to the current task, so applications scale better. Consequently, cloud computing is one of the most effective ways to provide scalable and robust services.

Availability plays an increasingly important role in the provision of services. In a simple client-server system it can only be realized to a limited extent, because such a system always contains a single point of failure. Furthermore, the amount of data grows constantly, which increases the need for a system that can grow dynamically. Cloud computing offers a good alternative to a standard client-server system: it provides the ability to scale applications and to achieve higher availability. The challenge is to make an existing system scalable while it retains high availability. In addition, various client applications should be able to use the system, so a clean system API must be defined. At the same time, the system should be easily extensible.

The aim of this work is the extension of an existing implementation of a geo-server system. The system should be scalable and additionally offer high availability; these are the first steps towards a cloud system. For this purpose, an API must be developed that individual client applications can access. To meet the scalability requirement, other ways of persisting data must be taken into consideration. We present different scenarios for the system; full results can be found in [6].

2 Cloud Computing

In cloud computing (CC), IT resources and services are provided over the Internet. Originally, cloud computing simply meant the outsourcing of IT services. Meanwhile, many companies such as Amazon, IBM, and Google have established themselves as platform providers in the cloud computing market, and many others build their own applications and consulting services on top of these platforms.

Generally, cloud computing is an IT development, deployment and distribution model that makes it possible to provide services, products and solutions over the Internet in real time [1, 9]. The term cloud denotes a large collection of easily usable and accessible resources (such as hardware, platforms or services). To allow optimal utilization, these resources can be adapted dynamically to varying loads and configured accordingly. These models are based on consumption-based billing, and assurances are given in the form of service level agreements (SLAs) covering the infrastructure. Cloud computing thus forms the technical platform for offering cloud services with consumption-based billing; this includes, for example, infrastructure, system and application software [10, 15]. The NIST definition [12] of cloud computing lists the following five characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

These services can be provided through a variety of as-a-Service models. Three models have become established [4, 12]: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Providing a cloud application gives the vendor several advantages with respect to meeting a specific service level agreement (SLA) or service agreement [2, 18]: availability, scalability, redundancy, and fault-tolerant, robust system behavior.

Despite the high popularity of cloud computing, challenges remain when developing cloud applications. For the persistence of data, there are no uniform APIs for addressing the database systems, so each developer is forced to target a specific database management system. The same problem exists for the APIs of the services (PaaS) [2].

3 Distributed Databases

Usually, applications access a database on a single computer, and all requests are issued there. Unlike classical relational database systems, distributed databases take a different approach: the database is distributed across several computers in a system. Many software applications require access to stored data, and distributed database systems give application programs and users the same access interface as a central database. To achieve this transparency of distribution, however, the database software has to manage a number of technical problems. The use of distributed database systems has several advantages. They allow the system structure to be adapted to an organizational structure without affecting the database properties. Because the database system mediates between application programs and users, all aspects of the distribution remain hidden from them; the access interface does not change. This distribution transparency is the great advantage of distributed database systems. Furthermore, the distribution of the data is handled by the distributed database system itself. Performance increases because multiple servers are available for processing the data, which shortens response times. A further advantage is that the database remains accessible when a single computer fails.

3.1 CAP Theorem

The CAP theorem, also called Brewer’s theorem, states that only two of the following three characteristics can be satisfied simultaneously: Consistency (all nodes see the same data at the same time), Availability (every request sent to the system always receives an answer), and Partition tolerance (when part of the system fails, the system as a whole must not collapse). A proof of this theorem is provided in the work of Gilbert and Lynch [7].

3.2 ACID vs BASE

For the persistence of data there are several concepts. The ACID concept, which is used mainly for relational databases, is defined as follows: 1. Atomic: a transaction is complete only when all of its operations have completed; otherwise a rollback is performed, which returns the database to a consistent state. 2. Consistent: a transaction must not leave the database in an invalid state; if it would, the operation is not permitted and a rollback is carried out. 3. Isolated: all transactions are independent of each other and cannot influence each other. 4. Durable: once a transaction has been executed successfully, it must be guaranteed that the data is stored permanently in the database, even if a system error occurs afterwards.
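As a minimal sketch of the atomicity property, the following JDBC snippet wraps two updates in one transaction and rolls back on failure; the JDBC URL, credentials and the `account` table are purely hypothetical and only serve to illustrate the concept.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC URL and schema, for illustration only.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "secret")) {
            con.setAutoCommit(false);              // start an explicit transaction
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100); debit.setInt(2, 1); debit.executeUpdate();
                credit.setInt(1, 100); credit.setInt(2, 2); credit.executeUpdate();
                con.commit();                      // both updates become durable together
            } catch (SQLException e) {
                con.rollback();                    // atomicity: undo the partial work
                throw e;
            }
        }
    }
}
```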

Compared to the ACID concept, the BASE concept is used for scalable, distributed databases and gives up strict consistency of the data. BASE is composed of the following terms: Basically available: all data is available even if an error exists in the system. Soft state: the state of the system may change over time, even if no transaction is executed on it. Eventually consistent: if data is not consistent, the system guarantees that it will become consistent after some time. Thus, BASE trades strict consistency after each transaction for a highly available system. This concept is used by many distributed databases, including NoSQL databases.

3.3 NoSQL

NoSQL, read as “Not only SQL” or “noseequel” [11], is a database approach that relies on distributing the database instead of keeping everything on a central database server. In 1998, an open-source database was introduced that did not provide SQL access options; due to this missing interface it carried the name “NoSQL”. The special thing about it, however, was not the missing interface but approaches that broke with the relational database concepts. After a short time the term was forgotten. In 2009 the term NoSQL reappeared when Eric Evans was looking for a name for a distributed open-source DBMS. Since the beginning of 2010, the reading “Not only SQL” has been established in the community. Database management systems (DBMS) are now regarded as NoSQL systems if they support horizontal scaling. NoSQL databases are commonly divided into the following types [8, 16]: key-value, document, column-family/BigTable, and graph.

3.4 MongoDB

MongoDB is developed in C++ as an open-source project and was first published in 2009. MongoDB is a NoSQL database of the document type. Documents are grouped into so-called “collections”. Within a document, the data is stored as BSON (Binary JSON). For indexing, MongoDB uses the “_id” field, for which it also generates a unique index. These indexes are held by MongoDB as B-tree structures.
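As a minimal sketch using the MongoDB Java driver (database, collection and field names are hypothetical), a document is inserted into a collection and receives its unique “_id” automatically:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoInsertExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoDatabase db = client.getDatabase("geodata");        // hypothetical database
        MongoCollection<Document> layers = db.getCollection("layers");

        // The document is stored as BSON; "_id" is added and indexed automatically.
        Document layer = new Document("name", "buildings")
                .append("level", 0)
                .append("visible", true);
        layers.insertOne(layer);

        System.out.println("Generated _id: " + layer.getObjectId("_id"));
        client.close();
    }
}
```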

With replica sets, MongoDB supports load distribution only for reading data; the master (primary) alone is responsible for writing. This offers the advantage that the primary alone is responsible for persisting the data; it is also the only node that assigns new IDs when writing new records, so no collisions occur within the replica set. For reading, on the other hand, there are several strategies for distributing the load: Primary-Only: the primary is responsible for reading and writing all data, so the replica set behaves like a normal client/server system; the secondaries serve only as backups in case the primary fails. Primary and Secondary: every node in the replica set that has read permission may serve reads, so the read accesses are distributed evenly. Secondary-Only: all read accesses are distributed to the secondaries, so the primary is relieved and is responsible purely for writing the data.
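These strategies correspond roughly to the read preferences of the MongoDB Java driver. A minimal sketch of the “Secondary-Only” strategy (replica set name and host names are hypothetical) could look as follows:

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;

public class ReadStrategyExample {
    public static void main(String[] args) {
        // "Secondary-Only": reads are served by the secondaries,
        // while writes always go to the primary.
        MongoClientURI uri = new MongoClientURI(
                "mongodb://mongo1.example,mongo2.example,mongo3.example/"
                + "?replicaSet=rs0&readPreference=secondary");
        MongoClient client = new MongoClient(uri);

        System.out.println("Configured read preference: " + client.getReadPreference());
        client.close();
    }
}
```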

Sharding is a method for distributing data across multiple servers. It provides the possibility to persist very large amounts of data (Big Data) in the system effectively. Sharding can also be used in combination with replica sets; in this case, both data distribution and data replication are achieved. However, a high number of servers (> 6) is required for such a deployment to be useful.
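Sharding is normally configured through the mongo shell; issued from Java against a mongos router, the corresponding admin commands might look like the following sketch (router host, database and collection names are hypothetical):

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ShardingSetupSketch {
    public static void main(String[] args) {
        // Connect to a mongos router (hypothetical host) that fronts the sharded cluster.
        MongoClient client = new MongoClient("mongos.example", 27017);
        MongoDatabase admin = client.getDatabase("admin");

        // Allow the "geodata" database to be sharded ...
        admin.runCommand(new Document("enableSharding", "geodata"));

        // ... and distribute the "models" collection by a hashed _id key.
        admin.runCommand(new Document("shardCollection", "geodata.models")
                .append("key", new Document("_id", "hashed")));

        client.close();
    }
}
```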

4 Service API

The Internet and its data volume grow steadily, and distributed systems become more and more popular among service providers. During this growth, a number of interfaces have been developed. Some were designed for specific problems and are therefore difficult to reuse; others can only be extended to a limited degree. Over time, standardization efforts have therefore emerged. The interfaces can be divided into two architectures. Both architectures aim to decouple the client from the server, so that the two systems can be developed independently.

1. Service Oriented Architecture (SOA). In SOA, the focus is on performing actions on the server as functions. Messages containing the desired call are sent to a service endpoint, where they are interpreted and routed to the appropriate instance. This approach has existed for a long time outside of the Web, and due to this long history several SOA standards have been established. These can be highly optimized, but are relatively cumbersome to extend.

2. Resource Oriented Architecture (ROA). Unlike SOA, resources are the focal point of ROA. No services are addressed; instead, requests are addressed directly to the resources. Basic operations can be defined on these resources, and the operations can be extended per addressed resource. This makes extending the interface simple, since the functions can be implemented separately from other resources. The most famous ROA technology is Representational State Transfer (REST), which we discuss in Subsect. 4.2.

To explain the choice of API architecture, the pros and cons of SOAP and REST are briefly outlined below; SOAP and REST are the most commonly used interfaces.

4.1 SOAP

SOAP defines a messaging architecture that is based on XML. The XML schema is used to interpret SOAP messages at the endpoints (unmarshalling) and to create requests (marshalling). The Web Services Description Language (WSDL) is an interface description language whose purpose is to define Web service interfaces; it indicates, for example, which operations the client can perform. Given this description, SOAP requests can be created and sent to the server.
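As a minimal, hypothetical sketch (the service name and operation are not part of the CityServer3D API, and a JAX-WS runtime is assumed to be available), such an interface could be declared in Java and its WSDL generated from the annotations:

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;
import java.util.Arrays;
import java.util.List;

// Hypothetical SOAP service exposing a single operation; the WSDL document
// can be generated from these annotations and handed to clients.
@WebService
public class LayerSoapService {

    @WebMethod
    public List<String> listLayerNames() {
        return Arrays.asList("buildings", "terrain");
    }

    public static void main(String[] args) {
        // Publish the endpoint; the WSDL is then served at .../layers?wsdl
        Endpoint.publish("http://localhost:8080/layers", new LayerSoapService());
    }
}
```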

Advantages. SOAP and WSDL are well suited to heterogeneous middleware systems because of their expressiveness. Their advantage is the transparency and mutual independence of the systems. Interfaces can be defined freely and do not have to follow a particular standard. Furthermore, both synchronous and asynchronous connections are supported.

Disadvantages. Due to the high degree of freedom and expressiveness, interoperability problems often occur between different systems. SOAP also suffers from performance problems caused by XML, which are discussed in more detail in Sect. 4.2. Furthermore, creating Web services with stable marshalling is not trivial and takes a lot of time [13].

4.2 REST

Representational State Transfer (REST) was originally developed to create large, scalable, distributed hypermedia systems [5]. REST has four basic characteristics: addressability of resources by URI, a uniform interface, statelessness, and support for multiple representations.
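As a hedged illustration (the controller and URI scheme are hypothetical and assume a Spring MVC setup, not the actual CityServer3D API), a RESTful resource exhibiting these characteristics might look as follows:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical RESTful resource: each layer is addressable by URI, accessed
// through the uniform HTTP interface, without any server-side session state.
@RestController
public class LayerController {

    @GetMapping("/layers/{id}")
    public Map<String, Object> getLayer(@PathVariable String id) {
        // Returned as JSON by default; other representations (e.g. XML)
        // could be offered via content negotiation.
        Map<String, Object> layer = new LinkedHashMap<>();
        layer.put("id", id);
        layer.put("name", "buildings");
        return layer;
    }
}
```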

Advantages. REST is a combination of several existing standards (HTTP, XML, JSON, URI, MIME) that can be used easily and quickly. Thus, the implementation cost of RESTful Web services is lower than for SOAP. Furthermore, REST supports building dynamic websites. Due to the unique identification of resources and the stateless access, RESTful Web services can use scalable caching and load balancing.

Disadvantages. A problem that may arise from the strict separation of POST and GET is that the URL may become too long for certain requests. Another challenge is client authentication.

4.3 HATEOAS

“Hypermedia as the Engine of Application State” (HATEOAS) [5] is a design principle for REST APIs. The idea is as follows: “The client thus moves through a set of pages; which pages these may be is determined, and thereby limited, by the server; which of them are actually requested is decided by the client (or its user). At any time, the resources of the server have a defined state” [17]. The URI of a resource is passed as an “href” attribute, and the relation to this resource is supplied as “rel”. Other attributes of the resources are handled separately by the different description languages.
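As a rough, hypothetical sketch (not the CityServer3D response format), a layer resource enriched with such “href”/“rel” links could be assembled like this before being serialized to JSON:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HateoasSketch {

    // Minimal link element carrying only the "rel" and "href" attributes.
    static Map<String, String> link(String rel, String href) {
        Map<String, String> l = new LinkedHashMap<>();
        l.put("rel", rel);
        l.put("href", href);
        return l;
    }

    public static void main(String[] args) {
        Map<String, Object> layer = new LinkedHashMap<>();
        layer.put("id", 42);                                 // hypothetical layer id
        layer.put("name", "buildings");

        List<Map<String, String>> links = new ArrayList<>();
        links.add(link("self", "/layers/42"));
        links.add(link("features", "/layers/42/features"));  // where the client may go next
        layer.put("links", links);

        System.out.println(layer);   // would be serialized to JSON (or HAL, XML, ...)
    }
}
```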

For HATEOAS there are already some description languages. Basically, individual markup languages can be divided into two categories:

XML (Extensible Markup Language) derives from the Standard Generalized Markup Language (SGML). Some of its design goals were, for example, “XML shall be straightforwardly usable over the Internet” and “XML shall support a wide variety of applications”. Because of its extensibility, XML is used today in many areas, one of which is the representation of resources; an example of a layer resource in XML form is shown in source code 1.

JSON (JavaScript Object Notation) is a simple data exchange format that is easily readable by humans and machines. JSON uses key-value pairs and provides a simple representation of objects. Parsing JSON objects can be up to a hundred times faster than parsing XML (http://json.org/). On the downside, JSON is poorly extensible and provides no validation options. There are further markup languages that define additional attributes and build on JSON.

HAL (Hypertext Application Language) is a standard developed by Mike Kelly that is used for Web APIs. He himself describes it as follows: “HAL is a generic media type with which Web APIs can be developed and exposed as series of links. Clients of these APIs can select links by their link relation type and traverse them in order to progress through the application.”

Siren, “a hypermedia specification for representing entities”, provides similar functionality to HAL. In addition, Siren offers the possibility to classify the transmitted entities using classes. The links are also split up: a distinction is made between actions and links to other resources. For the actions, it is additionally defined which types of data the server expects; these are specified using the standard HTML5 input types.

Collection+JSON is a JSON-based read/write hypermedia type designed to support the management and querying of simple collections. Just like HAL and Siren, Collection+JSON supports hypermedia types. Unlike the other two, Collection+JSON supports queries: in addition to the “href” and “rel” attributes, it also indicates which data can be sent to the server. The server’s answers thus serve as templates for new requests to the server.

Comparison. Compared with XML, JSON offers several advantages for REST interfaces. XML is used primarily for describing SOA interfaces, whereas JSON is already used in many REST interfaces. Furthermore, because of its origin, JSON provides good support for JavaScript, in which many web applications are developed. Due to the ease of implementation, the faster processing time of JSON [14] and the better support in the Spring Framework, a JSON representation is supported first. Because of the HATEOAS approach, additional forms of representation, such as HAL or XML, can be added later.

5 Existing System: CityServer3D

“Our world is increasingly being captured in three dimensions. 3D computer models play an increasingly important role in urban planning, tourism and knowledge transfer. With CityServer3D it is possible for the first time to use 3D city models in a lively way. The software can manage two- and three-dimensional geographic data and link them together. The CityServer3D automatically creates three-dimensional models and thus performs simulations in the 3D world.” The product describes itself as follows: “The technology of CityServer3D consists of a geo-database, a server with numerous interfaces for the import and export of data, and applications for editing the landscape models. A management software allows the data to be processed, and the web viewer brings it to the screen of Internet users (Fig. 1).”

Fig. 1. The existing CityServer3D system.

Due to the long existence of the product, relational databases were used at the beginning. Over time, distributed databases began to play a greater role, and the first basic elements for the use of MongoDB were laid: a MongoDB driver supporting MVCC (multi-version concurrency control) was created. Thus, a first step towards data distribution had already been taken. Currently, a total of five databases are supported, including MongoDB and MySQL.

At the moment, a number of APIs are used by the CityServer3D. This is due to the development of the Web: ever more technologies have been adopted, so that in the end a set of APIs based on different technologies has accumulated. Among them are, for example, JSF calls and REST interfaces.

A certain amount of data is necessary for the pure operation of the server. The individual display levels are represented as “layers”. These may be certain neighborhoods or different heights, such as above-ground or underground structures. Each layer consists of so-called “features”, which group models and metadata. These can, for example, be building complexes consisting of several blocks, each of which is again a feature. Further information on the features is stored as “metadata”; details such as “year” or “style” can be recorded there. The individual city models are then persisted as “models”, each of which finally has a set of images (“image”) that represent the model.
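A rough, hypothetical sketch of this hierarchy (not the actual CityServer3D classes) could look as follows:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the described hierarchy: a layer groups features,
// a feature carries metadata and models, and a model references its images.
class Layer {
    String name;                       // e.g. "above-ground structures"
    List<Feature> features = new ArrayList<>();
}

class Feature {
    Map<String, String> metadata;      // e.g. {"year" -> "1890", "style" -> "Gothic"}
    List<Model> models = new ArrayList<>();
}

class Model {
    byte[] geometry;                   // persisted city model data
    List<Image> images = new ArrayList<>();
}

class Image {
    String format;                     // e.g. "image/png"
    byte[] data;
}
```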

6 System Availability

Availability is a measure, expressed in percent, of the degree to which the system is available. It is a quality criterion and is therefore defined as a property in a “Service Level Agreement” (SLA). Experiments [6] show that two MongoDB servers have a negative impact on availability. This is because a majority of the MongoDB servers must be reachable in order to elect a new primary; if one of two servers is unreachable (for example, due to a network error), neither of them can process requests. High availability of the MongoDB component can already be achieved with three servers, as long as the availability of each server is not below 99 % (normal availability). Alternatively, it would be sufficient to operate a single MongoDB server that is itself highly available (99.999 %). High availability of the overall system is achieved when at least two CityServer3D and three MongoDB servers are used.
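As a back-of-the-envelope check (our own simplified model, assuming independent server failures, per-server availability p = 0.99, and that a three-node replica set is usable as long as a majority of two nodes is up):

```latex
A_{\mathrm{RS}} = p^{3} + 3\,p^{2}(1-p)
              = 0.99^{3} + 3 \cdot 0.99^{2} \cdot 0.01
              \approx 0.9997
```

That is roughly 99.97 %, noticeably above the 99 % of a single server, which is consistent with the statement that three normally available MongoDB servers already yield a highly available component.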

Fig. 2. The CityServer3D system.

7 Extension of the CityServer3D System

To make the system cloud-capable, the properties robustness, scalability and availability had to be achieved. The target model that was considered is shown in Fig. 2. To achieve this, the following points were addressed:

The first objective was to make the persistence of data robust and scalable. We chose MongoDB because it is one of the most common distributed databases and achieves the desired properties of robustness, scalability and availability well [3]. In order to achieve robustness of the persistence layer, a MongoDB replica set was used; the data in the replica set is replicated to every MongoDB server. This achieves a high availability of the data, as shown in Sect. 6. In addition, MongoDB provides the ability to distribute the load when reading data, which leads to faster response times when reading under load (see Sect. 3). If no free resources remain to handle the load, further MongoDB servers can be added to the replica set. In this way, the persistence of the CityServer3D system scales and meets the requirements of a cloud system.

MongoDB is accessed through the MongoMVCC plugin from Fraunhofer IGD, which in turn uses the official MongoDB Java driver. The plugin offers the advantage that old or overwritten data could be read in the future; this is, however, not yet implemented and is discussed further in Subsect. 10.6. Furthermore, the MongoDB driver learns about the MongoDB system iteratively: for access to the distributed database, only a single arbitrary server needs to be known. The MongoDB client automatically discovers the other MongoDB servers in the network and, thanks to this iterative discovery, can access the others in case one MongoDB server fails.
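A minimal sketch of this discovery behavior with the MongoDB Java driver (host names are hypothetical): only a subset of the replica set is listed as seeds, and the driver finds the remaining members on its own.

```java
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import java.util.Arrays;
import java.util.List;

public class SeedListExample {
    public static void main(String[] args) {
        // Only some replica set members need to be listed as seeds; the driver
        // discovers the remaining members and fails over if one server goes down.
        List<ServerAddress> seeds = Arrays.asList(
                new ServerAddress("mongo1.example", 27017),
                new ServerAddress("mongo2.example", 27017));
        MongoClient client = new MongoClient(seeds);

        System.out.println("Known members: " + client.getServerAddressList());
        client.close();
    }
}
```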

Another step towards the realization of cloud services is defining the API. In order to enable distributed access to the system, a uniform service API is defined. This service API is used to access the PaaS interface. Since there are no uniform standards for cloud APIs [2], different approaches were presented and compared in Sect. 4. The aim is to develop a stateless API in order to better isolate requests between client and server. The service API should be easily extensible so that client applications developed later can use it. Thus, the PaaS product can be offered in combination with client applications as a SaaS product.

8 Evaluation of the System

For the experiment, the VMware cluster at Fraunhofer IGD is used, with five virtual machines (VMs), each with 2 CPU cores, 4 GB RAM and 15 GB of disk space. On each VM, both a CityServer3D and a MongoDB (v2.6.4) instance can be operated. The operating system used is Ubuntu 14.04 LTS (GNU/Linux 3.13.0-37-generic x86_64). Depending on the scenario, different constellations of instances are operated on the VMs; details follow in the next section. The load-generating tests are run on a VM of the computer operations group (RBG) of the computer science department of Technische Universität Darmstadt. To determine the bandwidth between the two sites, a 1 GB file was sent ten times to the respective servers. This revealed an average transfer rate of 44.8 MB/s between the test system and the VMware cluster, with a maximum of 65.3 MB/s and a minimum of 27.4 MB/s. In the other direction, an average rate of 43.6 MB/s was achieved, with a minimum of 38 MB/s and a maximum of 57.2 MB/s. The latency is 9.8 ms in both directions.

Scenarios: To check the behavior of the system, four scenarios were tested: S1: a simple server system with one CityServer3D and one database (1:1); S2: a server system with one CityServer3D and multiple databases (1:4); S3: several CityServer3D with one database (4:1); and S4: several CityServer3D with multiple databases (5:5).

Tests: Three different tests were carried out for each scenario. The tests differed in how the data was accessed. In the first test (Complete), all information was retrieved from the server. In the second test (Metadata), the metadata of all available data sets was queried; this represents displaying information about buildings and objects. In the third test (Model), all textures and models were downloaded from the server. The Model test represents loading the data needed to display a city: all information required for the display was loaded, but the metadata of the buildings was not. To confirm the results, each test was repeated three times (trials), and the first trial of each test was used for comparison.

User Number: In a normal operating environment, about 10 concurrent users are usually reached; at peak times it can also be 20. To test the scaling of the system better, a maximum of 50 users was simulated. Each test started with one simulated user, and every 36 s the number of users was increased by one until 50 users were reached. The number of simulated users was then held constant for a further 120 s, so that one test run took a total of 32 min, after which the test was terminated.

Utilized Program: The tests were performed using JMeter, with which the users were simulated. Each simulated user runs as a separate thread on the RBG VM. Each of these threads kept exactly one active request to the server open; as soon as the thread had received and parsed the response, it issued the next request.

Fig. 3. All scenarios.

Requests: First, all layers were always requested. The client then received a list of available layers, which it queried afterwards. The responses to these queries contained the addresses of the model and feature metadata, which were requested directly afterwards. The same happened for the models; in addition, the models provided the addresses of the images belonging to the model, which were finally loaded. If several CityServer3D were available in the test, a CityServer3D was chosen at random (uniform distribution) for each request.

Assumptions: The tests were performed under the following assumptions: (1) the addresses of all available CityServer3D are known to all clients; (2) for each request, an available CityServer3D is selected at random (uniform distribution).

Scenarios: At first glance, the big difference between scenarios 1 and 2 on the one hand and 3 and 4 on the other can be clearly seen in Fig. 3. Due to the high CPU utilization of the CityServer3D in the first two scenarios, these scenarios could not keep up with the results of the last two. The fourth scenario achieved somewhat slower response times, but is the most robust in comparison. The availability of the scenarios can be found in Table 1. Comparing the first and third scenario shows the scalability of the CityServer3D: three additional servers improved the average response time of the system by 336 %, i.e. by more than a factor of three.

Hypotheses: Based on the tests carried out in the different scenarios, we formulated a number of hypotheses in advance. (H1) In the first and second scenario, the CityServer3D will reach a CPU utilization of 100 %, so the response time will increase: we found that the CityServer3D already reached 75 % utilization with a single user. In the following tests, up to 50 users were simulated, so 100 % utilization of the CityServer3D component was reached in scenarios 1 and 2 with as few as two users. In the third and fourth scenario, due to the load distribution, this level was only reached from about ten simulated users onwards. Despite the high utilization, the system did not collapse. (H2) Due to the high utilization of the CityServer3D in the second scenario, no improvement over the first scenario is achieved: this is what the test results showed. It is mainly due to the CPU utilization of the CityServer3D component; to achieve a performance gain, the implementation of the CityServer3D system would have to be examined. (H3) The third and fourth scenario will provide better response times than the first two scenarios: due to the distribution of the CityServer3D, significant improvements compared to the first and second scenario were obtained.

Table 1. Availability in %

9 Conclusion

By analyzing the system, it becomes clear which benefits distributed applications in the cloud can have. This requires distributing every component, but rewards the operator with a highly available service. In this work, a model for determining the availability of a distributed system was introduced. This model was evaluated for the CityServer3D with the use of a distributed database (MongoDB). A high availability (>99.999 %) can be achieved with as few as about five servers, assuming a normal availability of each individual server. In addition, both components can scale with an increasing number of users as well as with growing volumes of data. This allows the entire system to be provided as “Platform as a Service” in the cloud. Furthermore, different cloud API designs were presented; these mainly provide the opportunity to develop further cloud applications for the end user. Thus, the entire system can also be offered as “Software as a Service” in the cloud.