Abstract:
Requirements for data storage and processing have reached new levels, with applications relying on the analysis of large amounts of data to support everyday services for end users. Since the costs of maintaining and managing databases are significant, change data capture (CDC) techniques can be used to determine which parts of a data source have changed, and thus assist in the management of large volumes of data in data warehouses. In this paper we investigate a number of CDC techniques suitable for NoSQL databases. CDC techniques track modifications in a source database so that they can later be made available to a target database. Our base system and testbed are built on Apache Cassandra, a NoSQL database that offers high performance and scalability. Cassandra is combined with a MapReduce framework, which is suited to highly distributed and parallel computing and is used to implement the logic of each CDC technique. This paper presents both a functional comparison of the different CDC techniques and a performance evaluation on a real testbed.
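To illustrate what "determining which parts of a data source have changed" can look like when expressed as MapReduce logic, here is a minimal sketch of snapshot-differential CDC, one common CDC technique. The abstract does not name the techniques the paper evaluates, so this should not be read as the authors' method. It is a pure-Python simulation that uses no Cassandra or Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `snapshot_diff`) and the representation of a row as a `(row key, value)` pair are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str]                # hypothetical: (row key, serialized row value)
Tagged = Tuple[str, str]             # ("old" | "new", value)
Change = Tuple[str, str, str]        # (row key, operation, value)


def map_phase(old: Iterable[Row], new: Iterable[Row]) -> List[Tuple[str, Tagged]]:
    """Map step: emit each row keyed by its row key, tagged with its snapshot."""
    pairs: List[Tuple[str, Tagged]] = []
    for key, value in old:
        pairs.append((key, ("old", value)))
    for key, value in new:
        pairs.append((key, ("new", value)))
    return pairs


def shuffle(pairs: List[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group intermediate pairs by key, as a MapReduce framework would between phases."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key: str, tagged_values: List[Tagged]) -> Optional[Change]:
    """Reduce step: compare the old and new versions of one row key.

    Assumes each row key appears at most once per snapshot.
    """
    versions = dict(tagged_values)
    old, new = versions.get("old"), versions.get("new")
    if old is None:
        return (key, "INSERT", new)
    if new is None:
        return (key, "DELETE", old)
    if old != new:
        return (key, "UPDATE", new)
    return None  # row unchanged: nothing to propagate to the target


def snapshot_diff(old: Iterable[Row], new: Iterable[Row]) -> List[Change]:
    """Run map, shuffle, and reduce in sequence; return changes sorted by row key."""
    changes: List[Change] = []
    for key, tagged in sorted(shuffle(map_phase(old, new)).items()):
        change = reduce_phase(key, tagged)
        if change is not None:
            changes.append(change)
    return changes


if __name__ == "__main__":
    before = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]
    after = [("u1", "alice"), ("u2", "bobby"), ("u4", "dave")]
    print(snapshot_diff(before, after))
    # → [('u2', 'UPDATE', 'bobby'), ('u3', 'DELETE', 'carol'), ('u4', 'INSERT', 'dave')]
```

Grouping by row key lets each reducer decide independently whether a row was inserted, deleted, or updated, which is what makes this kind of comparison amenable to parallel execution. The trade-off is that every run reads both full snapshots, which is one reason other CDC approaches (for example, timestamp- or log-based tracking) are also used in practice.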
Date of Conference: 06-09 July 2015
Date Added to IEEE Xplore: 15 February 2016