Algebraic Optimization of RDF Graph Pattern Queries on MapReduce

ABSTRACT

The growing success of the Semantic Web and Web of data initiatives has ushered in the era of “Big Semantic Web Data.” Data sets such as the Billion Triple Challenge [2] are on the order of billions of triples, and scientific data collections like the Open Science Data cloud [3] are approaching petabyte scale. A crucial question now is how to meet the scalability challenges of processing such data collections. Further, emerging applications are introducing nontraditional scalability requirements in which scalability needs are elastic, varying significantly across different periods. For example, a biologist may want to analyze their protein data by linking it to other publicly available related data. This data may be from their own domain or from other domains, for example, data about chemical compounds. Interdisciplinary research is increasingly demanding such holistic perspectives on data.