Distributed indexing of web scale datasets for the cloud
In this paper, we present a distributed architecture for indexing and serving large and diverse datasets. It incorporates and extends the functionality of Hadoop, the open source MapReduce framework, and of HBase, a distributed, sparse, NoSQL database, ...
A novel approach to multiple sequence alignment using hadoop data grids
Multiple alignment of protein sequences is an essential tool in molecular biology. It helps determine evolutionary linkage and predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of ...
Beyond online aggregation: parallel and incremental data mining with online Map-Reduce
Only a few data mining algorithms work in a massively parallel and yet online (i.e., incremental) fashion. A combination of both features is essential for mining large data streams and adds scalability to the concept of Online Aggregation ...
Extracting user profiles from large scale data
In this work, we present the details of a large-scale user profiling framework that we developed at IBM on top of Apache Hadoop. We address the problem of extracting and maintaining a very large number of user profiles from large-scale data. We ...
Towards scalable RDF graph analytics on MapReduce
In order to exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this ...
SPARQL basic graph pattern processing with iterative MapReduce
There have been a number of approaches to adopting the RDF data model and the MapReduce framework for a data warehouse, as the data model is suitable for data integration and the data processing framework is good for large-scale fault-tolerant data ...
Efficient updates for a shared nothing analytics platform
In this paper, we describe a cloud-based, data-warehouse-like system especially targeted at time series data. Apart from the benefits offered by distributed storage built on top of a shared-nothing architecture, our system is designed to efficiently ...
Parallelizing Random Walk with Restart for large-scale query recommendation
Random Walk with Restart (RWR) has been widely employed in Web search and recommendation systems, and several performance enhancement approaches for RWR have been proposed to save storage costs and improve online response time. In ...