It is our great pleasure to welcome you to the Sixth International Workshop on Data-intensive Distributed Computing (DIDC 2016), which is held in conjunction with the International ACM Symposium on High Performance Distributed Computing (HPDC 2016).
The data needs of scientific and commercial applications from a diverse range of fields have been growing exponentially in recent years. Digital data generated from sources such as scientific instruments, sensors, Internet transactions, email, video, and click streams can be large, diverse, longitudinal, and distributed, which poses new challenges and requirements for both offline and real-time processing, where the extraction of meaningful information can open novel application areas and lead to new breakthroughs. This data deluge and the growing demand for large-scale data processing have necessitated collaboration and the sharing of data collections among the world's leading education, research, and industrial institutions, as well as the use of distributed resources owned by collaborating parties. In a widely distributed environment, data is often not locally accessible and thus has to be retrieved and stored remotely. While traditional distributed systems work well for computations that require limited data handling, they may fail in unexpected ways when a computation accesses, creates, and moves large amounts of data, especially over wide-area networks. Furthermore, the data accessed and created is often poorly described, lacking both metadata and provenance. Scientists, researchers, and application developers are often forced to solve basic data-handling issues themselves, such as physically locating data, accessing it, and moving it to visualization or compute resources for further analysis. Although many efforts have been made to develop new programming paradigms and models that handle the data needs of an application automatically, the results are still far from optimal.
DIDC focuses on the challenges imposed by data-intensive applications on distributed systems, and on the different state-of-the-art solutions proposed to overcome these challenges. It brings together the collaborative and distributed computing community and the data management community in an effort to generate productive conversations on the planning, management, and scheduling of data-handling tasks and data storage resources.
This year's workshop continues the tradition of gathering distinguished speakers and offers a diverse program with topics ranging from data staging and indexing models for data-intensive applications to high-performance genomics and cloud scheduling.
Proceeding Downloads
Towards Convergence of Extreme Computing and Big Data Centers
Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to convergence of the two infrastructures. Tokyo Tech.'s TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will aim to deploy a series of ...
Minimising the Execution of Unknown Bag-of-Task Jobs with Deadlines on the Cloud
Scheduling jobs with deadlines, each of which defines the latest time that a job must be completed, can be challenging on the cloud due to the incurred costs and unpredictable performance. This problem is further complicated when there is not enough ...
Experiences with Performing MapReduce Analysis of Scientific Data on HPC Platforms
The growing interest in applying Big Data techniques to scientific data generated by HPC simulations raises the question of whether this is achievable on the same HPC platform and, if so, what performance can be obtained on ...
Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations
A growing number of "second generation" high-performance computing applications with heterogeneous, dynamic, and data-intensive properties have an extended set of requirements, which cover application deployment, resource allocation and control, and I/O ...
Efficient and Scalable Workflows for Genomic Analyses
- Subho S. Banerjee,
- Arjun P. Athreya,
- Liudmila S. Mainzer,
- C. Victor Jongeneel,
- Wen-Mei Hwu,
- Zbigniew T. Kalbarczyk,
- Ravishankar K. Iyer
Recent growth in the volume of DNA sequence data, and the associated computational cost of extracting meaningful information from it, creates the need for efficient computational systems at scale. In this work, we propose the Illinois Genomics Execution Environment ...
Persistent Data Staging Services for Data Intensive In-situ Scientific Workflows
- Melissa Romanus,
- Fan Zhang,
- Tong Jin,
- Qian Sun,
- Hoang Bui,
- Manish Parashar,
- Jong Choi,
- Saloman Janhunen,
- Robert Hager,
- Scott Klasky,
- Choong-Seock Chang,
- Ivan Rodero
Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scale and resolution of these simulations provide new opportunities for accurately modeling complex ...
SIDI: A Scalable in-Memory Density-based Index for Spatial Databases
With the widespread use of location-based services, spatial data is becoming increasingly common. As such data is usually huge in volume and continuously arrives at storage in real time, designing systems that store it efficiently is challenging. ...
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
DIDC '14 | 12 | 7 | 58% |
Overall | 12 | 7 | 58% |