DOI: 10.1145/2534645.2534655
research-article

Toward a data scalable solution for facilitating discovery of scientific data resources

Published: 18 November 2013

Abstract

Science is increasingly driven by the need to process ever larger quantities of data. Scientific computing faces severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" now compete with, and in many fields surpass, the traditional objective of decreasing processing time. Domains with large datasets include astronomy, biology, genomics, climate/weather, and materials science. This paper presents a real-world use case in which we answer queries posed by domain scientists in order to facilitate discovery of relevant science resources. The challenge is that the metadata describing these resources is very large and growing quickly, which makes a data-scalable solution increasingly necessary. We propose SGEM, a system designed to answer graph-based queries over large datasets on cluster architectures, and we report early results for its current capabilities.
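The abstract does not show what a "graph-based query" over resource metadata looks like. As an illustration only (this is not SGEM's design or API, which this page does not describe), the sketch below assumes the metadata is stored as subject-predicate-object triples and answers a conjunction of triple patterns by extending variable bindings one pattern at a time, a naive nested-loop join on a single machine. Every identifier here (the `ex:` terms, `query`, `match_pattern`) is hypothetical, and the sketch deliberately ignores partitioning the data across a cluster, which is the scaling problem the paper targets.

```python
from typing import Dict, List, Optional, Tuple

# A triple is (subject, predicate, object); terms starting with "?" are variables.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Return True if the term is a query variable such as '?ds'."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to unify one triple pattern with one data triple under an existing binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that is already bound must agree with this triple's term.
            if p_term in extended and extended[p_term] != t_term:
                return None
            extended[p_term] = t_term
        elif p_term != t_term:
            # Constants must match exactly.
            return None
    return extended


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a naive nested-loop join."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        next_results: List[Binding] = []
        for binding in results:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Hypothetical resource metadata (not taken from the paper).
TRIPLES: List[Triple] = [
    ("ex:dataset1", "ex:keyword", "ozone"),
    ("ex:dataset1", "ex:instrument", "ex:instrumentA"),
    ("ex:dataset2", "ex:keyword", "sea surface temperature"),
    ("ex:dataset2", "ex:instrument", "ex:instrumentB"),
    ("ex:instrumentB", "ex:platform", "ex:platformX"),
]

# Which datasets were collected by an instrument carried on ex:platformX?
PATTERNS: List[Triple] = [
    ("?ds", "ex:instrument", "?inst"),
    ("?inst", "ex:platform", "ex:platformX"),
]

if __name__ == "__main__":
    print(query(TRIPLES, PATTERNS))
    # [{'?ds': 'ex:dataset2', '?inst': 'ex:instrumentB'}]
```

The two patterns share `?inst`, so the second pattern only matches instruments already bound by the first. Because this join rescans every triple for each intermediate binding, its cost grows with both the metadata size and the number of partial results, which hints at why large, growing metadata calls for indexing and distributed execution rather than a single in-memory scan.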

Cited By

  • (2016) Semantic catalog of things, services, and data to support a wind data management facility. Information Systems Frontiers 18(4): 679-691. doi:10.1007/s10796-015-9546-5 (online publication date: 1 Aug. 2016)
  • (2015) Deep web scientific sensor measurements usage: A standards-based approach. 2015 International Conference on Collaboration Technologies and Systems (CTS), pp. 1-2. doi:10.1109/CTS.2015.7210386 (online publication date: Jun. 2015)
  • (2014) Toward a data scalable solution for facilitating discovery of science resources. Parallel Computing 40(10): 682-696. doi:10.1016/j.parco.2014.08.002 (online publication date: 1 Dec. 2014)

Published In

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
November 2013
66 pages
ISBN:9781450325066
DOI:10.1145/2534645
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13

Acceptance Rates

DISCS-2013 Paper Acceptance Rate 10 of 19 submissions, 53%;
Overall Acceptance Rate 19 of 34 submissions, 56%
