AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers

Published: 14 June 2016

Abstract

Users in many domains, including urban planning, transportation, and environmental science, want to execute analytical queries over continuously updated spatial datasets. Current solutions for large-scale spatial query processing rely either on extensions to RDBMSs, which entail expensive loading and indexing phases whenever the data changes, or on distributed map/reduce frameworks, which run on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multi-core CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly parallel spatial query processing system that scales linearly to a large number of CPU cores. AT-GIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers (ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3x the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10x for aggregation queries.
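The abstract does not define associative transducers, so the following is only a rough illustration of the general idea of processing arbitrarily split input with an associative combine step. It is not the AT-GIS implementation. It assumes a toy three-state tokenizer for JSON-like text and counts commas that lie outside string literals. Every name in it (`step`, `summarize`, `combine`, `count_structural_commas`) is hypothetical. Each chunk is processed from every possible start state, so no chunk needs to know where it sits in the document, and adjacent chunk summaries are merged associatively.

```python
from functools import reduce

# States of a toy tokenizer (an assumption for illustration, not the paper's automaton):
# outside a string, inside a string, or just after a backslash inside a string.
OUT, IN_STR, ESC = 0, 1, 2
STATES = (OUT, IN_STR, ESC)


def step(state, ch):
    """Return (next_state, output); output is 1 for a comma outside any string."""
    if state == OUT:
        if ch == '"':
            return IN_STR, 0
        return OUT, 1 if ch == "," else 0
    if state == IN_STR:
        if ch == '"':
            return OUT, 0
        if ch == "\\":
            return ESC, 0
        return IN_STR, 0
    # ESC: the escaped character is consumed and we remain inside the string.
    return IN_STR, 0


def summarize(chunk):
    """Run one chunk from every start state; needs no context from other chunks."""
    summary = []
    for start in STATES:
        state, count = start, 0
        for ch in chunk:
            state, out = step(state, ch)
            count += out
        summary.append((state, count))
    return tuple(summary)


def combine(left, right):
    """Associatively compose the summaries of two adjacent chunks."""
    result = []
    for end_left, count_left in left:
        end_right, count_right = right[end_left]
        result.append((end_right, count_left + count_right))
    return tuple(result)


def count_structural_commas(text, chunk_size):
    """Split text at arbitrary offsets, summarize each chunk, then reduce."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    summaries = [summarize(c) for c in chunks]  # this map could run in parallel
    identity = tuple((s, 0) for s in STATES)
    total = reduce(combine, summaries, identity)
    return total[OUT][1]  # the whole document starts outside any string
```

Because `combine` is associative, the summaries can be merged in any grouping, for example as a tree reduction across cores, and the result does not depend on where the chunk boundaries fall, even when a boundary lands inside a string literal.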


Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. JSON
  2. NODB
  3. XML
  4. multi-core CPUs
  5. parallel automata
  6. spatial query processing

Qualifiers

  • Research-article

Funding Sources

  • Engineering and Physical Sciences Research Council

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
