research-article

GeoGraph: A Framework for Graph Processing on Geometric Data

Published: 06 June 2021

Abstract

In many graph processing applications, the input graph is derived from an underlying geometric point data set. However, existing high-performance graph processing frameworks assume that the input is already given as a graph. To use these frameworks, users must therefore write or use external programs, based on computational geometry algorithms, to convert their point data set into a graph, which requires additional programming effort and can degrade performance.
In this paper, we present our ongoing work on the GeoGraph framework for shared-memory multicore machines, which seamlessly supports routines for parallel geometric graph construction and parallel graph processing within the same environment. GeoGraph supports graph construction based on k-nearest neighbors, Delaunay triangulation, and β-skeleton graphs. It can then pass these generated graphs to over 25 graph algorithms. GeoGraph contains high-performance parallel primitives and algorithms implemented in C++, and includes a Python interface. We present four examples of using GeoGraph, and some experimental results showing good parallel speedups and improvements over the Higra library. We conclude with a vision of future directions for research in bridging graph and geometric data processing.
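
The construct-then-process workflow described in the abstract can be illustrated with a minimal sketch. It does not use GeoGraph's own interface, which is not shown on this page; it assumes NumPy and SciPy instead, builds a k-nearest-neighbor graph with `scipy.spatial.cKDTree`, and hands the result to SciPy's `connected_components`. The function name `knn_graph` and the sample points are hypothetical, and the code runs sequentially, so it shows only the shape of the pipeline, not GeoGraph's parallel performance.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph as a sparse adjacency matrix.

    Hypothetical helper for illustration; assumes all points are distinct,
    so that each point's closest match in the tree is the point itself.
    """
    n = len(points)
    tree = cKDTree(points)
    # Ask for k + 1 neighbors because the nearest "neighbor" is the point itself.
    dists, nbrs = tree.query(points, k=k + 1)
    rows = np.repeat(np.arange(n), k)   # source vertex of each edge
    cols = nbrs[:, 1:].ravel()          # target vertex, skipping the self-match
    weights = dists[:, 1:].ravel()      # Euclidean edge lengths
    directed = csr_matrix((weights, (rows, cols)), shape=(n, n))
    # Keep an edge if either endpoint lists the other among its k neighbors.
    return directed.maximum(directed.T)


# Step 1: geometric graph construction from raw points (two distant groups).
points = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
graph = knn_graph(points, k=2)

# Step 2: graph processing on the constructed graph.
n_components, labels = connected_components(graph, directed=False)
print(n_components, labels)
```

Here the graph is passed between the two steps as an in-memory sparse matrix rather than through an intermediate file, which is the kind of integration the abstract argues for.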



Published In

ACM SIGOPS Operating Systems Review, Volume 55, Issue 1
July 2021
107 pages
ISSN: 0163-5980
DOI: 10.1145/3469379
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States
