Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios

Vargas-Solar, Genoveva; Zechinelli-Martini, José-Luis; Espinosa-Oviedo, Javier A.

doi:10.1007/978-3-030-55814-7_23


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1260)


Abstract

This paper presents a study of existing environments used to enact data science pipelines applied to graphs. Data science pipelines are a new form of query that combines classic graph operations with artificial intelligence graph analytics operations. A pipeline defines a data flow consisting of tasks for querying, exploring and analysing graphs. Different environments and systems can be used to enact pipelines, ranging from graph NoSQL stores, through programming languages extended with libraries providing graph processing and analytics functions, to full machine learning and artificial intelligence studios. The paper describes these environments and the design principles they promote for enacting data science pipelines intended to query, process and explore data collections, particularly graphs.
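To make the notion of a pipeline as a data flow of querying, exploring and analysing tasks more concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration and not the authors' method: the toy edge list, the function names and the choice of degree centrality as the analytics step are all assumptions made here. A graph library or a graph store would normally provide these operations.

```python
from collections import deque

# Hypothetical toy graph: undirected edges between named nodes.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]


def build_graph(edges):
    """Loading task: turn an edge list into an adjacency mapping."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def query_neighbours(graph, node):
    """Querying task: a classic graph operation (adjacency lookup)."""
    return sorted(graph.get(node, ()))


def explore_component(graph, start):
    """Exploring task: breadth-first traversal of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def degree_centrality(graph, nodes):
    """Analysing task: degree within `nodes`, normalised by (n - 1)."""
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {
        node: len(graph.get(node, set()) & nodes) / (n - 1) for node in nodes
    }


def run_pipeline(edges, start):
    """Chain the tasks so that each one consumes the previous task's output."""
    graph = build_graph(edges)
    neighbours = query_neighbours(graph, start)
    component = explore_component(graph, start)
    centrality = degree_centrality(graph, component)
    return neighbours, component, centrality


if __name__ == "__main__":
    neighbours, component, centrality = run_pipeline(EDGES, "a")
    print(neighbours, sorted(component), centrality)
```

Each task is a plain function, so the same data flow could in principle be enacted in a notebook, a library-based script, or a hosted studio by swapping the implementation behind each step.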

Partially supported by the Auvergne-Rhône-Alpes Pack-Ambition project SUMMIT (http://summit.imag.fr). The work is part of the activities of the DOING-MADICS working group (https://www.madics.fr/ateliers/doing).


Notes

1. GraphDB Lite, https://www.ontotext.com/products/graphdb/graphdb-free/.
2. Neo4j, https://neo4j.com.
3. OrientDB, https://orientdb.com/why-orientdb/.
4. GraphEngine, https://www.graphengine.io.
5. Mapgraph, http://mapgraph.io.
6. ArangoDB, https://www.arangodb.com.
7. Titan, http://titan.thinkaurelius.com.
8. BrightStarDB, http://brightstardb.com.
9. Cayley, https://github.com/cayleygraph/cayley.
10. WhiteDB, http://whitedb.org.
11. Orly, https://github.com/orlyatomics/orly.
12. https://docs.microsoft.com/en-us/azure/cosmos-db/introduction.
13. https://giraph.apache.org/.
14. https://networkx.github.io.
15. https://guides.github.com/features/mastering-markdown/.
16. http://www.kaggle.com.
17. https://colab.research.google.com.
18. https://notebooks.azure.com.
19. https://aws.amazon.com/fr/sagemaker/.
20. https://azure.microsoft.com/en-us/services/machine-learning/.
21. https://cloud.google.com/ai-platform.
22. https://www.ibm.com/cloud/machine-learning.
23. https://www.tensorflow.org.
24. https://keras.io, capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.
25. Scikit-learn is a machine learning library for Python.
26. Gluon is a deep learning interface by AWS and Microsoft.
27. https://pytorch.org.
28. Microsoft Cognitive Services is a set of machine learning algorithms.
29. https://github.com/dmlc/xgboost.
30. Caffe2 includes new features such as Recurrent Neural Networks and has been merged into PyTorch; see "Caffe2 Merges With PyTorch", Medium, May 16, 2018, https://medium.com/@Synced/caffe2-merges-with-pytorch-a89c70ad9eb7.
31. https://chainer.org.
32. https://mxnet.apache.org.


Author information

Authors and Affiliations

Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG-LAFMIA, 38000, Grenoble, France
Genoveva Vargas-Solar
Fundación Universidad de las Américas Puebla, 72810, San Andrés Cholula, Mexico
José-Luis Zechinelli-Martini
Univ. of Lyon, LAFMIA, 69008, Lyon, France
Javier A. Espinosa-Oviedo


Corresponding author

Correspondence to Genoveva Vargas-Solar.

Editor information

Editors and Affiliations

ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche
Slovak University of Technology, Bratislava, Slovakia
Mária Bieliková
Université Lumière Lyon 2, Lyon, France
Omar Boussaïd
University of Genova, Genova, Italy
Barbara Catania
Université Lumière Lyon 2, Lyon, France
Jérôme Darmont
Leibniz University of Hannover, Hannover, Niedersachsen, Germany
Elena Demidova
Université Claude Bernard Lyon 1, Lyon, France
Fabien Duchateau
The Open University, Milton Keynes, UK
Mark Hall
University of Ljubljana, Ljubljana, Slovenia
Tanja Merčun
National Research University Higher School of Economics, St. Petersburg, Russia
Boris Novikov
Ionian University, Corfu, Greece
Christos Papatheodorou
Goethe University Frankfurt, Frankfurt am Main, Hessen, Germany
Thomas Risse
Universitat Politècnica de Catalunya, Barcelona, Spain
Oscar Romero
AgroParisTech, Montpellier, France
Lucile Sautot
University of Lyon, Lyon, France
Guilaine Talens
Poznań University of Technology, Poznań, Poland
Robert Wrembel
University of Ljubljana, Ljubljana, Slovenia
Maja Žumer


About this paper

Cite this paper

Vargas-Solar, G., Zechinelli-Martini, J.-L., Espinosa-Oviedo, J.A. (2020). Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios. In: Bellatreche, L., et al. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL ADBIS 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_23


DOI: https://doi.org/10.1007/978-3-030-55814-7_23
Published: 18 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55813-0
Online ISBN: 978-3-030-55814-7
eBook Packages: Computer Science, Computer Science (R0)
