Abstract
This paper studies existing environments used to enact data science pipelines applied to graphs. Data science pipelines are a new form of query that combines classic graph operations with artificial intelligence graph analytics operations. A pipeline defines a data flow consisting of tasks for querying, exploring and analysing graphs. Different environments and systems can be used for enacting pipelines. They range from graph NoSQL stores, through programming languages extended with libraries providing graph processing and analytics functions, to full machine learning and artificial intelligence studios. The paper describes these environments and the design principles that they promote for enacting data science pipelines intended to query, process and explore data collections, particularly graphs.
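As an illustration of the pipeline notion described in the abstract, the following minimal sketch chains a querying task, an exploration task and an analytics task into one data flow over a small graph. It uses only the Python standard library. The toy graph and all function names (`query_connected`, `explore_from`, `analyse_degree_centrality`, `run_pipeline`) are hypothetical and chosen for illustration; they are not taken from the paper or from any of the environments it surveys.

```python
from collections import deque
from functools import reduce

# Hypothetical toy graph: undirected, stored as an adjacency mapping.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),  # isolated node
}


def query_connected(graph):
    """Querying task: keep only the nodes that have at least one edge."""
    keep = {node for node, nbrs in graph.items() if nbrs}
    return {node: graph[node] & keep for node in keep}


def explore_from(source):
    """Exploration task factory: restrict the graph to the nodes
    reachable from `source`, found by breadth-first search."""
    def task(graph):
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        return {node: graph[node] & seen for node in seen}
    return task


def analyse_degree_centrality(graph):
    """Analytics task: degree centrality, i.e. degree / (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def run_pipeline(data, tasks):
    """Enact the pipeline: feed the output of each task into the next."""
    return reduce(lambda value, task: task(value), tasks, data)


centrality = run_pipeline(
    GRAPH,
    [query_connected, explore_from("a"), analyse_degree_centrality],
)
```

Representing each task as a plain function keeps the data flow explicit: a different environment could supply the same three stages through a graph store query, a library call or a studio component, while the composition step stays the same.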
Partially supported by the Auvergne-Rhône-Alpes Pack-Ambition project SUMMIT (http://summit.imag.fr). The work is part of the activities of the working group DOING-MADICS https://www.madics.fr/ateliers/doing.
Notes
- 1.
GraphDB Lite, https://www.ontotext.com/products/graphdb/graphdb-free/.
- 2.
Neo4j, https://neo4j.com.
- 3.
OrientDB, https://orientdb.com/why-orientdb/.
- 4.
GraphEngine, https://www.graphengine.io.
- 5.
Mapgraph, http://mapgraph.io.
- 6.
ArangoDB, https://www.arangodb.com.
- 7.
Titan, http://titan.thinkaurelius.com.
- 8.
BrightStarDB, http://brightstardb.com.
- 9.
Cayley, https://github.com/cayleygraph/cayley.
- 10.
WhiteDB, http://whitedb.org.
- 24.
Keras, https://keras.io, is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.
- 25.
Scikit-learn is a machine learning library for Python.
- 26.
Gluon is a deep learning interface by AWS and Microsoft.
- 28.
Microsoft Cognitive Services is a set of machine learning algorithms.
- 30.
Caffe2 includes new features such as recurrent neural networks and has been merged into PyTorch; see "Caffe2 Merges With PyTorch", Medium, May 16, 2018, https://medium.com/@Synced/caffe2-merges-with-pytorch-a89c70ad9eb7.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vargas-Solar, G., Zechinelli-Martini, J.L., Espinosa-Oviedo, J.A. (2020). Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios. In: Bellatreche, L., et al. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL ADBIS 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_23
DOI: https://doi.org/10.1007/978-3-030-55814-7_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55813-0
Online ISBN: 978-3-030-55814-7
eBook Packages: Computer Science, Computer Science (R0)