Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios

  • Conference paper
ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium (TPDL 2020, ADBIS 2020)

Abstract

This paper presents a study of existing environments used to enact data science pipelines applied to graphs. Data science pipelines are a new form of query that combines classic graph operations with artificial intelligence graph analytics operations. A pipeline defines a data flow consisting of tasks for querying, exploring and analysing graphs. Different environments and systems can be used to enact pipelines. They range from graph NoSQL stores, through programming languages extended with libraries that provide graph processing and analytics functions, to full machine learning and artificial intelligence studios. The paper describes these environments and the design principles they promote for enacting data science pipelines intended to query, process and explore data collections, particularly graphs.
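As an illustration of the pipeline notion described in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The toy graph `GRAPH` and the function names `query_task`, `explore_task`, `analyse_task` and `run_pipeline` are hypothetical, and the "analytics" step is a simple stand-in (largest component size) rather than a machine learning operation.

```python
from collections import deque

# Hypothetical toy graph, stored as an undirected adjacency dictionary.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"e"},
    "e": {"d"},
    "f": set(),
}


def query_task(graph, min_degree):
    """Querying: keep only vertices with at least `min_degree` neighbours."""
    keep = {v for v, nbrs in graph.items() if len(nbrs) >= min_degree}
    # Restrict each adjacency set to the vertices that were kept.
    return {v: graph[v] & keep for v in keep}


def explore_task(graph):
    """Exploring: list connected components using breadth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in sorted(graph[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components


def analyse_task(components):
    """Analysing: a stand-in analytic, the size of the largest component."""
    return max((len(c) for c in components), default=0)


def run_pipeline(graph, min_degree=1):
    """Data flow: query -> explore -> analyse, each task feeding the next."""
    subgraph = query_task(graph, min_degree)
    components = explore_task(subgraph)
    return components, analyse_task(components)
```

Calling `run_pipeline(GRAPH, min_degree=1)` drops the isolated vertex, groups the remaining vertices into components, and reports the size of the largest one. In the environments the paper surveys, each task would typically be delegated to a graph store, a library, or a studio service instead of hand-written functions.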

Partially supported by the Auvergne-Rhône-Alpes Pack-Ambition project SUMMIT (http://summit.imag.fr). The work is part of the activities of the working group DOING-MADICS (https://www.madics.fr/ateliers/doing).

Notes

  1. GraphDB Lite, https://www.ontotext.com/products/graphdb/graphdb-free/.

  2. Neo4J, https://neo4j.com.

  3. OrientDB, https://orientdb.com/why-orientdb/.

  4. GraphEngine, https://www.graphengine.io.

  5. Mapgraph, http://mapgraph.io.

  6. ArangoDB, https://www.arangodb.com.

  7. Titan, http://titan.thinkaurelius.com.

  8. BrightStarDB, http://brightstardb.com.

  9. Cayley, https://github.com/cayleygraph/cayley.

  10. WhiteDB, http://whitedb.org.

  11. Orly, https://github.com/orlyatomics/orly.

  12. https://docs.microsoft.com/en-us/azure/cosmos-db/introduction.

  13. https://giraph.apache.org/.

  14. https://networkx.github.io.

  15. https://guides.github.com/features/mastering-markdown/.

  16. http://www.kaggle.com.

  17. https://colab.research.google.com.

  18. https://notebooks.azure.com.

  19. https://aws.amazon.com/fr/sagemaker/.

  20. https://azure.microsoft.com/en-us/services/machine-learning/.

  21. https://cloud.google.com/ai-platform.

  22. https://www.ibm.com/cloud/machine-learning.

  23. https://www.tensorflow.org.

  24. Keras, https://keras.io, capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.

  25. Scikit-learn is a machine learning library for Python.

  26. Gluon is a deep learning interface by AWS and Microsoft.

  27. https://pytorch.org.

  28. MS Cognitive Services is a set of machine learning algorithms.

  29. https://github.com/dmlc/xgboost.

  30. Caffe2 includes new features such as Recurrent Neural Networks and has been merged into PyTorch; see "Caffe2 Merges With PyTorch", Medium, May 16, 2018, https://medium.com/@Synced/caffe2-merges-with-pytorch-a89c70ad9eb7.

  31. https://chainer.org.

  32. https://mxnet.apache.org.

Author information

Corresponding author

Correspondence to Genoveva Vargas-Solar.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vargas-Solar, G., Zechinelli-Martini, J.-L., Espinosa-Oviedo, J.A. (2020). Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios. In: Bellatreche, L., et al. (eds.) ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL 2020, ADBIS 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_23

  • DOI: https://doi.org/10.1007/978-3-030-55814-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55813-0

  • Online ISBN: 978-3-030-55814-7

  • eBook Packages: Computer Science, Computer Science (R0)
