Extracting Schemas from Large Graphs with Utility Function and Parallelization

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10829)

Abstract

Unlike relational databases and XML documents, most graphs do not come with a schema. If a schema can be extracted from a graph efficiently, it can be used for query optimization, structure browsing, and similar tasks. In this paper, we consider extracting schemas from large graphs by using a utility function. Although a utility function yields reasonable schemas, its major drawback is its computation cost. We therefore propose a schema extraction algorithm based on (a) a novel utility function, called the local utility function, and (b) parallelization. Experimental results show that our algorithm extracts schemas from graphs more efficiently without sacrificing schema quality.

Notes

  1. https://github.com/grosser/parallel.

  2. http://benchmark.dbpedia.org/.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP17K00150.

Author information

Correspondence to Nobutaka Suzuki.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sekine, Y., Suzuki, N. (2018). Extracting Schemas from Large Graphs with Utility Function and Parallelization. In: Liu, C., Zou, L., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10829. Springer, Cham. https://doi.org/10.1007/978-3-319-91455-8_13

  • DOI: https://doi.org/10.1007/978-3-319-91455-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91454-1

  • Online ISBN: 978-3-319-91455-8

  • eBook Packages: Computer Science (R0)
