Extracting Schemas from Large Graphs with Utility Function and Parallelization

Sekine, Yoshiki; Suzuki, Nobutaka

doi:10.1007/978-3-319-91455-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10829)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Unlike relational databases and XML documents, most graphs are not given schemas of their own. If a schema can be extracted from a graph efficiently, it can be exploited for query optimization, structure browsing, and other tasks. In this paper, we consider extracting schemas from large graphs by using a utility function. Although a utility function yields reasonable schemas, its major drawback is its computational cost. We therefore propose a schema extraction algorithm based on (a) a novel utility function, called the local utility function, and (b) parallelization. Experimental results show that our algorithm extracts schemas from graphs more efficiently without losing schema quality.
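The abstract does not define the utility function itself. As a hedged illustration only, the sketch below uses Fisher's category utility (the clustering objective of COBWEB), which scores a partition of objects into types. It assumes, purely for illustration, that each node is described by the set of labels on its outgoing edges and that a candidate schema is a partition of nodes into types. The names `Node`, `_sq_sum`, and `category_utility` are hypothetical, and this is not the paper's local utility function or its parallel algorithm.

```python
from collections import Counter
from typing import FrozenSet, List

# Hypothetical node representation: the set of outgoing edge labels.
Node = FrozenSet[str]


def _sq_sum(nodes: List[Node], labels: List[str]) -> float:
    """Sum over labels of P(label present)^2 + P(label absent)^2 within `nodes`."""
    n = len(nodes)
    # Each node is a set, so a label is counted at most once per node.
    counts = Counter(label for node in nodes for label in node)
    total = 0.0
    for label in labels:
        p = counts[label] / n
        total += p * p + (1 - p) * (1 - p)
    return total


def category_utility(clusters: List[List[Node]]) -> float:
    """Fisher-style category utility of a partition of nodes into clusters.

    CU = (1/K) * sum_k P(C_k) * (sq_sum(C_k) - sq_sum(all nodes)),
    treating each edge label as a binary (present/absent) attribute.
    """
    clusters = [c for c in clusters if c]  # ignore empty clusters
    all_nodes = [node for c in clusters for node in c]
    if not all_nodes:
        return 0.0
    labels = sorted({label for node in all_nodes for label in node})
    n = len(all_nodes)
    base = _sq_sum(all_nodes, labels)  # expected correct guesses without clusters
    score = sum(len(c) / n * (_sq_sum(c, labels) - base) for c in clusters)
    return score / len(clusters)


if __name__ == "__main__":
    persons = [frozenset({"name", "age"}), frozenset({"name", "age"})]
    papers = [frozenset({"title", "author"}), frozenset({"title", "author"})]
    print(category_utility([persons, papers]))   # one type per kind of node
    print(category_utility([persons + papers]))  # a single type for everything
```

Under these assumptions, grouping the person-like and paper-like nodes into separate types scores 1.0, while lumping all nodes into one type scores 0.0. The sketch recomputes the score over every node on each call, which illustrates why evaluating a global utility function repeatedly for many candidate partitions can become costly on large graphs.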

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP17K00150.

Author information

Authors and Affiliations

University of Tsukuba, 1-2 Kasuga, Tsukuba, Ibaraki, 305-8550, Japan
Yoshiki Sekine & Nobutaka Suzuki

Corresponding author

Correspondence to Nobutaka Suzuki.

Editor information

Editors and Affiliations

Swinburne University of Technology, Hawthorn, VIC, Australia
Chengfei Liu
Peking University, Beijing, China
Lei Zou
University of Western Australia, Crawley, WA, Australia
Jianxin Li

About this paper

Cite this paper

Sekine, Y., Suzuki, N. (2018). Extracting Schemas from Large Graphs with Utility Function and Parallelization. In: Liu, C., Zou, L., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10829. Springer, Cham. https://doi.org/10.1007/978-3-319-91455-8_13

DOI: https://doi.org/10.1007/978-3-319-91455-8_13
Published: 12 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91454-1
Online ISBN: 978-3-319-91455-8
eBook Packages: Computer Science, Computer Science (R0)