Summarizing Large-Scale Database Schema Using Community Detection

Wang, Xue; Zhou, Xuan; Wang, Shan

doi:10.1007/s11390-012-1240-1

Regular Paper
Published: 19 May 2012

Volume 27, pages 515–526, (2012)

Journal of Computer Science and Technology

Xue Wang¹, Xuan Zhou² & Shan Wang¹,²

Abstract

Summarizing the schema of a large-scale database is challenging. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables, which makes it difficult to separate the tables into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social network analysis. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely and that the generated abstract schema layers are very helpful for users exploring the database.
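
The three steps described in the abstract can be illustrated with a small sketch. The abstract does not say which community detection algorithm, clustering method, or representativeness measure the paper uses, so everything below is an assumption made for illustration: a naive greedy modularity-merging heuristic stands in for the community detection step, the same routine is reused on the graph of subject groups to form abstract domains, and the table with the most foreign-key links inside its group is taken as the representative. The toy schema `FK_EDGES` and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy schema: each pair is a foreign-key link between two tables.
FK_EDGES = [
    ("artist", "album"), ("album", "track"), ("artist", "track"),
    ("artist", "concert"),
    ("film", "actor"), ("film", "director"), ("actor", "director"),
    ("film", "award"),
    ("artist", "film"),  # single link bridging the two subjects
]


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    m = len(edges)
    inside = defaultdict(int)  # edges with both endpoints in one community
    degree = defaultdict(int)  # summed degree of each community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            inside[community_of[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def detect_communities(edges):
    """Assumed step 1: merge the pair of communities that raises Q most, until none does."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return []
    community_of = {n: n for n in nodes}
    best_q = modularity(edges, community_of)
    while True:
        best_merge = None
        for a, b in combinations(sorted(set(community_of.values())), 2):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:  # strict improvement only
                best_q, best_merge = q, trial
        if best_merge is None:
            break
        community_of = best_merge
    groups = defaultdict(set)
    for n, c in community_of.items():
        groups[c].add(n)
    return sorted((frozenset(g) for g in groups.values()), key=min)


def group_graph(edges, groups):
    """Assumed step 2 input: links between subject groups, identified by index."""
    index_of = {n: i for i, g in enumerate(groups) for n in g}
    return [(index_of[u], index_of[v]) for u, v in edges
            if index_of[u] != index_of[v]]


def representative(edges, group):
    """Assumed step 3: table with the most links inside its own group."""
    deg = {n: 0 for n in group}
    for u, v in edges:
        if u in group and v in group:
            deg[u] += 1
            deg[v] += 1
    return max(sorted(group), key=lambda n: deg[n])  # ties: alphabetical


subject_groups = detect_communities(FK_EDGES)
domains = detect_communities(group_graph(FK_EDGES, subject_groups))
labels = [representative(FK_EDGES, g) for g in subject_groups]

for label, group in zip(labels, subject_groups):
    print(label, sorted(group))
print("domains (as sets of subject-group indices):", [sorted(d) for d in domains])
```

On this toy schema the sketch separates the music tables from the film tables despite the bridging link and labels each group with its hub table. The exhaustive pairwise merge is only practical for small graphs; a schema the size of Freebase would need a far more efficient algorithm.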

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Xue Wang & Shan Wang (Senior Member, CCF, Member, ACM)
Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, 100872, China
Xuan Zhou & Shan Wang (Senior Member, CCF, Member, ACM)

Corresponding author

Correspondence to Xue Wang.

Additional information

This work is partly supported by the “HGJ” National Science and Technology Major Project of China under Grant No. 2010ZX01042-001-002, the National Natural Science Foundation of China under Grant No. 61070054, the National High Technology Research and Development 863 Program of China under Grant No. 2009AA01Z149, the Research Funds of Renmin University of China under Grant No. 10XNI018 and the Postgraduate Science & Research Funds of Renmin University of China under Grant No. 12XNH177.

About this article

Cite this article

Wang, X., Zhou, X. & Wang, S. Summarizing Large-Scale Database Schema Using Community Detection. J. Comput. Sci. Technol. 27, 515–526 (2012). https://doi.org/10.1007/s11390-012-1240-1

Received: 04 September 2011
Revised: 31 December 2011
Published: 19 May 2012
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11390-012-1240-1
