
Summarizing database schema based on graph partition

Multimedia Tools and Applications

Abstract

As database schemas grow larger and more complex, casual users find it difficult to understand both the schemas and the contents of databases, so summarizing database schemas has become an essential task. However, most prior approaches pay little attention to the topological characteristics between tables, ignore the effect of user feedback, and fail to accurately predict the number of clusters in the output, which seriously limits the accuracy of their schema summaries. To address these problems, we propose a new schema summarization method based on a graph partition mechanism. First, we introduce a novel strategy for constructing a similarity matrix between tables, based on topology compactness, content similarity and query logs. Then we provide a formula for calculating table importance and a scheme for detecting the most important nodes in local areas. Both are used to select the initial cluster centers and to predict the number of clusters in the graph partition mechanism. Finally, we evaluate the proposed method on the TPC-E database, and the results demonstrate that it achieves high summarization accuracy.
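
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formulas, so everything concrete here is an assumption made for illustration: the weighted sum used to blend the three similarity signals, the equal default weights, the "strictly more important than every schema neighbor" rule for local centers, and the nearest-center assignment that stands in for the graph partition step. The function names `combine_similarity`, `pick_local_centers` and `assign_to_centers` are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def combine_similarity(topology: Matrix, content: Matrix, query_log: Matrix,
                       weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> Matrix:
    """Blend three table-by-table similarity matrices with a weighted sum.

    The weighted sum and the equal default weights are assumptions,
    not the paper's formula.
    """
    w_t, w_c, w_q = weights
    n = len(topology)
    return [[w_t * topology[i][j] + w_c * content[i][j] + w_q * query_log[i][j]
             for j in range(n)] for i in range(n)]


def pick_local_centers(importance: Sequence[float],
                       neighbors: Dict[int, List[int]]) -> List[int]:
    """Return tables that are strictly more important than all schema neighbors.

    The number of returned tables serves as the predicted cluster count.
    A table with no neighbors is trivially a center; ties can leave a
    region without any center.
    """
    return [t for t, score in enumerate(importance)
            if all(score > importance[nb] for nb in neighbors.get(t, []))]


def assign_to_centers(similarity: Matrix, centers: Sequence[int]) -> List[int]:
    """Label each table with its most similar center; centers label themselves.

    Assumes at least one center. This simple assignment is a stand-in for
    the paper's graph partition mechanism.
    """
    return [t if t in centers else max(centers, key=lambda c: similarity[t][c])
            for t in range(len(similarity))]


if __name__ == "__main__":
    # Hypothetical 5-table schema: edges 0-1, 0-2, 2-3, 3-4.
    neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}
    importance = [0.9, 0.2, 0.4, 0.8, 0.3]  # made-up importance scores
    centers = pick_local_centers(importance, neighbors)
    print("centers:", centers, "predicted clusters:", len(centers))
```

In this sketch the local centers fix both the initial cluster seeds and the cluster count, mirroring the role the abstract assigns to table importance and local detection. A real implementation would replace the assignment step with the partition method defined in the full paper.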



Acknowledgements

This work is sponsored by the National Natural Science Foundation of China under Grant Nos. 61772152 and 61502037, and by the Basic Research Project (Nos. JCKY2016206B001, JCKY2014206C002 and JCKY2017604C010).

Author information

Corresponding author

Correspondence to Lianke Zhou.

Cite this article

Wang, Y., Zhou, L. & Wang, N. Summarizing database schema based on graph partition. Multimed Tools Appl 78, 10077–10096 (2019). https://doi.org/10.1007/s11042-018-6543-y

  • DOI: https://doi.org/10.1007/s11042-018-6543-y
