A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas

Batini, Carlo; Bonizzoni, Paola; Comerio, Marco; Dondi, Riccardo; Pirola, Yuri; Salandra, Francesco

doi:10.1007/s11390-015-1514-5

Regular Paper
Published: 21 January 2015

Volume 30, pages 214–224, (2015)
Journal of Computer Science and Technology

Abstract

When tens or even hundreds of schemas are involved in the integration process, criteria are needed for choosing the clusters of schemas to be integrated, so that the integration problem can be handled through an efficient iterative process. Schemas should be grouped into clusters according to cohesion and coupling criteria based on the similarities and dissimilarities among them. In this paper, we propose an algorithm for a novel variant of correlation clustering that addresses the problem of assisting a designer in integrating a large number of conceptual schemas. The variant introduces upper and lower bounds on the number of schemas in each cluster, to avoid integration contexts that are too complex and too simple, respectively. Since the resulting combinatorial problem is NP-hard, we give a heuristic for solving it. An experimental evaluation shows an appreciable increase in the effectiveness of the schema integration process when clusters are computed by the proposed algorithm, compared with clusters manually defined by an expert.
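The abstract does not spell out the authors' heuristic, so what follows is only a minimal sketch of the problem it describes, under stated assumptions. It assumes each pair of schemas i, j carries a nonnegative similarity weight `sim[i][j]` and a dissimilarity weight `dis[i][j]` (hypothetical inputs; the paper's actual measures are not given here). A partition then costs the sum of `dis` over pairs placed in the same cluster plus the sum of `sim` over pairs placed in different clusters, and every cluster must hold between `lo` and `hi` schemas. The functions `cost`, `move_delta`, `initial_partition` and `bounded_cc` are hypothetical names for a generic move-and-swap local search, not the algorithm proposed in the paper, and the sketch fixes the number of clusters at the smallest feasible value, ceil(n / hi), as a simplification.

```python
import math
from itertools import combinations

# Hypothetical sketch: a generic local search for correlation clustering
# with cluster-size bounds, NOT the heuristic proposed in the paper.


def cost(labels, sim, dis):
    """Pairs in the same cluster pay dis; pairs in different clusters pay sim."""
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        total += dis[i][j] if labels[i] == labels[j] else sim[i][j]
    return total


def move_delta(labels, sim, dis, i, target):
    """Change in cost if item i is relabelled to cluster `target`."""
    src = labels[i]
    delta = 0.0
    for j, lab in enumerate(labels):
        if j == i:
            continue
        if lab == src:  # was together, becomes split
            delta += sim[i][j] - dis[i][j]
        elif lab == target:  # was split, becomes together
            delta += dis[i][j] - sim[i][j]
    return delta


def initial_partition(n, lo, hi):
    """Round-robin start; all cluster sizes lie in [lo, hi] (assumes hi >= 1)."""
    k = math.ceil(n / hi)  # fewest clusters that respect the upper bound
    if k * lo > n:  # then no k can satisfy both bounds
        raise ValueError("no partition satisfies the size bounds")
    return [i % k for i in range(n)]


def bounded_cc(sim, dis, lo, hi, max_passes=50):
    """Improve a feasible partition by size-preserving moves and swaps."""
    n = len(sim)
    labels = initial_partition(n, lo, hi)
    k = max(labels) + 1
    sizes = [labels.count(c) for c in range(k)]
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            a = labels[i]
            for b in range(k):
                if b == a:
                    continue
                # Single move, allowed only if both sizes stay within bounds.
                if sizes[a] > lo and sizes[b] < hi:
                    if move_delta(labels, sim, dis, i, b) < -1e-12:
                        labels[i] = b
                        sizes[a] -= 1
                        sizes[b] += 1
                        improved = True
                        break
                # Swap i with some member j of b; cluster sizes are unchanged.
                members = [j for j, lab in enumerate(labels) if lab == b]
                d1 = move_delta(labels, sim, dis, i, b)
                for j in members:
                    labels[i] = b  # tentatively move i into b
                    d2 = move_delta(labels, sim, dis, j, a)
                    if d1 + d2 < -1e-12:
                        labels[j] = a  # complete the swap
                        improved = True
                        break
                    labels[i] = a  # revert the tentative move
                if labels[i] != a:
                    break
        if not improved:
            break
    return labels
```

For example, with six schemas forming two groups of three (sim = 1 and dis = 0 inside a group, the reverse across groups) and bounds lo = 2, hi = 3, the search recovers the two groups with cost 0. Each accepted move or swap strictly lowers the cost, so the search terminates; swaps are included because single moves are blocked whenever clusters sit at their size bounds.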

Author information

Authors and Affiliations

Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, 20126, Italy
Carlo Batini, Paola Bonizzoni, Marco Comerio, Yuri Pirola & Francesco Salandra
Department of Human and Social Sciences, University of Bergamo, Bergamo, 24129, Italy
Riccardo Dondi

Corresponding author

Correspondence to Carlo Batini.

Additional information

The work was partially supported by the Italian Project PON01 00861 SMART (Services and Meta-services for smART eGovernment) and by the Project (CUP E41l13000220009) SPAC3 (Smart services of the new Public Administration for the Citizen-Centricity in the Cloud) co-financed by the Lombardy region.

About this article

Cite this article

Batini, C., Bonizzoni, P., Comerio, M. et al. A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas. J. Comput. Sci. Technol. 30, 214–224 (2015). https://doi.org/10.1007/s11390-015-1514-5

Received: 19 December 2013
Revised: 29 July 2014
Published: 21 January 2015
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11390-015-1514-5