Research on deduplication method of multiple relations based on hierarchical clustering algorithm
by Ying Wang; Weiwei Cheng; Chang Liu
International Journal of Information and Communication Technology (IJICT), Vol. 22, No. 2, 2023

Abstract: To overcome the low efficiency and large error of traditional data deduplication methods, a multi-relational data deduplication method based on a hierarchical clustering algorithm is proposed. Using the inter-class relationship information of the duplicate data, closely related class clusters of different types are merged. The hierarchical clustering algorithm then clusters all duplicate data according to data similarity. Once the similar class has been found in the first-level index, the super eigenvalue is used to complete the detection of multi-relational duplicate data. Depending on the situation, the detected duplicate data are deleted automatically, semi-automatically, or manually. Experimental results show that the method has a low error rate and a good deletion effect, and that it improves the efficiency of multi-relational data deduplication, with a highest deletion rate of 99%.
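
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-set Jaccard similarity, the single-linkage merge rule, the 0.6 threshold, and the function names `jaccard`, `cluster_duplicates`, and `deduplicate` are all assumptions chosen for illustration, and the first-level index and super eigenvalue stages are not modelled.

```python
from itertools import combinations
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for the paper's measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_duplicates(records: List[str], threshold: float = 0.6) -> List[List[int]]:
    """Single-linkage agglomerative clustering over record indices.

    Repeatedly merges the two most similar clusters and stops when the best
    pair falls below the (hypothetical) similarity threshold.
    """
    clusters = [[i] for i in range(len(records))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Single linkage: similarity of the closest pair across two clusters.
        return max(jaccard(records[i], records[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        # x < y always holds, so deleting position y leaves position x valid.
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


def deduplicate(records: List[str], threshold: float = 0.6) -> List[str]:
    """Keep one representative per cluster (the 'automatic' deletion mode)."""
    return [records[c[0]] for c in cluster_duplicates(records, threshold)]
```

Under these assumptions, calling `deduplicate` on a list of near-identical address strings keeps the earliest record of each cluster. A semi-automatic or manual mode would instead present the clusters returned by `cluster_duplicates` for review before anything is deleted.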

Online publication date: Thu, 02-Feb-2023
