Clustering based on median and closest string via rank distance with applications on DNA

Dinu, Liviu P.; Ionescu, Radu Tudor

doi:10.1007/s00521-013-1468-x

Clustering based on median and closest string via rank distance with applications on DNA

ICONIP 2012
Published: 03 August 2013

Volume 24, pages 77–84, (2014)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Liviu P. Dinu¹ &
Radu Tudor Ionescu¹

342 Accesses
13 Citations
Explore all metrics

Abstract

This paper aims to present several clustering methods based on rank distance. Rank distance has applications in many different fields such as computational linguistics, biology and computer science. The K-means algorithm represents each cluster by a single mean vector. The mean vector is computed with respect to a distance measure. Two K-means algorithms based on rank distance are described in this paper. Hierarchical clustering builds models based on distance connectivity. This paper describes two hierarchical clustering techniques that use rank distance. Experiments using mitochondrial DNA sequences extracted from several mammals are performed to compare the results of the clustering methods. Results demonstrate the clustering performance and the utility of the proposed algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Comprehensive Survey of Clustering Algorithms

Article 01 June 2015

Density-Based Clustering Based on Hierarchical Density Estimates

A practical guide to amplicon and metagenomic analysis of microbiome data

Article Open access 11 May 2020

References

Chimani M, Woste M, Bocker S (2011) A closer look at the closest string and closest substring problem. In: Proceedings of ALENEX, pp 13–24
de la Higuera C, Casacuberta F (2000) Topology of strings: median string is np-complete. Theor Comput Sci 230:39–48
Article MATH Google Scholar
Diaconis P, Graham RL (1977) Spearman footrule as a measure of disarray. J R Stat Soc Ser B (Methodological) 39(2):262–268
MATH MathSciNet Google Scholar
Dinu LP (2003) On the classification and aggregation of hierarchies with different constitutive elements. Fundamenta Informaticae 55(1):39–50
MATH MathSciNet Google Scholar
Dinu A, Dinu LP (2005) On the syllabic similarities of romance languages. In: Proceedings of CICLing 3406, pp 785–788
Dinu LP, Ionescu RT (2012) An efficient rank based approach for closest string and closest substring. PLoS One 7(6):e37576
Article Google Scholar
Dinu LP, Ionescu RT (2012a) Clustering based on rank distance with applications on DNA. In: Proceedings of ICONIP 7667
Dinu LP, Ionescu RT (2012b) Clustering methods based on closest string via rank distance. In: Proceedings of SYNASC, pp 207–214
Dinu LP, Manea F (2006) An efficient approach for the rank aggregation problem. Theor Comput Sci 359(1–3):455–461
Article MATH MathSciNet Google Scholar
Dinu LP, Popa A (2012) On the closest string via rank distance. In: Proceedings of CPM 7354, pp 413–426
Dinu LP, Sgarro A (2006) A low-complexity distance for DNA strings. Fundamenta Informaticae 73(3):361–372
MATH MathSciNet Google Scholar
Frances M, Litman A (1997) On covering problems of codes. Theory Comput Syst 30(2):113–119
MATH MathSciNet Google Scholar
Huang Z (1998) Extensions to the K-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304
Article Google Scholar
Kailing K, Kriegel HP, Kroger P (2004) Density-connected subspace clustering for high-dimensional data. In Proceedings of the 4th SIAM international conference on data mining
Koonin EV (1999) The emerging paradigm and open problems in comparative genomics. Bioinformatics 15:265–266
Article Google Scholar
Lanctot KJ, Li M, Ma B, Wang S, Zhang L (2003) Distinguishing string selection problems. Inf Comput 185(1):41–55
Article MATH MathSciNet Google Scholar
Li M, Chen X, Li X, Ma B, Vitanyi PMB (2004) The similarity metric. IEEE Trans Inf Theory 50(12):3250–3264
Article MathSciNet Google Scholar
Liew AW, Yan H, Yang M (2005) Pattern recognition techniques for the emerging field of bioinformatics: a review. Pattern Recognit 38(11):2055–2073
Article Google Scholar
McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: Proceedings of ACM SIGKDD, pp 169–178
Nicolas F, Rivals E (2003) Complexities of centre and median string 2676:315–327
Google Scholar
Nicolas F, Rivals E (2005) Hardness results for the center and median string problems under the weighted and unweighted edit distances. J Discret Algorithms 3(2–4):390–415
MATH MathSciNet Google Scholar
Palmer J, Herbon L (1988) Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J Mol Evolut 28:87–89
Article Google Scholar
Popov YV (2007) Multiple genome rearrangement by swaps and by element duplications. Theor Comput Sci 385(1–3):115–126
Article MATH Google Scholar
Reyes A, Gissi C, Pesole G, Catzeflis FM, Saccone C (2000) Where do rodents fit? Evidence from the complete mitochondrial genome of Sciurus vulgaris. Mol Biol Evol 17(6):979–983
Article Google Scholar
Selim SZ, Ismail MA (1984) K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell PAMI 6(1):81–87
Google Scholar
Smith T, Waterman M (1981) Comparison of biosequences. Adv Appl Math 2(4):482–489
Article MATH MathSciNet Google Scholar
States DJ, Agarwal P (1996) Compact encoding strategies for DNA sequence similarity search. In: Proceedings of the 4th international conference on intelligent systems for molecular biology, pp 211–217
Tian TZ, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114
Article Google Scholar
Wooley JC (1999) Trends in computational biology: a summary based on a recomb plenary lecture. J Comput Biol 6:459–474
Article Google Scholar
Yin C, Zhao X, Mu S, Tian S (2013) A fast multiclass classification algorithm based on cooperative clustering. Neural Process Lett 1–14. doi:10.1007/s11063-013-9278-9

Download references

Acknowledgments

The contribution of the authors to this paper is equal. Authors thank anonymous reviewers for helpful comments. The research of Liviu P. Dinu was supported by a grant of the Romanian National Authority for Scientific Research, CNCS UEFISCDI, project number PN-II-ID-PCE-2011-3-0959. Radu Tudor Ionescu thanks his Ph.D. supervisor Denis Enachescu from the University of Bucharest, for helpful discussions.

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, University of Bucharest, No. 14 Academiei Street, Bucharest, Romania
Liviu P. Dinu & Radu Tudor Ionescu

Authors

Liviu P. Dinu
View author publications
You can also search for this author in PubMed Google Scholar
Radu Tudor Ionescu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Radu Tudor Ionescu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dinu, L.P., Ionescu, R.T. Clustering based on median and closest string via rank distance with applications on DNA. Neural Comput & Applic 24, 77–84 (2014). https://doi.org/10.1007/s00521-013-1468-x

Download citation

Received: 05 March 2013
Accepted: 15 July 2013
Published: 03 August 2013
Issue Date: January 2014
DOI: https://doi.org/10.1007/s00521-013-1468-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Clustering based on median and closest string via rank distance with applications on DNA

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

A practical guide to amplicon and metagenomic analysis of microbiome data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Clustering based on median and closest string via rank distance with applications on DNA

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

A practical guide to amplicon and metagenomic analysis of microbiome data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation