Abstract
In various research fields, a common task is to summarize the information shared by a collection of objects and to find a consensus among them. In many scenarios, the items for which a consensus needs to be determined are rankings, and the process is called rank aggregation. Common applications include electoral processes, meta-search engines, document classification, and selecting documents based on multiple criteria, among many others. This paper focuses on a particular application of such aggregation schemes: finding motifs, or common patterns, in a set of given DNA sequences. Two common criteria for accepting a string as a consensus are the median string, which minimizes the sum of distances to the given strings, and the closest string, which minimizes the maximum distance (the radius). These approaches have been studied intensively but separately; only recently has the work of [1] combined both problems, solving the consensus string problem by minimizing both the distance sum and the radius.
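The abstract does not define rank distance or spell out the two objectives, so the sketch below makes them concrete. It assumes the rank distance definition used in Dinu's earlier work: each character occurrence is tagged with its occurrence index (the second `a` becomes `a2`), a tagged symbol shared by both strings contributes the absolute difference of its positions, and a tagged symbol present in only one string contributes its position in that string. The helper names and the fixed-length brute-force search are hypothetical and only suitable for toy inputs; the paper itself relies on a genetic algorithm.

```python
from collections import defaultdict
from itertools import product


def annotate(s: str) -> dict:
    """Map each tagged symbol (char, k), i.e. the k-th occurrence of char,
    to its 1-based position in s."""
    counts = defaultdict(int)
    positions = {}
    for i, ch in enumerate(s, start=1):
        counts[ch] += 1
        positions[(ch, counts[ch])] = i
    return positions


def rank_distance(u: str, v: str) -> int:
    """Assumed rank distance: shared tagged symbols add |pos_u - pos_v|;
    a tagged symbol found in only one string adds its position there."""
    pu, pv = annotate(u), annotate(v)
    total = 0
    for sym in pu.keys() | pv.keys():
        if sym in pu and sym in pv:
            total += abs(pu[sym] - pv[sym])
        else:
            total += pu.get(sym, 0) + pv.get(sym, 0)
    return total


def distance_sum(c: str, strings: list) -> int:
    """Median string objective: total distance from c to all inputs."""
    return sum(rank_distance(c, s) for s in strings)


def radius(c: str, strings: list) -> int:
    """Closest string objective: largest distance from c to any input."""
    return max(rank_distance(c, s) for s in strings)


def brute_force(strings: list, alphabet: str, length: int, objective) -> str:
    """Hypothetical exhaustive search over fixed-length candidates
    (exponential in length; toy inputs only)."""
    candidates = ("".join(t) for t in product(alphabet, repeat=length))
    return min(candidates, key=lambda c: objective(c, strings))


# Toy usage: compare the median-string and closest-string optima.
dna = ["ACGT", "AGCT", "ACTG"]
med = brute_force(dna, "ACGT", 4, distance_sum)
cls = brute_force(dna, "ACGT", 4, radius)
print("median :", med, distance_sum(med, dna), radius(med, dna))
print("closest:", cls, distance_sum(cls, dna), radius(cls, dna))
```

Printing both objectives for each optimum shows the trade-off that motivates a consensus optimizing distance sum and radius together: the median string need not have the smallest radius, and the closest string need not have the smallest sum.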
The aim of this paper is to investigate the consensus string problem in the rank distance paradigm. Theoretical results show that it is not possible to identify a consensus string via rank distance for three or more strings. Thus, an efficient genetic algorithm is proposed to find the optimal consensus string. To show an application of the studied problem, this work also presents a clustering algorithm based on the consensus string, which builds a hierarchy of clusters based on distance connectivity. Experiments on DNA comparison are presented to show the efficiency of the proposed genetic algorithm for the consensus string problem. Phylogenetic experiments were also conducted to show the utility of the proposed clustering method. In conclusion, the consensus string is indeed an interesting problem with many practical applications.
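A minimal sketch of a genetic search for a rank distance consensus string follows. It assumes details the abstract does not give: candidates have the same length as the inputs, fitness is the pair (radius, distance sum) compared lexicographically, and the operators are one-point crossover, per-position mutation, and truncation selection with elitism. All names and parameters (`genetic_consensus`, `fitness`, `pop_size`, `mutation_rate`) are hypothetical, and this is not the authors' algorithm. Rank distance follows the same assumed definition from Dinu's earlier papers.

```python
import random
from collections import defaultdict


def rank_distance(u: str, v: str) -> int:
    """Assumed rank distance over occurrence-tagged symbols (see lead-in)."""
    def annotate(s):
        counts, pos = defaultdict(int), {}
        for i, ch in enumerate(s, start=1):
            counts[ch] += 1
            pos[(ch, counts[ch])] = i
        return pos

    pu, pv = annotate(u), annotate(v)
    return sum(abs(pu[k] - pv[k]) if k in pu and k in pv
               else pu.get(k, 0) + pv.get(k, 0)
               for k in pu.keys() | pv.keys())


def fitness(c: str, strings: list) -> tuple:
    """Hypothetical objective: (radius, distance sum); lower is better."""
    d = [rank_distance(c, s) for s in strings]
    return (max(d), sum(d))


def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover of two equal-length parents (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(c: str, alphabet: str, rate: float, rng: random.Random) -> str:
    """Replace each position with a random letter with probability `rate`."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else ch for ch in c)


def genetic_consensus(strings, alphabet="ACGT", pop_size=40, generations=200,
                      mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    length = len(strings[0])  # assumption: candidates as long as the inputs
    # Seed the population with the inputs plus random strings.
    population = list(strings) + [
        "".join(rng.choice(alphabet) for _ in range(length))
        for _ in range(pop_size - len(strings))
    ]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, strings))
        elite = population[: pop_size // 2]  # truncation selection, best kept
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, rng)
            children.append(mutate(child, alphabet, mutation_rate, rng))
        population = elite + children
    return min(population, key=lambda c: fitness(c, strings))


dna = ["ACGTACGT", "ACGTTCGA", "TCGTACGA", "ACCTACGT"]
best = genetic_consensus(dna)
print(best, fitness(best, dna))
```

Because the inputs are in the initial population and the better half survives each generation unchanged, the returned string is never worse under this fitness than the best input string. The sketch assumes inputs of length at least 2 and `pop_size` of at least 4.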
References
[1] Amir, A., Landau, G.M., Na, J.C., Park, H., Park, K., Sim, J.S.: Consensus optimizing both distance sum and radius. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 234–242. Springer, Heidelberg (2009)
[2] Chimani, M., Woste, M., Bocker, S.: A closer look at the closest string and closest substring problem. In: Proceedings of ALENEX, pp. 13–24 (2011)
[3] Diaconis, P., Graham, R.L.: Spearman footrule as a measure of disarray. Journal of the Royal Statistical Society. Series B (Methodological) 39(2), 262–268 (1977)
[4] Dinu, L.P.: On the classification and aggregation of hierarchies with different constitutive elements. Fundamenta Informaticae 55(1), 39–50 (2003)
[5] Dinu, L.P., Ionescu, R.T.: Clustering Based on Rank Distance with Applications on DNA. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds.) ICONIP 2012, Part V. LNCS, vol. 7667, pp. 722–729. Springer, Heidelberg (2012)
[6] Dinu, L.P., Ionescu, R.T.: Clustering Methods Based on Closest String via Rank Distance. In: Proceedings of SYNASC, pp. 207–214 (2012)
[7] Dinu, L.P., Ionescu, R.T.: An efficient rank based approach for closest string and closest substring. PLoS ONE 7(6), 37576 (2012)
[8] Dinu, L.P., Manea, F.: An efficient approach for the rank aggregation problem. Theoretical Computer Science 359(1-3), 455–461 (2006)
[9] Dinu, L.P., Popa, A.: On the closest string via rank distance. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 413–426. Springer, Heidelberg (2012)
[10] Dinu, L.P., Sgarro, A.: A Low-complexity Distance for DNA Strings. Fundamenta Informaticae 73(3), 361–372 (2006)
[11] Dinu, L.P., Sgarro, A.: Estimating Similarities in DNA Strings Using the Efficacious Rank Distance Approach. In: Systems and Computational Biology – Bioinformatics and Computational Modeling. InTech (2011)
[12] Frances, M., Litman, A.: On covering problems of codes. Theory of Computing Systems 30(2), 113–119 (1997)
[13] Koonin, E.V.: The emerging paradigm and open problems in comparative genomics. Bioinformatics 15, 265–266 (1999)
[14] Lee, T., Na, J.C., Park, H., Park, K., Sim, J.S.: Finding consensus and optimal alignment of circular strings. Theoretical Computer Science 468, 92–101 (2013)
[15] Li, M., Chen, X., Li, X., Ma, B., Vitanyi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)
[16] Liew, A.W., Yan, H., Yang, M.: Pattern recognition techniques for the emerging field of bioinformatics: A review. Pattern Recognition 38(11), 2055–2073 (2005)
[17] Nicolas, F., Rivals, E.: Complexities of the centre and median string problems. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 315–327. Springer, Heidelberg (2003)
[18] Nicolas, F., Rivals, E.: Hardness results for the center and median string problems under the weighted and unweighted edit distances. Journal of Discrete Algorithms 3, 390–415 (2005)
[19] Popov, Y.V.: Multiple genome rearrangement by swaps and by element duplications. Theoretical Computer Science 385(1-3), 115–126 (2007)
[20] Reyes, A., Gissi, C., Pesole, G., Catzeflis, F.M., Saccone, C.: Where Do Rodents Fit? Evidence from the Complete Mitochondrial Genome of Sciurus vulgaris. Molecular Biology and Evolution 17(6), 979–983 (2000)
[21] States, D.J., Agarwal, P.: Compact encoding strategies for DNA sequence similarity search. In: Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, pp. 211–217 (1996)
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Dinu, L.P., Ionescu, R.T. (2013). An Efficient Algorithm for Rank Distance Consensus. In: Baldoni, M., Baroglio, C., Boella, G., Micalizio, R. (eds) AI*IA 2013: Advances in Artificial Intelligence. AI*IA 2013. Lecture Notes in Computer Science, vol 8249. Springer, Cham. https://doi.org/10.1007/978-3-319-03524-6_43
DOI: https://doi.org/10.1007/978-3-319-03524-6_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03523-9
Online ISBN: 978-3-319-03524-6
eBook Packages: Computer Science, Computer Science (R0)