Abstract
This paper considers the constrained longest common subsequence problem for an arbitrary set of input strings and an arbitrary set of pattern strings. The problem has applications in, for example, computational biology, where it serves as a measure of similarity among different molecules that are characterized by common putative structures. We develop an exact A\(^*\) search to solve it and compare it with the only existing competitor from the literature, an Automaton approach. The results show that A\(^*\) is very efficient on real-world benchmarks, finding provably optimal solutions in run times that are an order of magnitude lower than those of the competitor. Even some of the large-scale real-world instances were solved to optimality by A\(^*\) search.
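To make the problem statement concrete, the following minimal Python sketch checks feasibility and solves tiny instances by exhaustive enumeration. It assumes the usual reading of the problem: a solution must be a subsequence of every input string and must contain every pattern string as a subsequence. It is not the A\(^*\) search described in the abstract; the function names `is_subsequence`, `is_feasible`, and `brute_force_clcs` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def is_subsequence(small, big):
    """Return True if `small` can be obtained from `big` by deleting characters."""
    it = iter(big)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in small)


def is_feasible(candidate, inputs, patterns):
    """Assumed feasibility test: the candidate is a common subsequence of all
    input strings and contains every pattern string as a subsequence."""
    return (all(is_subsequence(candidate, s) for s in inputs)
            and all(is_subsequence(p, candidate) for p in patterns))


def brute_force_clcs(inputs, patterns):
    """Exhaustive search, exponential in the length of the shortest input.

    Enumerates subsequences of the shortest input string from longest to
    shortest and returns the first feasible one, or None if none exists.
    Only suitable for very small illustrative instances.
    """
    shortest = min(inputs, key=len)
    for length in range(len(shortest), -1, -1):
        for idx in combinations(range(len(shortest)), length):
            candidate = "".join(shortest[i] for i in idx)
            if is_feasible(candidate, inputs, patterns):
                return candidate
    return None


if __name__ == "__main__":
    inputs = ["abcbdab", "bdcaba"]
    # A longest common subsequence that also contains "d".
    print(brute_force_clcs(inputs, ["d"]))
    # No feasible solution: "c" never precedes "d" in the second input.
    print(brute_force_clcs(inputs, ["cd"]))
```

The enumeration grows exponentially with string length, which is why exact methods such as the A\(^*\) search studied in the paper are needed for instances of realistic size.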
Acknowledgements
Christian Blum was funded by project CI-SUSTAIN of the Spanish Ministry of Science and Innovation (PID2019-104156GB-I00). Dragan Matić is partially supported by the Ministry for Scientific and Technological Development, Higher Education and Information Society, Government of the Republic of Srpska, B&H, under the project “Development of artificial intelligence methods for solving computer biology problems”.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Djukanovic, M., Matic, D., Blum, C., Kartelj, A. (2022). Application of A\(^*\) to the Generalized Constrained Longest Common Subsequence Problem with Many Pattern Strings. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13364. Springer, Cham. https://doi.org/10.1007/978-3-031-09282-4_5
DOI: https://doi.org/10.1007/978-3-031-09282-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09281-7
Online ISBN: 978-3-031-09282-4
eBook Packages: Computer Science, Computer Science (R0)