Sparse Dynamic Programming on DAGs with Small Width

Published: 06 February 2019

Abstract

The minimum path cover problem asks us to find a minimum-cardinality set of paths that cover all the nodes of a directed acyclic graph (DAG). We study the case when the size k of a minimum path cover is small, that is, when the DAG has a small width. This case is motivated by applications in pan-genomics, where the genomic variation of a population is expressed as a DAG. We observe that classical alignment algorithms exploiting sparse dynamic programming can be extended to the sequence-against-DAG case by mimicking the algorithm for sequences on each path of a minimum path cover and handling an evaluation order anomaly with reachability queries.
Namely, we introduce a general framework for DAG-extensions of sparse dynamic programming. This framework produces algorithms that are slower than their counterparts on sequences only by a factor of k. We illustrate this on two classical problems extended to DAGs: longest increasing subsequence and longest common subsequence. For the former, we obtain an algorithm with running time O(k|E| log |V|). This matches the optimal solution to the classical problem variant when the input sequence is modeled as a path. We obtain an analogous result for the longest common subsequence problem. We then apply this technique to the co-linear chaining problem, which is a generalization of the above two problems. The algorithm for this problem turns out to be more involved, needing further ingredients, such as an FM-index tailored for large alphabets and a two-dimensional range search tree modified to support range maximum queries. We also study a general sequence-to-DAG alignment formulation that allows affine gap costs in the sequence.
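For orientation, the classical sequence variant of longest increasing subsequence that the DAG algorithm is compared against can be sketched as follows. This is the textbook O(n log n) method based on binary search over subsequence tails, not the DAG algorithm of the article; the function name `lis_length` is an assumption made for this illustration.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of a longest strictly increasing subsequence, in O(n log n) time."""
    # tails[i] holds the smallest possible last value of an increasing
    # subsequence of length i + 1 seen so far; tails stays sorted.
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)  # first index with tails[pos] >= x
        if pos == len(tails):
            tails.append(x)          # x extends the longest subsequence found so far
        else:
            tails[pos] = x           # x gives a smaller tail for length pos + 1
    return len(tails)
```

A sequence of length n corresponds to a DAG that is a single path, so k = 1 and |E| = n - 1, and the bound O(k|E| log |V|) specializes to the O(n log n) of this sketch.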
The main ingredient of the proposed framework is a new algorithm for finding a minimum path cover of a DAG (V,E) in O(k|E| log |V|) time, improving on all known time bounds when k is small and the DAG is not too dense. In addition to boosting the sparse dynamic programming framework, an immediate consequence of this new minimum path cover algorithm is an improved space/time tradeoff for reachability queries in arbitrary directed graphs.
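To make the quantity k concrete, the following minimal sketch computes the size of a minimum path cover (the width) of a small DAG with the classical reduction to bipartite matching on the transitive closure. It is a simple baseline for illustration only, much slower than the O(k|E| log |V|) algorithm announced above, and it is not the article's method; the function name `dag_width` and the node numbering 0..n-1 are assumptions.

```python
def dag_width(n, edges):
    """Size k of a minimum path cover of a DAG on nodes 0..n-1.

    Paths are allowed to share nodes, so k equals the width of the DAG.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # Transitive closure: reach[s] lists every node reachable from s.
    reach = []
    for s in range(n):
        seen = set()
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v])
        reach.append(sorted(seen))

    # Maximum bipartite matching (augmenting paths) between a left copy and
    # a right copy of the nodes, with an edge (u, v) whenever u reaches v.
    match_right = [-1] * n  # match_right[v] = left node matched to right copy of v

    def try_augment(u, visited):
        for v in reach[u]:
            if v not in visited:
                visited.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    # Each matched pair links two nodes into the same chain, so the number
    # of chains (paths in the original DAG) is n minus the matching size.
    return n - matching
```

For example, on the DAG with edges 0→2, 1→2, 2→3, 2→4 the sketch returns 2: the paths 0→2→3 and 1→2→4 cover all nodes while sharing node 2.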

Published In

ACM Transactions on Algorithms, Volume 15, Issue 2
Special Issue on SODA'17 and Regular Papers
April 2019
407 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/3292530

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2019
Accepted: 01 December 2018
Revised: 01 November 2018
Received: 01 May 2018
Published in TALG Volume 15, Issue 2

Author Tags

  1. Pattern matching
  2. co-linear chaining
  3. longest common subsequence
  4. pan-genomics

Qualifiers

  • Research-article
  • Research
  • Refereed
