On the Hardness of Wildcard Pattern Matching on de Bruijn Graphs

  • Conference paper
Computational Advances in Bio and Medical Sciences (ICCABS 2023)

Abstract

In the pattern matching on labeled graphs problem, given an edge-labeled graph \(G = (V, E)\) and a string P, one seeks to determine whether there exists a walk in the graph whose concatenation of edge labels (approximately) matches P. This is an elementary subproblem when genome graphs are used to represent collections of genetic sequences, where patterns arise as reads in the sequencing data. Unfortunately, for general graphs, it is known that an algorithm running in \(O(|E||P|^{1-\varepsilon } + |E|^{1-\varepsilon }|P|)\) time for constant \(\varepsilon > 0\) is not possible under the Strong Exponential Time Hypothesis (SETH). De Bruijn graphs provide a valuable exception, allowing for a path exactly matching a pattern to be found in \(O(|E| + |P|)\) time for constant-sized alphabets. This property has led de Bruijn graphs to be applied as indexes in the popular tool vg-toolkit. In this work, we consider the case where wildcards (that match with any edge label) are included in the pattern, and the graph is a de Bruijn graph. We demonstrate that adding these wildcards to the pattern is enough to again prove quadratic lower bounds conditioned on SETH for pattern matching on de Bruijn graphs, even when restricted to alphabets of size at most three and k-mer length \(\varTheta (\log |V|)\).
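
As a point of reference for the quadratic bound, the following minimal sketch (not taken from the paper) decides wildcard matching by advancing a set of reachable vertices one pattern symbol at a time, which costs \(O(|E||P|)\) edge inspections. It assumes a de Bruijn graph whose vertices are k-mers and whose edges carry the last character of the target k-mer. The names `build_de_bruijn` and `has_wildcard_match` and the wildcard symbol `?` are illustrative choices, not notation from the paper.

```python
from collections import defaultdict

WILDCARD = "?"  # hypothetical wildcard symbol; matches any edge label


def build_de_bruijn(sequences, k):
    """Return {k-mer: set of (label, successor k-mer)} from all (k+1)-mers."""
    edges = defaultdict(set)
    for s in sequences:
        for i in range(len(s) - k):
            u, v = s[i:i + k], s[i + 1:i + k + 1]
            edges[u].add((v[-1], v))      # edge labeled by last char of target
            edges.setdefault(v, set())    # make sure sink vertices are present
    return dict(edges)


def has_wildcard_match(edges, pattern):
    """True if some walk's edge labels spell `pattern` (wildcards match any)."""
    frontier = set(edges)  # a walk may start at any vertex
    for c in pattern:
        # Follow every outgoing edge whose label is compatible with c.
        frontier = {
            v
            for u in frontier
            for label, v in edges[u]
            if c == WILDCARD or c == label
        }
        if not frontier:
            return False
    return True


graph = build_de_bruijn(["ACGTAC"], k=2)
print(has_wildcard_match(graph, "G?A"))
print(has_wildcard_match(graph, "GA"))
```

Each pattern symbol triggers at most one pass over the edge set, so the sketch matches the quadratic upper bound; the result stated in the abstract says that, under SETH, wildcards rule out a strongly subquadratic improvement even on de Bruijn graphs.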

Acknowledgement

S. Thankachan is partially supported by the U.S. National Science Foundation (NSF) award CCF-2316691.

Corresponding author

Correspondence to Daniel Gibney.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ganguly, A., Gibney, D., Das, A.K., Thankachan, S.V. (2025). On the Hardness of Wildcard Pattern Matching on de Bruijn Graphs. In: Bansal, M.S., et al. Computational Advances in Bio and Medical Sciences. ICCABS 2023. Lecture Notes in Computer Science, vol 14548. Springer, Cham. https://doi.org/10.1007/978-3-031-82768-6_7

  • DOI: https://doi.org/10.1007/978-3-031-82768-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82767-9

  • Online ISBN: 978-3-031-82768-6

  • eBook Packages: Computer Science, Computer Science (R0)
