research-article

Pattern Matching with Variables: Efficient Algorithms and Complexity Results

Published: 11 February 2020

Abstract

A pattern α (i.e., a string of variables and terminals) matches a word w if w can be obtained by uniformly replacing the variables of α by terminal words. The corresponding matching problem, i.e., deciding whether a given pattern matches a given word, is NP-complete in general, but can be solved in polynomial time for restricted classes of patterns. We present efficient algorithms for the matching problem with respect to patterns with a bounded number of repeated variables and patterns with a structural restriction on the order of variables. Furthermore, we show that it is NP-complete to decide, for a given number k and a word w, whether w can be factorised into k distinct factors. As an immediate consequence of this hardness result, the injective version of the matching problem (i.e., different variables are replaced by different words) is NP-complete even for very restricted classes of patterns.
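
To make the matching problem concrete, here is a minimal brute-force sketch. It is not one of the algorithms from the article, and it takes exponential time in the worst case. The function name `match`, the convention that uppercase letters are variables and every other character is a terminal, and the restriction to nonempty substitutions are all assumptions made for this illustration.

```python
from typing import Dict, Optional


def match(pattern: str, word: str, injective: bool = False) -> Optional[Dict[str, str]]:
    """Return a substitution under which `pattern` matches `word`, or None.

    Assumed conventions (hypothetical, for illustration only):
    uppercase letters are variables, all other characters are terminals,
    and every variable is replaced by a nonempty word.
    """

    def go(i: int, j: int, sub: Dict[str, str]) -> Optional[Dict[str, str]]:
        # i: position in the pattern, j: position in the word
        if i == len(pattern):
            return dict(sub) if j == len(word) else None
        sym = pattern[i]
        if not sym.isupper():
            # Terminal: it must appear literally at position j.
            if j < len(word) and word[j] == sym:
                return go(i + 1, j + 1, sub)
            return None
        if sym in sub:
            # Variable already bound: "uniform" replacement reuses its image.
            img = sub[sym]
            if word.startswith(img, j):
                return go(i + 1, j + len(img), sub)
            return None
        # Unbound variable: try every nonempty factor starting at j.
        for end in range(j + 1, len(word) + 1):
            img = word[j:end]
            if injective and img in sub.values():
                continue  # injective version: distinct variables, distinct words
            sub[sym] = img
            found = go(i + 1, end, sub)
            if found is not None:
                return found
            del sub[sym]  # backtrack
        return None

    return go(0, 0, {})


print(match("XaX", "abaab"))               # {'X': 'ab'}
print(match("XY", "aa"))                   # {'X': 'a', 'Y': 'a'}
print(match("XY", "aa", injective=True))   # None
```

Under these assumptions, calling `match` with `injective=True` on a pattern of k distinct variables and no terminals (for example `XYZ` for k = 3) succeeds exactly when the word can be split into k pairwise distinct factors, which is the factorisation problem the abstract shows to be NP-complete.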

Published In

ACM Transactions on Computation Theory, Volume 12, Issue 1
March 2020
199 pages
ISSN:1942-3454
EISSN:1942-3462
DOI:10.1145/3376904
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 February 2020
Accepted: 01 September 2019
Revised: 01 August 2019
Received: 01 October 2018
Published in TOCT Volume 12, Issue 1

Author Tags

  1. Combinatorial pattern matching
  2. NP-complete string problems
  3. combinatorics on words
  4. patterns with variables

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • DFG Grant MA 5725/1-2

Cited By

  • (2023) Matching Patterns with Variables Under Simon’s Congruence. Reachability Problems, 155-170. DOI: 10.1007/978-3-031-45286-4_12. Online publication date: 5-Oct-2023
  • (2022) NetNDP: Nonoverlapping (delta, gamma)-approximate pattern matching. Intelligent Data Analysis 26:6, 1661-1682. DOI: 10.3233/IDA-216325. Online publication date: 12-Nov-2022
  • (2022) Matching Patterns with Variables Under Edit Distance. String Processing and Information Retrieval, 275-289. DOI: 10.1007/978-3-031-20643-6_20. Online publication date: 1-Nov-2022
  • (2020) String factorisations with maximum or minimum dimension. Theoretical Computer Science 842, 65-73. DOI: 10.1016/j.tcs.2020.07.029. Online publication date: Nov-2020
  • (2020) NetDAP: (δ, γ)-approximate pattern matching with length constraints. Applied Intelligence 50:11, 4094-4116. DOI: 10.1007/s10489-020-01778-1. Online publication date: 10-Jul-2020
