Finite-state computability of annotations of strings and trees (extended abstract)

Bodlaender, Hans L.; Fellows, Michael R.; Evans, Patricia A.

doi:10.1007/3-540-61258-0_28


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1075)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching


Abstract

An annotation of a string X over an alphabet Σ is a string Y having the same length as X, over an alphabet Γ. The pair (X, Y) can be viewed as a string over the alphabet Σ × Γ. The search problem for a language L ⊆ (Σ × Γ)^* is the problem of computing, given a string X ∈ Σ^*, a string Y ∈ Γ^*, having the same length as X, such that (X, Y) ∈ L, or determining that no such Y exists. We define a notion of finite-state searchability and prove the following (main) theorem: If L is finite-state recognizable, then it is finite-state searchable. The notions of annotation and finite-state searchability can be generalized to trees of symbols. Annotation search problems have a variety of applications. For example, the tree or string of symbols may represent the structural parse of a graph of bounded treewidth or pathwidth, and the annotation may represent a “solution” to a search problem (e.g., finding a subgraph homeomorphic to a fixed graph H). Our main theorem allows us to treat in a general and natural way the search problems that correspond to the many important graph decision problems now known to be solvable in linear time for graphs of bounded treewidth and pathwidth. As a corollary, we show finite-state searchability for graph properties whose solutions can be expressed by leading existential quantification in monadic second-order logic. This can be viewed as a “search problem” form of Courcelle's Theorem on the decidability of monadic second-order graph properties. We describe several possible applications to computing annotations of biological sequences, and discuss how the resulting annotations may be useful in sequence comparison and alignment.
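
To make the string search problem concrete, the following minimal sketch computes an annotation Y for a given X when L is recognized by a deterministic finite automaton over Σ × Γ. It is an illustration only and is not taken from the paper: it uses a plain forward reachability pass followed by a backward trace, which is not necessarily the construction behind the paper's notion of finite-state searchability. The function name `find_annotation`, the dictionary encoding of the transition table, and the toy language (Y marks exactly one position, and that position holds `b` in X) are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

State = Hashable
# Hypothetical encoding: (state, (x_symbol, y_symbol)) -> next state.
# A missing key means the automaton rejects on that pair.
Delta = Dict[Tuple[State, Tuple[str, str]], State]


def find_annotation(x: str, gamma: str, delta: Delta,
                    start: State, accepting: Set[State]) -> Optional[str]:
    """Return some Y with |Y| == |X| and (X, Y) accepted, or None if no Y exists."""
    # Forward pass: reach[i] holds every state reachable after reading
    # x[:i] paired with some annotation prefix of length i.
    reach: List[Set[State]] = [{start}]
    for a in x:
        nxt: Set[State] = set()
        for q in reach[-1]:
            for b in gamma:
                r = delta.get((q, (a, b)))
                if r is not None:
                    nxt.add(r)
        reach.append(nxt)

    # No accepting state reachable at the end means no annotation exists.
    goal = next((q for q in reach[-1] if q in accepting), None)
    if goal is None:
        return None

    # Backward pass: walk from the chosen accepting state to the start,
    # picking at each position a predecessor state and annotation symbol.
    y: List[str] = []
    for i in range(len(x) - 1, -1, -1):
        step = next((q, b) for q in reach[i] for b in gamma
                    if delta.get((q, (x[i], b))) == goal)
        goal = step[0]
        y.append(step[1])
    return "".join(reversed(y))


# Toy language over {a, b} x {0, 1}: Y contains exactly one '1',
# and it sits on a position where X has 'b'.
DELTA: Delta = {
    (0, ("a", "0")): 0, (0, ("b", "0")): 0, (0, ("b", "1")): 1,
    (1, ("a", "0")): 1, (1, ("b", "0")): 1,
}

print(find_annotation("aaba", "01", DELTA, 0, {1}))
print(find_annotation("aaa", "01", DELTA, 0, {1}))
```

For a fixed automaton and a fixed Γ, both passes do a bounded amount of work per input symbol, so the running time is linear in the length of X. The sketch stores one state set per position; it makes no attempt to model the finite-state machinery that the paper itself defines.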


References

K. Abrahamson and M. Fellows. Finite automata, bounded treewidth and well-quasi-ordering. In: Graph Structure Theory: Proceedings of the Joint Summer Research Conference on Graph Minors, Seattle, June, 1991, American Mathematical Society, Contemporary Mathematics 147 (1993), 539–564.
H. Bodlaender. A linear time algorithm for finding tree-decompositions of small width. In Proceedings of the 25th Annual Symposium on Theory of Computing (1993), 226–234. To appear in SIAM J. Comput.
B. Courcelle. The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Information and Computation 85 (1990), 12–75.
F. Corpet and B. Michot. RNAlign program: alignment of RNA sequences using both primary and secondary structures. Computer Applications in the Biosciences 10 (1994), 389–399.
S. Dong and D. Searls. Gene structure prediction by linguistic methods. Genomics 23 (1994), 540–551.
S. Eilenberg. Automata, Languages, and Machines: Volume A. Academic Press, 1974.
C. Elgot and J. Mezei. On relations defined by generalized finite automata. IBM Journal of Research 9 (1965), 47–68.
K. Han and H. Kim. Prediction of common folding structures of homologous RNAs. Nucleic Acids Res. 21 (1993), 1251–1257.
C. Johnson. Formal Aspects of Phonological Description. Mouton, 1972.
R. Kaplan and M. Kay. Regular models of phonological rule systems. Computational Linguistics 20 (1994), 331–378.
K. Koskenniemi. Two-level morphology: a general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.
S. Needleman and C. Wunsch. A general method applicable to the search for similarities in the amino-acid sequence of two proteins. Journal of Molecular Biology 48 (1970), 443–453.
B. Rost, R. Schneider, A. de Daruvar, and C. Sander. The PredictProtein server. http://www.embl-heidelberg.de/predictprotein/predictprotein.html
D. Searls and S. Dong. A syntactic pattern recognition system for DNA sequences. Proceedings of the Second International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis (1993), 89–101.
D. Searls. The computational linguistics of biological sequences. In Artificial Intelligence and Molecular Biology, AAAI Press, 1993, 47–120.
D. Searls. Formal grammars for intermolecular structure. Proceedings of the International IEEE Symposium on Intelligence in Neural and Biological Systems (1995).
R. Sproat. Morphology and Computation. MIT Press, 1992.
T. Smith and M. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981), 195–197.
G. Stephen. String Searching Algorithms. Lecture Notes Series on Computing, vol. 3. World Scientific, 1994.


Author information

Authors and Affiliations

Department of Computer Science, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, the Netherlands
Hans L. Bodlaender
Department of Computer Science, University of Victoria, V8W 3P6, Victoria, British Columbia, Canada
Michael R. Fellows & Patricia A. Evans

Editor information

Dan Hirschberg, Gene Myers

Cite this paper

Bodlaender, H.L., Fellows, M.R., Evans, P.A. (1996). Finite-state computability of annotations of strings and trees (extended abstract). In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_28

DOI: https://doi.org/10.1007/3-540-61258-0_28
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive
