Abstract
Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS), which is successful in discovering syntactic structures in linguistic texts, and its motif extraction (MEX) component, which can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results are described.
Horn, D. Syntactic structures in languages and biology. Cogn Process 9, 153–158 (2008). https://doi.org/10.1007/s10339-007-0194-7