Abstract
We propose an efficient automata-based approach for extracting behavioral units and rules from continuous sequential data of animal behavior. By introducing novel extensions, we integrate two elemental methods, the N-gram model and Angluin’s machine learning algorithm, into an ethological data mining framework. This yields a minimized automaton representation of the behavioral rules, one that accepts (or generates) the smallest set of possible behavioral patterns from the sequential data. We demonstrate how ethological data mining works using real birdsong data from the Bengalese finch, and we evaluate the method experimentally using artificial birdsong data generated by a computer program. The results suggest that, when the parameters we introduce are set appropriately, ethological data mining works effectively even for noisy behavioral data. In addition, a case study using Bengalese finch song shows that the method successfully captures the core structure of the singing behavior, such as loops and branches.
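As a rough illustration of the N-gram idea mentioned in the abstract, the sketch below counts contiguous runs of symbols in a toy note sequence and keeps the ones that recur often enough to be candidate behavioral units. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names `ngram_counts` and `frequent_chunks`, the `min_count` threshold, and the toy sequence are all hypothetical, and the extensions and automaton-learning step described in the paper are not reproduced here.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every contiguous run of n symbols in the sequence."""
    return Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )


def frequent_chunks(sequence, n, min_count):
    """Return the n-grams occurring at least min_count times, sorted.

    min_count is a hypothetical threshold chosen for illustration only.
    """
    counts = ngram_counts(sequence, n)
    return sorted(gram for gram, c in counts.items() if c >= min_count)


# Toy note sequence (assumed data, one letter per song note).
song = list("abcdabcdeabcd")
chunks = frequent_chunks(song, 2, 3)
print(chunks)
```

In this toy sequence the bigrams that recur three times chain together into the run "abcd", which is the kind of repeated chunk one might treat as a single unit before learning an automaton over units rather than over individual notes.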
Additional information
Responsible editor: Eamonn Keogh.
Yasuki Kakishita and Kazutoshi Sasahara have contributed equally to this work.
Cite this article
Kakishita, Y., Sasahara, K., Nishino, T. et al. Ethological data mining: an automata-based approach to extract behavioral units and rules. Data Min Knowl Disc 18, 446–471 (2009). https://doi.org/10.1007/s10618-008-0122-1
DOI: https://doi.org/10.1007/s10618-008-0122-1