Abstract
In this paper I present the most recent version of the SCA method for pairwise and multiple alignment analyses. In contrast to previously proposed alignment methods, SCA is based on a novel framework of sequence alignment which combines new approaches to sequence modeling in historical linguistics with recent developments in computational biology. Compared with earlier versions of SCA [1,2], the new version includes several modifications that significantly improve the performance and the application range of the algorithm: a new sound class model was defined which works well on highly divergent sequences; the algorithm for pairwise alignment was modified to be sensitive to secondary sequence structures such as syllable boundaries; and an algorithm for pre-processing the data in multiple alignment analyses [3] was included to cope with the bias resulting from progressive alignment analyses. In order to test the method, a new gold standard for pairwise and multiple alignment analyses was created, consisting of 45,947 sequences covering a total of 435 different taxa belonging to six different language families.
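The general idea of aligning words by their sound classes can be illustrated with a minimal sketch. It is not the SCA implementation: the class table below is a hypothetical, loosely Dolgopolsky-style toy mapping, the scoring (match +1, mismatch -1, gap -1) is an assumption chosen for readability, and the names `SOUND_CLASSES`, `to_classes`, and `align` are invented for this example. The sketch converts each segment to a class symbol and then runs a standard Needleman-Wunsch global alignment on the class strings, without the syllable-sensitive scoring or the pre-processing step described above.

```python
# Toy sound-class table (hypothetical; NOT the sound class model used by SCA).
SOUND_CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K", "x": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N",
    "r": "R", "l": "R",
    "w": "W", "j": "J", "h": "H",
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",
}


def to_classes(segments):
    """Map each segment to its sound class; unknown segments become '?'."""
    return [SOUND_CLASSES.get(seg, "?") for seg in segments]


def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment computed on sound classes.

    Returns the two aligned segment lists (gaps shown as '-') and the score.
    """
    cls_a, cls_b = to_classes(seq_a), to_classes(seq_b)
    n, m = len(seq_a), len(seq_b)

    def sub(i, j):
        # Score for aligning segment i of seq_a with segment j of seq_b.
        return match if cls_a[i] == cls_b[j] else mismatch

    # Fill the dynamic programming matrix.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(i - 1, j - 1),  # (mis)match
                dp[i - 1][j] + gap,                    # gap in seq_b
                dp[i][j - 1] + gap,                    # gap in seq_a
            )

    # Trace back from the bottom-right cell, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(i - 1, j - 1):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], dp[n][m]


# Simplified transcriptions of German "Tochter" and English "daughter".
a, b, score = align(list("toxter"), list("doter"))
print(" ".join(a))  # t o x t e r
print(" ".join(b))  # d o - t e r
print(score)        # 4
```

Because t and d fall into the same class in this toy table, the initial segments are treated as a match even though the sounds differ, which is the intuition behind class-based alignment of divergent sequences.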
References
List, J.M.: Phonetic alignment based on sound classes. In: Slavkovik, M. (ed.) Proceedings of the 15th Student Session of the European Summer School for Logic, Language and Information, Copenhagen, pp. 192–202 (2010)
List, J.M.: Multiple sequence alignment in historical linguistics. A sound class based approach. In: Proceedings of ConSOLE XIX (2011) (forthcoming)
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee. A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205–217 (2000)
Gray, R.D., Atkinson, Q.D.: Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426(6965), 435–439 (2003)
Holman, E.W., Brown, C.H., Wichmann, S., Müller, A., Velupillai, V., Hammarström, H., Sauppe, S., Jung, H., Bakker, D., Brown, P., Belyaev, O., Urban, M., Mailhammer, R., List, J.M., Egorov, D.: Automated dating of the world’s language families based on lexical similarity. Current Anthropology 52(6), 841–875 (2011)
Baxter, W.H., Manaster Ramer, A.: Beyond lumping and splitting. Probabilistic issues in historical linguistics. In: Renfrew, C., McMahon, A., Trask, L. (eds.) Time Depth in Historical Linguistics, pp. 167–188. McDonald Institute for Archaeological Research, Cambridge (2000)
Kessler, B.: The significance of word lists. Statistical tests for investigating historical connections between languages. CSLI Publications, Stanford (2001)
Kondrak, G.: Algorithms for language reconstruction. Dissertation. University of Toronto, Toronto (2002)
Prokić, J., Wieling, M., Nerbonne, J.: Multiple sequence alignments in linguistics. In: Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education, pp. 18–25. Association for Computational Linguistics, Stroudsburg (2009)
Turchin, P., Peiros, I., Gell-Mann, M.: Analyzing genetic connections between languages by matching consonant classes. Journal of Language Relationship 3, 117–126 (2010)
Covington, M.A.: An algorithm to align words for historical comparison. Computational Linguistics 22(4), 481–496 (1996)
Ross, M., Durie, M.: Introduction. In: Durie, M. (ed.) The Comparative Method Reviewed. Regularity and Irregularity in Language Change, pp. 3–38. Oxford University Press, New York (1996)
Trask, R.L. (ed.): The dictionary of historical and comparative linguistics. Edinburgh University Press, Edinburgh (2000)
Lass, R.: Historical linguistics and language change. Cambridge University Press, Cambridge (1997)
Gusfield, D.: Algorithms on strings, trees and sequences. Cambridge University Press, Cambridge (1997)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453 (1970)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. Journal of the Association for Computing Machinery 21(1), 168–173 (1974)
Eddy, S.R.: Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22(8), 1035–1036 (2004)
Rosenberg, M.S.: Sequence alignment. Concepts and history. In: Rosenberg, M.S. (ed.) Sequence Alignment. Methods, Models, Concepts, and Strategies, pp. 1–22. University of California Press, Berkeley and Los Angeles and London (2009)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological sequence analysis. Probabilistic models of proteins and nucleic acids, 7th edn. Cambridge University Press, Cambridge (2002)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Morgenstern, B., Dress, A., Werner, T.D.: Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proceedings of the National Academy of Sciences, USA 93, 12098–12103 (1996)
Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. PNAS 89(22), 10915–10919 (1992)
Sokal, R.R., Michener, C.D.: A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38, 1409–1438 (1958)
Saitou, N., Nei, M.: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4), 406–425 (1987)
Feng, D.F., Doolittle, R.F.: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25(4), 351–360 (1987)
Dolgopolsky, A.B.: Gipoteza drevnejšego rodstva jazykovych semej Severnoj Evrazii s verojatnostej točky zrenija (A probabilistic hypothesis concerning the oldest relationships among the language families of Northern Eurasia). Voprosy Jazykoznanija 2, 53–63 (1964)
Dolgopolsky, A.B.: A probabilistic hypothesis concerning the oldest relationships among the language families of northern Eurasia. In: Shevoroshkin, V.V. (ed.) Typology, Relationship and Time, pp. 27–50. Karoma Publisher, Ann Arbor (1986)
Brown, C.H., Holman, E.W., Wichmann, S.: Sound correspondences in the world’s languages (2011), Online manuscript, PDF, http://wwwstaff.eva.mpg.de/~wichmann/wwcPaper23.pdf
Brown, C.H., Holman, E.W., Wichmann, S., Velupillai, V., Cysouw, M.: Automated classification of the world’s languages. Sprachtypologie und Universalienforschung 61(4), 285–308 (2008)
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W. Nucleic Acids Research 22(22), 4673–4680 (1994)
Geisler, H.: Akzent und Lautwandel in der Romania. Narr, Tübingen (1992)
Hóu, J. (ed.): Xiàndài Hànyǔ fāngyán yīnkù (Phonological database of Chinese dialects). Shànghǎi Jiàoyǔ, Shanghai (2004)
Downey, S.S., Hallmark, B., Cox, M.P., Norquest, P., Lansing, S.: Computational feature-sensitive reconstruction of language relationships: Developing the ALINE distance for comparative historical linguistic reconstruction. Journal of Quantitative Linguistics 15(4), 340–369 (2008)
Wang, F.: Comparison of languages in contact. Institute of Linguistics Academia Sinica, Taipei (2006)
Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Research 27(13), 2682–2690 (1999)
Raghava, G.P.S., Barton, G.J.: Quantification of the variation in percentage identity for protein sequence alignments. BMC Bioinformatics 7(415) (2006)
Heggarty, P.: Sounds of the Andean languages. Online resource, http://www.quechua.org.uk/
Allen, B.: Bai Dialect Survey. SIL International (2007)
Almberg, J., Skarbø, K.: Nordavinden og sola. En norsk dialektprøvedatabase på nettet (The North Wind and the Sun. A Norwegian dialect database on the web) (2011), Online resource, http://www.ling.hf.ntnu.no/nos/
Gauchat, L., Jeanjaquet, J., Tappolet, E.: Tableaux phonétiques des patois suisses romands. Attinger, Neuchâtel (1925)
Renfrew, C., Heggarty, P.: Languages and origins in Europe. Online resource, http://www.languagesandpeoples.com/
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
List, JM. (2012). SCA: Phonetic Alignment Based on Sound Classes. In: Lassiter, D., Slavkovik, M. (eds) New Directions in Logic, Language and Computation. ESSLLI 2010, ESSLLI 2011. Lecture Notes in Computer Science, vol 7415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31467-4_3
DOI: https://doi.org/10.1007/978-3-642-31467-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31466-7
Online ISBN: 978-3-642-31467-4
eBook Packages: Computer Science, Computer Science (R0)