Abstract
Recent advances in the quantitative analysis of natural language call for a theoretical framework that explains how these advances are possible. Such a framework also helps to unify different approaches and algorithms in quantitative linguistics. We consider the linguistic tradition of structuralism as a basis for this framework. In what follows, we focus on syntagmatic and paradigmatic relations and attempt to describe them in a coherent way. We present an abstract version of a (neo-)structuralist language model, show how already known algorithms fit into it, and show how new algorithms can be derived from it. As linguists such as Firth and Harris predicted, it is possible to construct a computational model of language based on linguistic structuralism and statistical mathematics. The model we propose specifically helps to explain fully unsupervised algorithms for natural language processing that are based on well-known methods such as co-occurrence measures and clustering.
References
S. Banerjee and T. Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In CICLing-2002, 2002.
C. Biemann, S. Bordag, G. Heyer, U. Quasthoff, and C. Wolff. Language-independent Methods for Compiling Monolingual Lexical Data. In Proceedings of CICLing 2004, pages 215–228. Springer, 2004.
C. Biemann, S. Bordag, and U. Quasthoff. Automatic Acquisition of Paradigmatic Relations Using Iterated Co-occurrences. In Proceedings of LREC 2004, 2004.
S. Bordag. Two-step Approach to Unsupervised Morpheme Segmentation. In Proceedings of PASCAL 2006, 2006.
T. Brants. TnT – a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference ANLP-2000, 2000.
E. Brill. A Simple Rule-based Part-of-speech Tagger. In Proceedings of ANLP, pages 152–155, 1992.
P. F. Brown, P. V. de Souza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. Class-based n-gram Models of Natural Language. Computational Linguistics, 18(4):467–479, 1992.
K. W. Church and W. Gale. Poisson Mixtures. Journal of Natural Language Engineering, 1(2):163–190, 1995.
D. Cutting, J. Kupiec, J. Pedersen, and P. Sibun. A Practical Part-of-speech Tagger. In Proceedings of ANLP, pages 133–140, 1992.
I. Dagan, L. Lee, and F. C. N. Pereira. Similarity Based Models of Word Cooccurrence Probabilities. Machine Learning, 34(1-3):43–69, 1999.
F. de Saussure. Grundfragen der allgemeinen Sprachwissenschaft. C. Bally and A. Sechehaye (eds.). de Gruyter, 3rd edition, 2001.
S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 Report. In D. K. Harman, editor, Overview of the Third Text Retrieval Conference (TREC-3), pages 219–230, Gaithersburg, 1995. National Institute of Standards and Technology.
T. E. Dunning. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1):61–74, 1993.
G. Grewendorf. Parametrisierung der Syntax. Zur kognitiven Revolution in der Linguistik. In L. Hoffmann, editor, Deutsche Syntax. Ansichten und Einsichten, pages 11–73. de Gruyter, 1993.
G. Grewendorf, F. Hamm, and W. Sternefeld. Sprachliches Wissen. Eine Einführung. Suhrkamp, 1989.
B. Hamp and H. Feldweg. GermaNet – a Lexical-Semantic Net for German. In Proceedings of ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, 1997.
Z. S. Harris. Mathematical Structures of Language. Wiley, 1968.
V. Hatzivassiloglou and K. R. McKeown. Predicting the Semantic Orientation of Adjectives. In Proceedings of ACL/EACL-97, pages 174–181, 1997.
G. Heyer, U. Quasthoff, T. Wittig, and C. Wolff. Learning Relations Using Collocations. In A. Maedche, S. Staab, C. Nedellec, and E. Hovy, editors, Proceedings IJCAI Workshop on Ontology Learning, 2001.
G. Heyer, U. Quasthoff, and C. Wolff. Knowledge Extraction from Text: Using Filters on Collocation Sets. In Proceedings of LREC-2002 and IICS 2002, pages 153–162. Springer, Berlin/New York, 2002.
R. Jakobson. Two Aspects of Language and Two Types of Aphasic Disturbances. In R. Jakobson, editor, Selected Writings II. Word and Language, pages 239–259. The Hague, 1956.
D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice Hall, 2000.
C. Kunze and A. Wagner. Integrating GermaNet into EuroWordNet, a Multilingual Lexical-semantic Database. Sprache und Datenverarbeitung – International Journal for Language Data Processing, 1999.
M. Lesk. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of SIGDOC, pages 24–26, 1986.
D. Lin. Automatic Retrieval and Clustering of Similar Words. In Proceedings of COLING/ACL-98, pages 768–774, 1998.
D. Lin. Extracting Collocations from Text Corpora. In First Workshop on Computational Terminology, 1998.
D. Lin, S. Zhao, L. Qin, and M. Zhou. Identifying Synonyms among Distributionally Similar Words. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003.
C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
D. Marcu. The Theory and Practice of Discourse Parsing and Summarization. MIT Press, 2000.
A. Mehler. Hierarchical Orderings of Textual Units. In Proceedings of the 19th International Conference on Computational Linguistics, COLING '02, Taipei, pages 646–652, San Francisco, 2002. Morgan Kaufmann.
G. A. Miller. WordNet: An online lexical database. International Journal of Lexicography, 4(3):235–312, 1990.
C. H. Papadimitriou, H. Tamaki, P. Raghavan, and S. Vempala. Latent Semantic Indexing: A Probabilistic Analysis. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 01-04, Seattle, pages 159–168. ACM, 1998.
S. D. Richardson. Determining Similarity and Inferring Relations in a Lexical Knowledge Base. PhD thesis, The City University of New York, 1997.
B. B. Rieger. Unscharfe Semantik. Die empirische Analyse, quantitative Beschreibung, formale Repräsentation und prozedurale Modellierung vager Wortbedeutungen in Texten. Peter Lang, Bern/Frankfurt/New York, 1989.
B. B. Rieger. Distributed Semantic Representations of Word Meanings. In J. D. Becker, I. Eisele, and F. W. Mündemann, editors, Parallelism, Learning, Evolution. Proceedings of the Workshop on Evolutionary Models and Strategies and of the Workshop on Parallel Processing (WOPPLOT '89), pages 243–273, Berlin/New York, 1991. Springer.
B. B. Rieger. Situations, Language Games, and SCIPS. Modeling Semiotic Cognitive Information Processing Systems. In J. Albus, A. Meystel, D. Pospelov, and T. Reader, editors, Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems. Proceedings of the ISIC-Workshop and the 10th International IEEE-Symposium on Intelligent Control, pages 130–138, Bala Cynwyd, 1995. AdRem.
M. Sanderson. Word Sense Disambiguation and Information Retrieval. In Proceedings of the 17th ACM SIGIR Conference, pages 142–151. ACM, 1996.
A. Schiller, S. Teufel, and C. Thielen. Guidelines für das Taggen deutscher Textcorpora mit STTS. Technical report, IMS-CL, Universität Stuttgart and SfS, Universität Tübingen, 1995.
P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure for Association Patterns. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 32–41. ACM, 2002.
E. Terra and C. L. A. Clarke. Frequency Estimates for Statistical Word Similarity Measures. In Proceedings of HLT-NAACL 2003, pages 165–172, 2003.
N. S. Trubetzkoy. Grundzüge der Phonologie. Travaux du Cercle Linguistique de Prague 7. Kraus, Nendeln, 1939.
P. D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL-02, pages 417–424, 2002.
P. Vossen. Introduction to EuroWordNet. Special Issue on EuroWordNet of Computers and the Humanities, 32(2-3):73–89, 1998.
L. Wittgenstein. Tractatus Logico-Philosophicus. Frankfurt a. M., 2003.
© 2007 Springer
About this chapter
Cite this chapter
Bordag, S., Heyer, G. (2007). A Structuralist Framework for Quantitative Linguistics. In: Aspects of Automatic Text Analysis. Studies in Fuzziness and Soft Computing, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37522-7_8
DOI: https://doi.org/10.1007/978-3-540-37522-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37520-3
Online ISBN: 978-3-540-37522-7
eBook Packages: Engineering (R0)