A Structuralist Framework for Quantitative Linguistics

Bordag, Stefan; Heyer, Gerhard

doi:10.1007/978-3-540-37522-7_8


Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 209)


Abstract

Recent advances in the quantitative analysis of natural language call for a theoretical framework that explains how these advances are possible. Such a framework helps to unify different approaches and algorithms in quantitative linguistics. We consider the linguistic tradition of structuralism as a basis for such a framework. In what follows, we focus on syntagmatic and paradigmatic relations and attempt to describe them in a coherent way. We present an abstract version of a (neo-)structuralist language model and show how existing algorithms fit into it. We also show how new algorithms can be derived from it. As linguists such as Firth and Harris already predicted, it is possible to construct a computational model of language based on linguistic structuralism and statistical mathematics. The model we propose specifically helps to explain fully unsupervised algorithms for natural language processing that are based on well-known methods such as co-occurrence measures and clustering.
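The abstract describes the approach only in outline. As a rough illustration, and not as the model proposed in this chapter, the following Python sketch assumes that syntagmatic relations can be approximated by statistically significant co-occurrence within a sentence (scored here with Dunning's log-likelihood ratio, cf. Dunning 1993 in the references) and that paradigmatic relations can be approximated by the cosine similarity of two words' co-occurrence profiles. The sentence window, the significance test, the similarity measure, the toy corpus, and all function names are illustrative assumptions; the clustering step is left out.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count sentence frequencies of words and of unordered word pairs (each counted once per sentence)."""
    word_freq, pair_freq = Counter(), Counter()
    for sentence in sentences:
        types = sorted(set(sentence))
        word_freq.update(types)
        pair_freq.update(combinations(types, 2))
    return word_freq, pair_freq, len(sentences)


def log_likelihood(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    def h(*counts):
        total = sum(counts)
        return sum(k * math.log(k / total) for k in counts if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)   # row sums
                - h(k11 + k21, k12 + k22))  # column sums


def syntagmatic_relations(sentences, min_score=0.0):
    """Hypothetical helper: map each word to its significant sentence-level co-occurrents and G^2 scores."""
    word_freq, pair_freq, n = cooccurrence_counts(sentences)
    neighbours = {w: {} for w in word_freq}
    for (a, b), k11 in pair_freq.items():
        fa, fb = word_freq[a], word_freq[b]
        if k11 * n <= fa * fb:  # keep only pairs seen more often than chance predicts
            continue
        score = log_likelihood(k11, fa - k11, fb - k11, n - fa - fb + k11)
        if score > min_score:
            neighbours[a][b] = neighbours[b][a] = score
    return neighbours


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def paradigmatic_relations(neighbours, word):
    """Hypothetical helper: rank other words by how similar their co-occurrence profiles are to `word`'s."""
    target = neighbours[word]
    scores = ((cosine(target, vec), other) for other, vec in neighbours.items() if other != word)
    return sorted((s for s in scores if s[0] > 0), reverse=True)


# Toy corpus (invented for illustration), one tokenized sentence per entry.
corpus = [s.split() for s in [
    "cat drinks milk", "dog drinks milk",
    "cat chases mouse", "dog chases mouse",
    "bird sings song", "bird sings song",
]]
neighbours = syntagmatic_relations(corpus)
print(sorted(neighbours["drinks"], key=neighbours["drinks"].get, reverse=True))
print(paradigmatic_relations(neighbours, "cat")[:3])
```

On this toy corpus, "drinks" and "milk" come out as a strong syntagmatic pair, while "cat" and "dog", which never share a sentence, rank as the most similar paradigmatic pair because they occur in the same contexts. This mirrors the structuralist distinction between relations of co-presence and relations of substitutability, but any real application would need a large corpus and a tuned significance threshold.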


References

S. Banerjee and T. Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In CICLing-2002, 2002.
C. Biemann, S. Bordag, G. Heyer, U. Quasthoff, and C. Wolff. Language-independent Methods for Compiling Monolingual Lexical Data. In Proceedings of CICLing 2004, pages 215–228. Springer, 2004.
C. Biemann, S. Bordag, and U. Quasthoff. Automatic Acquisition of Paradigmatic Relations Using Iterated Co-occurrences. In Proceedings of LREC 2004, 2004.
S. Bordag. Two-step Approach to Unsupervised Morpheme Segmentation. In Proceedings of PASCAL 2006, 2006.
T. Brants. TnT – a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference ANLP-2000, 2000.
E. Brill. A Simple Rule-based Part-of-speech Tagger. In Proceedings of ANLP, pages 152–155, 1992.
P. F. Brown, P. V. de Souza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. Class-based n-gram Models of Natural Language. Computational Linguistics, 18(4):467–479, 1992.
K. W. Church and W. Gale. Poisson Mixtures. Journal of Natural Language Engineering, 1(2):163–190, 1995.
D. Cutting, J. Kupiec, J. Pedersen, and P. Sibun. A Practical Part-of-speech Tagger. In Proceedings of ANLP, pages 133–140, 1992.
I. Dagan, L. Lee, and F. C. N. Pereira. Similarity-Based Models of Word Cooccurrence Probabilities. Machine Learning, 34(1–3):43–69, 1999.
F. de Saussure. Grundfragen der allgemeinen Sprachwissenschaft. C. Bally and A. Sechehaye (eds.). de Gruyter, 3rd edition, 2001.
S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 Report. In D. K. Harman, editor, Overview of the Third Text Retrieval Conference (TREC-3), pages 219–230, Gaithersburg, 1995. National Institute of Standards and Technology.
T. E. Dunning. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1):61–74, 1993.
G. Grewendorf. Parametrisierung der Syntax. Zur kognitiven Revolution in der Linguistik. In L. Hoffmann, editor, Deutsche Syntax. Ansichten und Einsichten, pages 11–73. de Gruyter, 1993.
G. Grewendorf, F. Hamm, and W. Sternefeld. Sprachliches Wissen. Eine Einführung. Suhrkamp, 1989.
B. Hamp and H. Feldweg. GermaNet – a Lexical-Semantic Net for German. In Proceedings of ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, 1997.
Z. S. Harris. Mathematical Structures of Language. Wiley, 1968.
V. Hatzivassiloglou and K. R. McKeown. Predicting the Semantic Orientation of Adjectives. In Proceedings of ACL/EACL-97, pages 174–181, 1997.
G. Heyer, U. Quasthoff, T. Wittig, and C. Wolff. Learning Relations Using Collocations. In A. Maedche, S. Staab, C. Nedellec, and E. Hovy, editors, Proceedings IJCAI Workshop on Ontology Learning, 2001.
G. Heyer, U. Quasthoff, and C. Wolff. Knowledge Extraction from Text: Using Filters on Collocation Sets. In Proceedings of LREC-2002 and IICS 2002, pages 153–162. Springer, Berlin/New York, 2002.
R. Jakobson. Two Aspects of Language and Two Types of Aphasic Disturbances. In R. Jakobson, editor, Selected Writings II. Word and Language, pages 239–259. The Hague, 1956.
D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice Hall, 2000.
C. Kunze and A. Wagner. Integrating GermaNet into EuroWordNet, a Multilingual Lexical-semantic Database. Sprache und Datenverarbeitung – International Journal for Language Data Processing, 1999.
M. Lesk. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of SIGDOC, pages 24–26, 1986.
D. Lin. Automatic Retrieval and Clustering of Similar Words. In Proceedings of COLING/ACL-98, pages 768–774, 1998.
D. Lin. Extracting Collocations from Text Corpora. In First Workshop on Computational Terminology, 1998.
D. Lin, S. Zhao, L. Qin, and M. Zhou. Identifying Synonyms among Distributionally Similar Words. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003.
C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
D. Marcu. The Theory and Practice of Discourse Parsing and Summarization. MIT Press, 2000.
A. Mehler. Hierarchical Orderings of Textual Units. In Proceedings of the 19th International Conference on Computational Linguistics, COLING'02, Taipei, pages 646–652, San Francisco, 2002. Morgan Kaufmann.
G. A. Miller. WordNet: An online lexical database. International Journal of Lexicography, 4(3):235–312, 1990.
C. H. Papadimitriou, H. Tamaki, P. Raghavan, and S. Vempala. Latent Semantic Indexing: A Probabilistic Analysis. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 01-04, Seattle, pages 159–168. ACM, 1998.
S. D. Richardson. Determining Similarity and Inferring Relations in a Lexical Knowledge Base. PhD thesis, The City University of New York, 1997.
B. B. Rieger. Unscharfe Semantik. Die empirische Analyse, quantitative Beschreibung, formale Repräsentation und prozedurale Modellierung vager Wortbedeutungen in Texten. Peter Lang, Bern/Frankfurt/New York, 1989.
B. B. Rieger. Distributed Semantic Representations of Word Meanings. In J. D. Becker, I. Eisele, and F. W. Mündemann, editors, Parallelism, Learning, Evolution. Proceedings of the Workshop on Evolutionary Models and Strategies and of the Workshop on Parallel Processing (WOPPLOT'89), pages 243–273, Berlin/New York, 1991. Springer.
B. B. Rieger. Situations, Language Games, and SCIPS. Modeling Semiotic Cognitive Information Processing Systems. In J. Albus, A. Meystel, D. Pospelov, and T. Reader, editors, Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems. Proceedings of the ISIC-Workshop and the 10th International IEEE-Symposium on Intelligent Control, pages 130–138, Bala Cynwyd, 1995. AdRem.
M. Sanderson. Word Sense Disambiguation and Information Retrieval. In Proceedings of the 17th ACM SIGIR Conference, pages 142–151. ACM, 1996.
A. Schiller, S. Teufel, and C. Thielen. Guidelines für das Taggen deutscher Textcorpora mit STTS. Technical report, IMS-CL, Universität Stuttgart and SfS, Universität Tübingen, 1995.
P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure for Association Patterns. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 32–41. ACM, 2002.
E. Terra and C. L. A. Clarke. Frequency Estimates for Statistical Word Similarity Measures. In Proceedings of HLT-NAACL 2003, pages 165–172, 2003.
N. S. Trubetzkoy. Grundzüge der Phonologie. Travaux du Cercle Linguistique de Prague 7. Kraus, Nendeln, 1939.
P. D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL-02, pages 417–424, 2002.
P. Vossen. Introduction to EuroWordNet. Special Issue on EuroWordNet of Computers and the Humanities, 32(2-3):73–89, 1998.
L. Wittgenstein. Tractatus Logico-Philosophicus. Frankfurt a. M., 2003.


Author information

Authors and Affiliations

Leipzig University, Leipzig
Stefan Bordag
Leipzig University, Leipzig
Gerhard Heyer



About this chapter

Cite this chapter

Bordag, S., Heyer, G. (2007). A Structuralist Framework for Quantitative Linguistics. In: Aspects of Automatic Text Analysis. Studies in Fuzziness and Soft Computing, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37522-7_8


DOI: https://doi.org/10.1007/978-3-540-37522-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37520-3
Online ISBN: 978-3-540-37522-7
eBook Packages: Engineering, Engineering (R0)
