A Structuralist Framework for Quantitative Linguistics

Chapter in: Aspects of Automatic Text Analysis

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 209)

Abstract

Recent advances in the quantitative analysis of natural language call for a theoretical framework that explains how these advances are possible. Such a framework helps to unify different approaches and algorithms in quantitative linguistics. We consider the linguistic tradition of structuralism as a basis for such a framework. In what follows, we focus on syntagmatic and paradigmatic relations and attempt to describe them in a coherent way. We present an abstract version of a (neo-)structuralist language model, show how known algorithms fit into it, and show how new algorithms can be derived from it. As linguists such as Firth and Harris predicted, it is possible to construct a computational model of language based on linguistic structuralism and statistical mathematics. The model we propose specifically helps to explain fully unsupervised algorithms for natural language processing that are based on well-known methods such as co-occurrence measures and clustering.
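
To make the two relation types concrete, the following is a minimal sketch and not the authors' model. It assumes that syntagmatic relatedness can be approximated by a sentence-level co-occurrence measure (pointwise mutual information is used here purely as a stand-in) and that paradigmatic relatedness can be approximated by comparing the co-occurrence contexts of two words (cosine similarity, again a stand-in). The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def sentence_cooccurrences(sentences):
    """Count, per sentence, each word type and each unordered pair of types."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        types = sorted(set(sentence.lower().split()))
        word_freq.update(types)
        # Pairs are stored alphabetically ordered because `types` is sorted.
        pair_freq.update(combinations(types, 2))
    return word_freq, pair_freq

def pmi(a, b, word_freq, pair_freq, n_sentences):
    """Pointwise mutual information of a and b sharing a sentence
    (an assumed stand-in for a syntagmatic co-occurrence measure)."""
    joint = pair_freq[tuple(sorted((a, b)))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_sentences / (word_freq[a] * word_freq[b]))

def context_vector(word, pair_freq):
    """Collect the words that co-occur with `word`, with their counts."""
    vec = Counter()
    for (a, b), count in pair_freq.items():
        if a == word:
            vec[b] += count
        elif b == word:
            vec[a] += count
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse count vectors
    (an assumed stand-in for a paradigmatic similarity measure)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus, one sentence per string.
SENTENCES = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]
word_freq, pair_freq = sentence_cooccurrences(SENTENCES)

syntagmatic = pmi("cat", "milk", word_freq, pair_freq, len(SENTENCES))
paradigmatic = cosine(context_vector("cat", pair_freq),
                      context_vector("dog", pair_freq))
print(f"PMI(cat, milk) = {syntagmatic:.3f}")
print(f"cosine(cat, dog) = {paradigmatic:.3f}")
```

In this toy corpus, "cat" and "dog" share more of their contexts than "cat" and "milk" do, which is the kind of distinction between paradigmatic and syntagmatic relatedness the abstract refers to. A realistic setting would need a large corpus, a significance measure suited to sparse counts, and a clustering step, none of which this sketch attempts.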

Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Bordag, S., Heyer, G. (2007). A Structuralist Framework for Quantitative Linguistics. In: Aspects of Automatic Text Analysis. Studies in Fuzziness and Soft Computing, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37522-7_8

  • DOI: https://doi.org/10.1007/978-3-540-37522-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37520-3

  • Online ISBN: 978-3-540-37522-7

  • eBook Packages: Engineering, Engineering (R0)
