Abstract
This paper describes a corpus-linguistic analysis of large text corpora based on collocations, with the aim of extracting semantic relations from unstructured text. We regard this approach as a viable method for generating and structuring information about WEB communities. Starting from a short description of our corpora and our language analysis tools, we discuss in depth the automatic generation of collocation sets. We then give examples of the different types of relations that may be found in collocation sets for arbitrary terms. We conclude with a brief discussion of applying our approach to the analysis of a sample community.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heyer, G., Quasthoff, U., Wolff, C. (2002). Automatic Analysis of Large Text Corpora - A Contribution to Structuring WEB Communities. In: Unger, H., Böhme, T., Mikler, A. (eds) Innovative Internet Computing Systems. IICS 2002. Lecture Notes in Computer Science, vol 2346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48080-3_2
DOI: https://doi.org/10.1007/3-540-48080-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43790-1
Online ISBN: 978-3-540-48080-8
eBook Packages: Springer Book Archive