Extracting Relational Data from Texts

  • Chapter
Handbuch Netzwerkforschung

Abstract

Data for network-analytic projects may be contained, explicitly or implicitly, in natural-language texts that are unstructured or semi-structured. In such cases, relation extraction methods make it possible to obtain or enrich network data. The following examples illustrate where this family of methods is applied: Analysts in business and public administration draw on reports by and about organizations for information on their composition, efficiency, and development (Corman et al. 2002; Krackhardt 1987). Cognitive and social scientists use interviews to examine who raises which topics and how they connect them (Carley and Palmquist 1991; Collins and Loftus 1975). Journalists and analysts search news reports and archives for the participants, subject, cause, course, place, time, and interrelations of events (Gerner et al. 1994; van Cuilenburg et al. 1986). Market researchers analyze customer reviews to find out which brands and products evoke which sentiments (Wiebe 2000). Internet researchers track the actor-related diffusion of topics on the internet (Adar and Adamic 2005; Kleinberg 2003). Users send search engines queries that can only be answered with information from more than one web page (Berners-Lee et al. 2001; Brin 1999). What all these tasks have in common is that they can be solved by identifying the relevant pieces of information (nodes) and their connections (edges) in texts, representing them, and evaluating them with network-analytic methods (McCallum 2005). In this chapter we explain under which conditions extracting relational data from texts is worthwhile and which methods are available for it, and we point out limitations and as yet unsolved problems of the methodology.
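
The idea of reading nodes and edges out of text can be illustrated with a minimal sketch. It is not the method developed in the chapter; it assumes that the entities of interest are already known and treats co-occurrence within one sentence as a crude stand-in for a relation. The function name `extract_edges`, the sample text, and the entity list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def extract_edges(text, entities):
    """Count how often two known entities co-occur in the same sentence."""
    edges = Counter()
    # Naive sentence split after '.', '!' or '?' (an assumption; real
    # texts need a proper sentence segmenter).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Nodes: every known entity mentioned in this sentence
        # (plain substring matching, which is another simplification).
        present = sorted(e for e in entities if e.lower() in lowered)
        # Edges: every unordered pair of nodes found together.
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


text = ("Alice met Bob in Berlin. Bob later wrote to Carol. "
        "Alice and Carol never spoke.")
edges = extract_edges(text, {"Alice", "Bob", "Carol"})
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The third sentence shows the limit of this approximation: "Alice" and "Carol" receive an edge although the sentence states that they never spoke, which is the kind of problem that more informed extraction methods have to address.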

5 References

  • Adar, Eytan and Lada A. Adamic, 2005: Tracking Information Epidemics in Blogspace. Proc. of IEEE/WIC/ACM International Conference on Web Intelligence, September 2005, Compiègne, France: 207–214.

  • Allen, James F. and Alan M. Frisch, 1982: What's in a semantic network? Proc. of 20th annual meeting of Association for Computational Linguistics, Toronto, Canada: 19–27.

  • Baker, Wayne E. and Robert R. Faulkner, 1993: The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry. American Sociological Review 58(6): 837–860.

  • Berelson, Bernard, 1952: Content analysis in communication research. Glencoe, Ill: Free Press.

  • Bernard, H. Russell and Gery W. Ryan, 1998: Text analysis: Qualitative and quantitative methods. Pp. 595–646 in: H. Russell Bernard (ed.), Handbook of methods in cultural anthropology. Walnut Creek: Altamira Press.

  • Berners-Lee, Tim, James Hendler and Ora Lassila, 2001: The Semantic Web. Scientific American 284(5): 34–43.

  • Brin, Sergey, 1999: Extracting Patterns and Relations from the World Wide Web. WebDB Workshop at 6th International Conference on Extending Database Technology (EDBT), March 1998, Valencia, Spain: 172–183.

  • Bunescu, Razvan and Raymond J. Mooney, 2007: Statistical Relational Learning for Natural Language Information Extraction. Pp. 535–552 in: Lise Getoor and Ben Taskar (eds.), Statistical Relational Learning. Cambridge: MIT Press.

  • Burt, Ronald and Nan Lin, 1977: Network Time Series from Archival Records. Pp. 224–254 in: David R. Heise (ed.), Sociological Methodology. San Francisco, CA: Jossey-Bass.

  • Buzan, Tony, 1984: Make the Most of Your Mind. New York, NY: Simon and Schuster.

  • Cafarella, Michael J., Michele Banko and Oren Etzioni, 2006: Relational web search. Proc. of World Wide Web Conference (WWW), May 2006, Edinburgh, UK.

  • Carley, Kathleen M., 1997: Network text analysis: The network position of concepts. Pp. 79–100 in: Carl W. Roberts (ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Carley, Kathleen M., Jana Diesner, Jeffrey Reminga and Maksim Tsvetovat, 2007: Toward an interoperable dynamic network analysis toolkit. Decision Support Systems 43(3): 1324–1347.

  • Carley, Kathleen M. and Michael Palmquist, 1991: Extracting, Representing, and Analyzing Mental Models. Social Forces 70(3): 601–636.

  • Central Intelligence Agency: World Factbook. Available from: www.cia.gov/library/publications/the-world-factbook/.

  • Chomsky, Noam, 1956: Three models for the description of language. IRE Transactions on Information Theory 2(3): 113–124.

  • Collins, Allan M. and Elizabeth F. Loftus, 1975: A spreading-activation theory of semantic processing. Psychological Review 82: 407–428.

  • Corman, Stephen R., Timothy Kuhn, Robert D. McPhee and Kevin J. Dooley, 2002: Studying Complex Discursive Systems: Centering Resonance Analysis of Communication. Human Communication Research 28: 157–206.

  • Culotta, Aron, Andrew McCallum and Jonathan Betz, 2006: Integrating probabilistic extraction models and data mining to discover relations and patterns in text. Proc. Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL), June 2006, New York, NY.

  • Danowski, James A., 1993: Network Analysis of Message Content. Progress in Communication Sciences 12: 198–221.

  • Diesner, Jana and Kathleen M. Carley, 2005: Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Pp. 81–108 in: V. K. Narayanan and Deborah J. Armstrong (eds.), Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. Harrisburg, PA: Idea Group Publishing.

  • Diesner, Jana and Kathleen M. Carley, 2008: Conditional Random Fields for Entity Extraction and Ontological Text Coding. Journal of Computational and Mathematical Organization Theory 14(3): 248–262.

  • Diesner, Jana and Kathleen M. Carley, 2009a: WYSIWII - What You See Is What It Is: Informed Approximation of Relational Data from Texts. Presentation, General Online Research (GOR), April 2009, Vienna, Austria.

  • Diesner, Jana and Kathleen M. Carley, 2009b: He says, she says. Pat says, Tricia says. How much reference resolution matters for entity extraction, relation extraction, and social network analysis. Proceedings of IEEE Symposium on Computational Intelligence for Security and Defence Applications (CISDA), July 2009, Ottawa, Canada.

  • Diesner, Jana, Kathleen M. Carley and Harald Katzmair, 2007: The morphology of a breakdown. How the semantics and mechanics of communication networks from an organization in crises relate. Presentation, XXVII Sunbelt Social Network Conference, May 2007, Corfu, Greece.

  • Diesner, Jana, Terrill L. Frantz and Kathleen M. Carley, 2005: Communication Networks from the Enron Email Corpus “It's Always About the People. Enron is no Different”. Journal of Computational and Mathematical Organization Theory 11(3): 201–228.

  • Dietterich, Thomas G., 2002: Machine Learning for Sequential Data: A Review. Proc. of Joint IAPR International Workshops SSPR 2002 and SPR 2002, August 2002, Windsor, ON, Canada: 15–33.

  • Doddington, George, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel and Ralph Weischedel, 2004: The Automatic Content Extraction (ACE) Program–Tasks, Data, and Evaluation. Proc. of Language Resources and Evaluation Conference (LREC), May 2004, Lisbon, Portugal: 837–840.

  • Doerfel, Marya, 1998: What Constitutes Semantic Network Analysis? A Comparison of Research and Methodologies. Connections 21(2): 16–26.

  • Doerfel, Marya and George A. Barnett, 1999: A Semantic Network Analysis of the International Communication Association. Human Communication Research 25(4): 589–603.

  • Fellbaum, Christiane, 1998: WordNet: An electronic lexical database. Cambridge MA: MIT Press.

  • Fillmore, Charles J., 1982: Frame Semantics. Pp. 111–137 in: The Linguistic Society of Korea (ed.), Linguistics in the morning calm. Seoul, South Korea: Hanshin Publishing Co.

  • Fillmore, Charles J., 1968: The Case for Case. Pp. 1–88 in: Emmon Bach and Robert T. Harms (eds.), Universals in Linguistic Theory. New York: Holt, Rinehart and Winston.

  • Frank, Ove, 2004: Network sampling and model fitting. Pp. 31–56 in: Peter J. Carrington, John Scott and Stanley Wasserman (eds.), Models and methods in social network analysis. New York: Cambridge University Press.

  • Franzosi, Roberto, 1989: From words to numbers: A generalized and linguistics-based coding procedure for collecting textual data. Sociological Methodology 19: 225–257.

  • Gerner, Deborah, Phillip A. Schrodt, Ronald A. Francisco and Judith L. Weddle, 1994: Machine Coding of Event Data Using Regional and International Sources. International Studies Quarterly 38(1): 91–119.

  • Glaser, B. and A. Strauss, 1967: The Discovery of Grounded Theory: Strategies for Qualitative Research. New York, NY: Aldine.

  • Grishman, Ralph and Beth Sundheim, 1996: Message understanding conference - 6: A brief history. Proc. of 16th International Conference on Computational Linguistics, Copenhagen, Denmark, June 1996.

  • Hartley, Roger and John Barnden, 1997: Semantic networks: visualizations of knowledge. Trends in Cognitive Sciences 1(5): 169–175.

  • Howard, Ronald A., 1989: Knowledge maps. Management Science 35(8): 903–922.

  • Janas, Jürgen and Camilla Schwind, 1979: Extensional Semantic Networks. Pp. 267–302 in: Nicholas V. Findler (ed.), Associative Networks. Representation and Use of Knowledge by Computers. New York etc.: Academic Press.

  • Johnson-Laird, Phil N., 2005: The history of mental models. Pp. 179–212 in: Ken Manktelow and Man C. Chung (eds.), Psychology of Reasoning: Theoretical and Historical Perspectives. London: Psychology Press.

  • Jurafsky, Daniel and James H. Martin, 2000: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.

  • King, Gary and Will Lowe, 2003: An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design. International Organization 57(3): 617–642.

  • Kleene, Stephen, 1956: Representation of events in nerve nets and finite automata. Pp. 3–41 in: Claude Shannon and John McCarthy (eds.), Automata Studies. Princeton, NJ: Princeton University Press.

  • Kleinberg, Jon, 2003: Bursty and Hierarchical Structure in Streams. Data Mining and Knowledge Discovery 7(4): 373–397.

  • Krackhardt, David, 1987: Cognitive social structures. Social Networks 9: 109–134.

  • Krebs, Valdis E., 2002: Mapping networks of terrorist cells. Connections 24(3): 43–52.

  • Krippendorff, Klaus, 2004: Content analysis: An introduction to its methodology. Thousand Oaks CA: Sage.

  • Lafferty, John, Andrew McCallum and Fernando Pereira, 2001: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proc. of 18th International Conference on Machine Learning, June 2001, Williamstown, MA: 282–289.

  • Lewins, Ann and Christina Silver, 2007: Using software in qualitative research: a step-by-step guide. London: Sage.

  • McCallum, Andrew, 2005: Information extraction: distilling structured data from unstructured text. ACM Queue 3(9): 48–57.

  • Miller, Scott, Heidi Fox, Lance Ramshaw and Ralph Weischedel, 2000: A novel use of statistical parsing to extract information from text. Proc. of 1st Conference of North American chapter of the Association for Computational Linguistics (NAACL), Seattle, WA: 226–233.

  • Minsky, Marvin, 1974: A Framework for Representing Knowledge. MIT-AI Laboratory Memo 306.

  • Mitchell, Tom, 1997: Machine Learning. New York: McGraw-Hill.

  • Mohr, John W., 1998: Measuring Meaning Structures. Annual Review of Sociology 24(1): 345–370.

  • Norvig, Peter and Stuart Russell, 1995: Artificial Intelligence: A Modern Approach. Upper Saddle River: Pearson Education.

  • Novak, Joseph D. and Alberto Cañas, 2008: The Theory Underlying Concept Maps and How to Construct Them. Florida Institute for Human and Machine Cognition, Report No. IHMC CmapTools Rev 01–2008.

  • Osgood, Charles E., 1959: The representational model and relevant research methods. Pp. 33–88 in: Ithiel de Sola Pool (ed.), Trends in content analysis. Urbana, IL: University of Illinois Press.

  • Pearl, Judea, 1988: Probabilistic reasoning in intelligent systems: networks of plausible inference. San Francisco: Morgan Kaufmann.

  • Petri, Carl Adam, 1962: Kommunikation mit Automaten. Universität Bonn, Ph.D. dissertation.

  • Richards, Tom, 2002: An intellectual history of NUD*IST and NVivo. International Journal of Social Research Methodology 5(3): 199–214.

  • Roberts, Carl W., 1997: A Generic Semantic Grammar for Quantitative Text Analysis: Applications to East and West Berlin Radio News Content from 1979. Sociological Methodology 27: 89–129.

  • Roberts, Carl W., 2000: A Conceptual Framework for Quantitative Text Analysis. Quality and Quantity 34(3): 259–274.

  • Rumelhart, David E., 1981: Schemata: The building blocks of cognition. Comprehension and teaching: Research reviews: 3–26.

  • Schrodt, Phillip A., Ömür Yilmaz, Deborah J. Gerner and Dennis Hermick, 2008: Coding Sub-State Actors using the CAMEO (Conflict and Mediation Event Observations) Actor Coding Framework. Presentation, Annual Meeting of the International Studies Association, March 2008, San Francisco, CA.

  • Seibel, Wolfgang and Jörg Raab, 2003: Verfolgungsnetzwerke. Kölner Zeitschrift für Soziologie und Sozialpsychologie 55(2): 197–230.

  • Shapiro, Stuart C., 1971: A net structure for semantic information storage, deduction and retrieval. Proc. of Second International Joint Conference on Artificial Intelligence: 512–523.

  • Smith, Andrew E. and Michael S. Humphreys, 2006: Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods 38(2): 262–279.

  • Sowa, John F., 1992: Semantic Networks. Pp. 1493–1511 in: Stuart C. Shapiro (ed.), Encyclopedia of Artificial Intelligence. New York: Wiley and Sons.

  • Tesnière, Lucien, 1959: Éléments de syntaxe structurale. Paris: Klincksieck.

  • Tversky, Amos and Itamar Gati, 1982: Similarity, separability, and the triangle inequality. Psychological Review 89(2): 123–154.

  • Van Atteveldt, Wouter, 2008: Semantic network analysis: Techniques for extracting, representing, and querying media content. Charleston: Book Surge Publishers.

  • van Cuilenburg, Jan J., Jan Kleinnijenhuis and Jan A. de Ridder, 1986: A Theory of Evaluative Discourse: Towards a Graph Theory of Journalistic Texts. European Journal of Communication 1(1): 65–96.

  • White, Harrison C., 1993: Canvases and careers: institutional change in the French painting world. Chicago: University of Chicago Press.

  • Wiebe, Janyce M., 2000: Learning Subjective Adjectives from Corpora. Proc. of 17th National Conference on Artificial Intelligence (AAAI) 2000, July 2000, Austin, TX: 735–741.

  • Woods, William A., 1975: What's in a link: Foundations for semantic networks. Pp. 35–82 in: Daniel G. Bobrow and Allan Collins (eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.

  • Yang, Yiming and Jan O. Pedersen, 1997: A comparative study on feature selection in text categorization. Proc. 14th International Conference on Machine Learning (ICML), Nashville, TN.

  • Zelenko, Dmitry, Chinatsu Aone and Anthony Richardella, 2003: Kernel methods for relation extraction. Journal of Machine Learning Research 3(2): 1083–1106.

  • Zipf, George K., 1949: Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, MA: Addison-Wesley Press.

  • Züll, Cornelia and Melina Alexa, 2001: Automatisches Codieren von Textdaten. Ein Überblick über neue Entwicklungen. Pp. 303–317 in: Werner Wirth and Edmund Lauf (eds.), Inhaltsanalyse - Perspektiven, Probleme, Potenziale. Cologne: Herbert von Halem.

Editor information

Christian Stegbauer, Roger Häußling

Copyright information

© 2010 VS Verlag für Sozialwissenschaften | Springer Fachmedien Wiesbaden GmbH

About this chapter

Cite this chapter

Diesner, J., Carley, K.M. (2010). Extraktion relationaler Daten aus Texten. In: Stegbauer, C., Häußling, R. (eds) Handbuch Netzwerkforschung. VS Verlag für Sozialwissenschaften. https://doi.org/10.1007/978-3-531-92575-2_44

  • DOI: https://doi.org/10.1007/978-3-531-92575-2_44

  • Publisher Name: VS Verlag für Sozialwissenschaften

  • Print ISBN: 978-3-531-15808-2

  • Online ISBN: 978-3-531-92575-2

  • eBook Packages: Humanities, Social Science (German Language)
