Extracting Relational Data from Texts (Extraktion relationaler Daten aus Texten)

Diesner, Jana; Carley, Kathleen M.

doi:10.1007/978-3-531-92575-2_44

Abstract

Data for network-analytic projects can be contained, explicitly or implicitly, in natural-language texts that are unstructured or semi-structured. In this situation, relation extraction methods make it possible to obtain or enrich network data. The following examples illustrate areas of application for this family of methods: Analysts in business and public administration draw on reports by and about organizations for information on their composition, efficiency, and development (Corman et al. 2002; Krackhardt 1987). Cognitive and social scientists use interviews to examine who raises which topics and how they connect them (Carley and Palmquist 1991; Collins and Loftus 1975). Journalists and analysts search news reports and archives for the participants, subject, cause, course, place, time, and interrelations of events (Gerner et al. 1994; van Cuilenburg et al. 1986). Market researchers analyze customer reviews to find out which brands and products evoke which sentiments (Wiebe 2000). Internet researchers trace the actor-related diffusion of topics on the internet (Adar and Adamic 2005; Kleinberg 2003). Users send search engines queries whose answers require information from more than one web page (Berners-Lee et al. 2001; Brin 1999). What all these tasks have in common is that they can be solved by identifying the relevant pieces of information (nodes) and their connections (edges) in texts, representing them, and evaluating them with network-analytic methods (McCallum 2005). In this chapter we explain under which conditions extracting relational data from texts is worthwhile and which methods are available for it, and we point out limits and as yet unsolved problems of the methodology.
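
The idea of turning text into nodes and edges can be illustrated with a minimal sketch. It is not the chapter's method: it assumes a hand-made list of entity names (hypothetical) and uses plain sentence-level co-occurrence as a stand-in for real relation extraction. The function name `extract_edges` and the sample text are likewise invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def extract_edges(text, entities):
    """Count how often two known entities are mentioned in the same sentence."""
    edges = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Nodes: entities from the (hypothetical) list found in this sentence.
        # Substring matching is crude; "Bob" would also match "Bobby".
        present = sorted(e for e in entities if e.lower() in lowered)
        # Edges: every unordered pair of co-mentioned entities.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


text = "Alice met Bob in Vienna. Bob later wrote to Carol. Alice and Bob founded Acme."
entities = ["Alice", "Bob", "Carol", "Acme", "Vienna"]
edges = extract_edges(text, entities)

for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

The resulting weighted edge list could then be passed to any network analysis tool. Co-occurrence says nothing about the type or direction of a relation, which is one reason more elaborate extraction methods exist.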

5 References

Adar, Eytan and Lada A. Adamic, 2005: Tracking Information Epidemics in Blogspace. Proc. of IEEE/WIC/ACM International Conference on Web Intelligence, September 2005, Compiègne, France: 207–214.
Allen, James F. and Alan M. Frisch, 1982: What's in a semantic network? Proc. of 20th Annual Meeting of the Association for Computational Linguistics, Toronto, Canada: 19–27.
Baker, Wayne E. and Robert R. Faulkner, 1993: The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry. American Sociological Review 58(6): 837–860.
Berelson, Bernard, 1952: Content analysis in communication research. Glencoe, Ill: Free Press.
Bernard, H. Russell and Gery W. Ryan, 1998: Text analysis: Qualitative and quantitative methods. Pp. 595–646 in: H. Russell Bernard (ed.), Handbook of methods in cultural anthropology. Walnut Creek: AltaMira Press.
Berners-Lee, Tim, James Hendler and Ora Lassila, 2001: The Semantic Web. Scientific American 284(5): 34–43.
Brin, Sergey, 1999: Extracting Patterns and Relations from the World Wide Web. WebDB Workshop at 6th International Conference on Extending Database Technology (EDBT), March 1998, Valencia, Spain: 172–183.
Bunescu, Razvan and Raymond J. Mooney, 2007: Statistical Relational Learning for Natural Language Information Extraction. Pp. 535–552 in: Lise Getoor and Ben Taskar (eds.), Statistical Relational Learning. Cambridge: MIT Press.
Burt, Ronald and Nan Lin, 1977: Network Time Series from Archival Records. Pp. 224–254 in: David R. Heise (ed.), Sociological Methodology. San Francisco, CA: Jossey-Bass.
Buzan, Tony, 1984: Make the Most of Your Mind. New York, NY: Simon and Schuster.
Cafarella, Michael J., Michele Banko and Oren Etzioni, 2006: Relational web search. Proc. of World Wide Web Conference (WWW), May 2006, Edinburgh, UK.
Carley, Kathleen M., 1997: Network text analysis: The network position of concepts. Pp. 79–100 in: Carl W. Roberts (ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum Associates.
Carley, Kathleen M., Jana Diesner, Jeffrey Reminga and Maksim Tsvetovat, 2007: Toward an interoperable dynamic network analysis toolkit. Decision Support Systems 43(3): 1324–1347.
Carley, Kathleen M. and Michael Palmquist, 1991: Extracting, Representing, and Analyzing Mental Models. Social Forces 70(3): 601–636.
Central Intelligence Agency: World Factbook. Available from: www.cia.gov/library/publications/the-world-factbook/.
Chomsky, Noam, 1956: Three models for the description of language. IRE Transactions on Information Theory 2(3): 113–124.
Collins, Allan M. and Elizabeth F. Loftus, 1975: A spreading-activation theory of semantic processing. Psychological Review 82: 407–428.
Corman, Stephen R., Timothy Kuhn, Robert D. McPhee and Kevin J. Dooley, 2002: Studying Complex Discursive Systems: Centering Resonance Analysis of Communication. Human Communication Research 28: 157–206.
Culotta, Aron, Andrew McCallum and Jonathan Betz, 2006: Integrating probabilistic extraction models and data mining to discover relations and patterns in text. Proc. of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), June 2006, New York, NY.
Danowski, James A., 1993: Network Analysis of Message Content. Progress in Communication Sciences 12: 198–221.
Diesner, Jana and Kathleen M. Carley, 2005: Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Pp. 81–108 in: V. K. Narayanan and Deborah J. Armstrong (eds.), Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. Harrisburg, PA: Idea Group Publishing.
Diesner, Jana and Kathleen M. Carley, 2008: Conditional Random Fields for Entity Extraction and Ontological Text Coding. Journal of Computational and Mathematical Organization Theory 14(3): 248–262.
Diesner, Jana and Kathleen M. Carley, 2009a: WYSIWII - What You See Is What It Is: Informed Approximation of Relational Data from Texts. Presentation, General Online Research (GOR), April 2009, Vienna, Austria.
Diesner, Jana and Kathleen M. Carley, 2009b: He says, she says. Pat says, Tricia says. How much reference resolution matters for entity extraction, relation extraction, and social network analysis. Proc. of IEEE Symposium on Computational Intelligence for Security and Defence Applications (CISDA), July 2009, Ottawa, Canada.
Diesner, Jana, Kathleen M. Carley and Harald Katzmair, 2007: The morphology of a breakdown. How the semantics and mechanics of communication networks from an organization in crises relate. Presentation, XXVII Sunbelt Social Network Conference, May 2007, Corfu, Greece.
Diesner, Jana, Terrill L. Frantz and Kathleen M. Carley, 2005: Communication Networks from the Enron Email Corpus “It's Always About the People. Enron is no Different”. Journal of Computational and Mathematical Organization Theory 11(3): 201–228.
Dietterich, Thomas G., 2002: Machine Learning for Sequential Data: A Review. Proc. of Joint IAPR International Workshops SSPR 2002 and SPR 2002, August 2002, Windsor, ON, Canada: 15–33.
Doddington, George, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel and Ralph Weischedel, 2004: The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation. Proc. of Language Resources and Evaluation Conference (LREC), May 2004, Lisbon, Portugal: 837–840.
Doerfel, Marya, 1998: What Constitutes Semantic Network Analysis? A Comparison of Research and Methodologies. Connections 21(2): 16–26.
Doerfel, Marya and George A. Barnett, 1999: A Semantic Network Analysis of the International Communication Association. Human Communication Research 25(4): 589–603.
Fellbaum, Christiane, 1998: WordNet: An electronic lexical database. Cambridge MA: MIT Press.
Fillmore, Charles J., 1968: The Case for Case. Pp. 1–88 in: Emmon Bach and Robert T. Harms (eds.), Universals in Linguistic Theory. New York: Holt, Rinehart and Winston.
Fillmore, Charles J., 1982: Frame Semantics. Pp. 111–137 in: The Linguistic Society of Korea (ed.), Linguistics in the morning calm. Seoul, South Korea: Hanshin Publishing Co.
Frank, Ove, 2004: Network sampling and model fitting. Pp. 31–56 in: Peter J. Carrington, John Scott and Stanley Wasserman (eds.), Models and methods in social network analysis. New York: Cambridge University Press.
Franzosi, Roberto, 1989: From words to numbers: A generalized and linguistics-based coding procedure for collecting textual data. Sociological Methodology 19: 225–257.
Gerner, Deborah, Phillip A. Schrodt, Ronald A. Francisco and Judith L. Weddle, 1994: Machine Coding of Event Data Using Regional and International Sources. International Studies Quarterly 38(1): 91–119.
Glaser, B. and A. Strauss, 1967: The Discovery of Grounded Theory: Strategies for Qualitative Research. New York, NY: Aldine.
Grishman, Ralph and Beth Sundheim, 1996: Message understanding conference - 6: A brief history. Proc. of 16th International Conference on Computational Linguistics, Copenhagen, Denmark, June 1996.
Hartley, Roger and John Barnden, 1997: Semantic networks: visualizations of knowledge. Trends in Cognitive Sciences 1(5): 169–175.
Howard, Ronald A., 1989: Knowledge maps. Management Science 35(8): 903–922.
Janas, Jürgen and Camilla Schwind, 1979: Extensional Semantic Networks. Pp. 267–302 in: Nicholas V. Findler (ed.), Associative Networks. Representation and Use of Knowledge by Computers. New York et al.: Academic Press.
Johnson-Laird, Phil N., 2005: The history of mental models. Pp. 179–212 in: Ken Manktelow and Man C. Chung (eds.), Psychology of Reasoning: Theoretical and Historical Perspectives. London: Psychology Press.
Jurafsky, Daniel and James H. Martin, 2000: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.
King, Gary and Will Lowe, 2003: An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design. International Organization 57(3): 617–642.
Kleene, Stephen, 1956: Representation of events in nerve nets and finite automata. Pp. 3–41 in: Claude Shannon and John McCarthy (eds.), Automata Studies. Princeton, NJ: Princeton University Press.
Kleinberg, Jon, 2003: Bursty and Hierarchical Structure in Streams. Data Mining and Knowledge Discovery 7(4): 373–397.
Krackhardt, David, 1987: Cognitive social structures. Social Networks 9: 109–134.
Krebs, Valdis E., 2002: Mapping networks of terrorist cells. Connections 24(3): 43–52.
Krippendorff, Klaus, 2004: Content analysis: An introduction to its methodology. Thousand Oaks CA: Sage.
Lafferty, John, Andrew McCallum and Fernando Pereira, 2001: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proc. of 18th International Conference on Machine Learning, June 2001, Williamstown, MA: 282–289.
Lewins, Ann and Christina Silver, 2007: Using software in qualitative research: a step-by-step guide. London: Sage.
McCallum, Andrew, 2005: Information extraction: distilling structured data from unstructured text. ACM Queue 3(9): 48–57.
Miller, Scott, Heidi Fox, Lance Ramshaw and Ralph Weischedel, 2000: A novel use of statistical parsing to extract information from text. Proc. of 1st Conference of North American chapter of the Association for Computational Linguistics (NAACL), Seattle, WA: 226–233.
Minsky, Marvin, 1974: A Framework for Representing Knowledge. MIT-AI Laboratory Memo 306.
Mitchell, Tom, 1997: Machine Learning. New York: McGraw-Hill.
Mohr, John W., 1998: Measuring Meaning Structures. Annual Review of Sociology 24(1): 345–370.
Norvig, Peter and Stuart Russell, 1995: Artificial Intelligence: A Modern Approach. Upper Saddle River: Pearson Education.
Novak, Joseph D. and Alberto Cañas, 2008: The Theory Underlying Concept Maps and How to Construct Them. Florida Institute for Human and Machine Cognition, Report No. IHMC CmapTools Rev 01–2008.
Osgood, Charles E., 1959: The representational model and relevant research methods. Pp. 33–88 in: Ithiel de Sola Pool (ed.), Trends in content analysis. Urbana, IL: University of Illinois Press.
Pearl, Judea, 1988: Probabilistic reasoning in intelligent systems: networks of plausible inference. San Francisco: Morgan Kaufmann.
Petri, Carl Adam, 1962: Kommunikation mit Automaten. University of Bonn, Ph.D. dissertation.
Richards, Tom, 2002: An intellectual history of NUD*IST and NVivo. International Journal of Social Research Methodology 5(3): 199–214.
Roberts, Carl W., 1997: A Generic Semantic Grammar for Quantitative Text Analysis: Applications to East and West Berlin Radio News Content from 1979. Sociological Methodology 27: 89–129.
Roberts, Carl W., 2000: A Conceptual Framework for Quantitative Text Analysis. Quality and Quantity 34(3): 259–274.
Rumelhart, David E., 1981: Schemata: The building blocks of cognition. Comprehension and teaching: Research reviews: 3–26.
Schrodt, Phillip A., Ömür Yilmaz, Deborah J. Gerner and Dennis Hermick, 2008: Coding Sub-State Actors using the CAMEO (Conflict and Mediation Event Observations) Actor Coding Framework. Presentation, Annual Meeting of the International Studies Association, March 2008, San Francisco, CA.
Seibel, Wolfgang and Jörg Raab, 2003: Verfolgungsnetzwerke. Kölner Zeitschrift für Soziologie und Sozialpsychologie 55(2): 197–230.
Shapiro, Stuart C., 1971: A net structure for semantic information storage, deduction and retrieval. Proc. of Second International Joint Conference on Artificial Intelligence: 512–523.
Smith, Andrew E. and Michael S. Humphreys, 2006: Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods 38(2): 262–279.
Sowa, John F., 1992: Semantic Networks. Pp. 1493–1511 in: Stuart C. Shapiro (ed.), Encyclopedia of Artificial Intelligence. New York: Wiley and Sons.
Tesnière, Lucien, 1959: Éléments de syntaxe structurale. Paris: Klincksieck.
Tversky, Amos and Itamar Gati, 1982: Similarity, separability, and the triangle inequality. Psychological Review 89(2): 123–154.
Van Atteveldt, Wouter, 2008: Semantic network analysis: Techniques for extracting, representing, and querying media content. Charleston: Book Surge Publishers.
van Cuilenburg, Jan J., Jan Kleinnijenhuis and Jan A. de Ridder, 1986: A Theory of Evaluative Discourse: Towards a Graph Theory of Journalistic Texts. European Journal of Communication 1(1): 65–96.
White, Harrison C., 1993: Canvases and careers: institutional change in the French painting world. Chicago: University of Chicago Press.
Wiebe, Janyce M., 2000: Learning Subjective Adjectives from Corpora. Proc. of 17th National Conference on Artificial Intelligence (AAAI) 2000, July 2000, Austin, TX: 735–741.
Woods, William A., 1975: What's in a link: Foundations for semantic networks. Pp. 35–82 in: Daniel G. Bobrow and Allan Collins (eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.
Yang, Yiming and Jan O. Pedersen, 1997: A comparative study on feature selection in text categorization. Proc. of 14th International Conference on Machine Learning (ICML), Nashville, TN.
Zelenko, Dmitry, Chinatsu Aone and Anthony Richardella, 2003: Kernel methods for relation extraction. Journal of Machine Learning Research 3(2): 1083–1106.
Zipf, George K., 1949: Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, MA: Addison-Wesley Press.
Züll, Cornelia and Melina Alexa, 2001: Automatisches Codieren von Textdaten. Ein Überblick über neue Entwicklungen. Pp. 303–317 in: Werner Wirth and Edmund Lauf (eds.), Inhaltsanalyse - Perspektiven, Probleme, Potenziale. Cologne: Herbert von Halem.

Editor information

Christian Stegbauer, Roger Häußling

Cite this chapter

Diesner, J., Carley, K.M. (2010). Extraktion relationaler Daten aus Texten. In: Stegbauer, C., Häußling, R. (eds) Handbuch Netzwerkforschung. VS Verlag für Sozialwissenschaften. https://doi.org/10.1007/978-3-531-92575-2_44

DOI: https://doi.org/10.1007/978-3-531-92575-2_44
Publisher Name: VS Verlag für Sozialwissenschaften
Print ISBN: 978-3-531-15808-2
Online ISBN: 978-3-531-92575-2
eBook Packages: Humanities, Social Science (German Language)