Incorporating context in text analysis by interactive activation with competition artificial neural networks
Introduction
Part of science is the discovery of associations and patterns in the world around us. According to John Barrow: “the goal of science is to make sense of the diversity of Nature…” [through] “the transformation of lists of observational data into abbreviated form by the recognition of patterns”, all with the goal in mind of “algorithmic compression” (Barrow, 1991, pp. 10–11). The same goal applies to the social sciences, where the aim is to make sense of the diversity of human activities. Of particular interest to scholars of communication and informatics are the patterns and associations that occur in texts, such as newspaper articles, works of literature, conversations or email messages. These patterns are studied to learn about social structures and interactions, individual thought processes and the entire range of cognitive and communicative processes in between. The study of patterns in text has been applied to a variety of domains including literature (Pasquale & Meunier, 2001), mass media (Danielson & Lasorsa, 1997), political science (Franzosi, 1997), philosophy (Pasquale & Meunier, 2001) and others.
In the applied realm, individuals and organizations are interested in accurate and robust means to track opinions and attitudes for marketing, political, and social reasons (Lasswell, 1931). Personal attitudes and opinions are expressed and shared through a variety of communication modalities such as email. Email messages in public forums, such as listserv lists, can be a steady and plentiful source of personal expression in machine-readable form. List email is created at an estimated rate of 36.5 billion messages or 675 terabytes per year (about 1.8 terabytes per day) (Lyman & Varian, 2000). This is clearly too much for existing manual methods to analyze. Analysis of these texts, therefore, requires automated techniques that can deal with natural language. Large corporations, for instance, employ automated text analysis methods to help them discover information in focus group transcripts (Woelfel & Styanoff, 1993).
This paper reports on modifications of an interactive activation with competition (IAC) artificial neural network (ANN) algorithm to incorporate the notion of context, defined here as the words in the sentences preceding the sentence being processed. Two approaches to the external activation of words during self-organization were compared. In the first (sentence) approach, the ANN set the external activations of all words to zero after each learning cycle, i.e. after each sentence was processed. In the second (message) approach, the ANN reduced the external activations of the words by a reduction factor (without going below zero) after each learning cycle; the external activations were set to zero only when the beginning of a new message was encountered.
The rationale for this second approach comes from the observation that the specific meaning of each word in a text depends on the context within which the text is set. The context is set, in part, by the other words in the text. Therefore, taking these words into account adds a degree of context to an analysis. There are, of course, other contextual clues that assist the reader in assigning specific meaning to specific words. These include, for instance, the source of the text (i.e. where it was published and by whom), the references to other texts made within the work, etc. This research focuses on the immediate contextual clues, those that can be derived from the surrounding text.
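The difference between the sentence and message approaches can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name, the clamp value of 1.0 for words in the current sentence, and the reading of the "reduction factor" as an amount subtracted after each cycle are all assumptions made for illustration, and the learning cycle itself is omitted.

```python
from typing import Dict, List


def external_activations_per_cycle(
    sentences: List[List[str]],
    policy: str,
    reduction: float = 0.5,   # assumed: amount subtracted after each cycle
    clamp: float = 1.0,       # assumed: activation given to words in the current sentence
) -> List[Dict[str, float]]:
    """Return the external activations present at each learning cycle of ONE message.

    policy == "sentence": all external activations are zeroed after every cycle.
    policy == "message":  activations are reduced after every cycle, never below zero.
    Starting from an empty dict models the reset at the beginning of a new message.
    """
    if policy not in ("sentence", "message"):
        raise ValueError("policy must be 'sentence' or 'message'")

    external: Dict[str, float] = {}
    snapshots: List[Dict[str, float]] = []

    for words in sentences:
        # Words of the sentence being processed receive full external activation.
        for word in words:
            external[word] = clamp

        # A real network would run its learning cycle here; we only record
        # what the network would see as external input.
        snapshots.append(dict(external))

        if policy == "sentence":
            # Sentence approach: no carry-over between sentences.
            external = {word: 0.0 for word in external}
        else:
            # Message approach: earlier words fade but remain as context.
            external = {
                word: max(0.0, level - reduction)
                for word, level in external.items()
            }

    return snapshots
```

Calling the function once per message with the same list of tokenized sentences and each policy in turn shows which earlier words are still externally active when a later sentence is processed.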
After a brief discussion of text classification and artificial neural networks, I present the methodology used, the results obtained from two samples and the conclusions they suggest. Finally, I offer some suggestions for future research.
Text classification
There is a variety of methods of text analysis, including: content analysis (Breen, 1997; McKinnon, 1989; McMillan, 2000; Palmquist, 2002; Rubenstein, 1995; Schneider, 1997; Shi-Xu, 2000; and others); text mining (Dworman, 1996; Hearst, 1999; Witten, 2001); natural language processing (Liddy, 2001; Mani, 1999); latent semantic analysis (Dumais, Furnas, Landauer, Deerwester, & Harshman, 1988); probabilistic latent semantic indexing (Hofmann, 1999); and the topic of this study, artificial neural
Definitions
For the purpose of this study, text is defined as email messages exclusive of SMTP headers, signature “lines” and quoted text. Email was chosen because it is a readily available, machine-readable example of natural language expression of ideas and opinions typical of those of interest to text analysts. A sentence is a series of words delimited by final punctuation marks (“.”, “!” and “?”), or the end of the text block. A word is any series of characters delimited by a space or punctuation mark
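The sentence and word definitions above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the set of punctuation marks is taken from Python's `string.punctuation` (the study's exact set is not given here), and the removal of headers, signatures and quoted text is assumed to have happened already.

```python
import re
import string
from typing import List

# Sentences end at ".", "!" or "?" (or at the end of the text block).
_SENTENCE_END = re.compile(r"[.!?]+")

# Assumption: a word is a maximal run of characters that are neither
# whitespace nor ASCII punctuation.
_WORD = re.compile("[^\\s" + re.escape(string.punctuation) + "]+")


def split_sentences(text: str) -> List[str]:
    """Split a text block into sentences on final punctuation marks."""
    pieces = (piece.strip() for piece in _SENTENCE_END.split(text))
    return [piece for piece in pieces if piece]


def split_words(sentence: str) -> List[str]:
    """Split a sentence into words delimited by spaces or punctuation."""
    return _WORD.findall(sentence)


def tokenize(text: str) -> List[List[str]]:
    """Return the text as a list of sentences, each a list of words."""
    tokenized = (split_words(sentence) for sentence in split_sentences(text))
    return [words for words in tokenized if words]
```

Because every punctuation mark is treated as a delimiter in this sketch, hyphenated or apostrophized forms are split into separate words; a different punctuation set would change that behavior.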
Summary statistics
Summary statistics are shown in Table 3.
Conclusion
This study has explored the effect of modifying the way external activation levels of intra-message words are reduced between sentences during the learning phase of an interactive activation with competition artificial neural network (IAC ANN) processing email text. The results from this initial investigation suggest that IAC ANN analysis of email text is improved by the use of message context during learning. Message context can be implemented by allowing the external activations of the words
Future research
This research has explored a potential method of improving the performance of an interactive activation with competition artificial neural network in the analysis of email text. The development of this technology for text analysis has many potential applications, including tracking public opinion, identifying shifts in consumer attitudes, detecting and following the adoption of new ideas, and monitoring the attitudes, and thereby helping to predict the behavior, of well-defined groups. Before
References (40)
- An unsupervised learning technique for artificial neural networks. Neural Networks (1990).
- Finding structure in time. Cognitive Science (1990).
- Absoft Corp. (2001). Pro Fortran for OSX (Version 7.0 SP3) [IDE]. Rochester Hills, MI: Absoft...
- Aizawa, A. (2002). A method of cluster-based indexing of textual data. In Paper presented at The 19th international...
- Barrow, J. D. (1991). Theories of everything: The quest for ultimate explanation.
- Belew, R. K. (1996). Adaptive information retrieval: machine learning in associative networks. Unpublished PhD,...
- Breen, M. J. (1997). Agenda setting and public opinion formation: media content and opinion polls on divorce referenda...
- et al. Perceptions of social change: 100 years of front-page content in The New York Times and The Los Angeles Times.
- Dumais, S. T., Furnas, G. W., Landauer, T. K., Deerwester, S., & Harshman, R. (1988). Using latent semantic analysis to...
- Dworman, G. (1996). Homer: a pattern discovery support system. In Paper presented at The ACM SIGCHI conference on human...
- A stop list for general text. SIGIR Forum.
- Labor unrest in the Italian service sector: an application of semantic grammars.
- Neural networks: An introductory guide for social scientists.
- Explorations in automatic thesaurus discovery.
- Serial order: a parallel distributed processing approach.
- Alphabet soup: an acronym roundup. Global e-commerce has inundated us with many new abbreviations. Information Today (United States).