Abstract
We investigate automatic classification of posts to Internet forums. We use collective classification methods, which simultaneously classify related objects: in our case, the posts in a thread. Specifically, we compare the Iterative Classification Algorithm (ICA) with Conditional Random Fields and with conventional classifiers (k-Nearest Neighbours and Support Vector Machines). ICA invokes a local classifier, for which we use kNN. Our main contributions are two-fold. First, we define experimental protocols that we believe are suitable for offline evaluation in this domain. Second, by using these protocols to run experiments on two datasets, we show that ICA with kNN has significantly higher accuracy across most of the experimental conditions.
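To make the idea of ICA with a kNN local classifier concrete, here is a minimal sketch, not the authors' implementation. It assumes that each post is a numeric content vector, that the only relational feature is a one-hot encoding of the label currently assigned to the previous post in the thread, and that kNN uses squared Euclidean distance with majority voting. The function names (`knn_predict`, `relational`, `ica_classify_thread`) and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors nearest to query."""
    nearest = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def relational(prev_label, labels):
    """One-hot encoding of the previous post's label (all zeros if none)."""
    return [1.0 if prev_label == lab else 0.0 for lab in labels]


def ica_classify_thread(thread, train_threads, labels, k=3, max_iters=10):
    """Label every post in `thread` (a list of content vectors, in post order).

    `train_threads` is a list of threads, each a list of (vector, label) pairs.
    """
    # Content-only cases, used to bootstrap the initial labels.
    content_train = [(x, y) for t in train_threads for x, y in t]

    # Cases with relational features built from the true training labels.
    full_train = []
    for t in train_threads:
        prev = None
        for x, y in t:
            full_train.append((list(x) + relational(prev, labels), y))
            prev = y

    # Step 1 (bootstrap): classify each post from its content alone.
    current = [knn_predict(content_train, x, k) for x in thread]

    # Step 2 (iterate): recompute relational features from the current
    # labels and reclassify, until no label changes or the cap is reached.
    for _ in range(max_iters):
        changed = False
        for i, x in enumerate(thread):
            prev = current[i - 1] if i > 0 else None
            new_label = knn_predict(
                full_train, list(x) + relational(prev, labels), k
            )
            if new_label != current[i]:
                current[i] = new_label
                changed = True
        if not changed:
            break
    return current


# Toy data (hypothetical): one content feature per post.
LABELS = ["question", "answer"]
TRAIN = [
    [([0.0], "question"), ([1.0], "answer")],
    [([0.1], "question"), ([0.9], "answer")],
    [([0.0], "question"), ([0.8], "answer")],
]
predicted = ica_classify_thread([[0.05], [0.95]], TRAIN, LABELS)
```

The sketch updates labels in place within each pass, so a post sees the newest label of its predecessor. Richer relational features, such as label counts over the whole thread, would slot into `relational` without changing the loop.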
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ó Duinn, P., Bridge, D. (2014). Collective Classification of Posts to Internet Forums. In: Lamontagne, L., Plaza, E. (eds) Case-Based Reasoning Research and Development. ICCBR 2014. Lecture Notes in Computer Science, vol 8765. Springer, Cham. https://doi.org/10.1007/978-3-319-11209-1_24
DOI: https://doi.org/10.1007/978-3-319-11209-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11208-4
Online ISBN: 978-3-319-11209-1
eBook Packages: Computer Science (R0)