Collective Classification of Posts to Internet Forums

Ó Duinn, Pádraig; Bridge, Derek

doi:10.1007/978-3-319-11209-1_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8765)

Included in the following conference series:

International Conference on Case-Based Reasoning

Abstract

We investigate the automatic classification of posts to Internet forums. We use collective classification methods, which classify related objects simultaneously; in our case, the related objects are the posts in a thread. Specifically, we compare the Iterative Classification Algorithm (ICA) with Conditional Random Fields (CRFs) and with conventional classifiers (k-Nearest Neighbours (kNN) and Support Vector Machines (SVMs)). ICA invokes a local classifier, for which we use kNN. Our main contributions are twofold. First, we define experimental protocols that we believe are suitable for offline evaluation in this domain. Second, by using these protocols to run experiments on two datasets, we show that ICA with kNN achieves significantly higher accuracy than the other classifiers across most of the experimental conditions.
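The abstract describes ICA only in outline. The Python sketch below illustrates the general technique: bootstrap each post's label from its content alone, then repeatedly reclassify every post using both its content and its neighbours' current labels until no label changes. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration. The label set in `LABELS` is hypothetical. A post's neighbours are taken to be the previous and next post in the thread, and relational features are counts of the neighbours' labels. Posts are plain numeric vectors compared by Euclidean distance, and `k` and `max_iters` are arbitrary defaults.

```python
import math
from collections import Counter

# Hypothetical label set; the paper's actual post classes are not given here.
LABELS = ["question", "answer", "other"]


def relational_features(labels, i):
    """Count the current labels of post i's neighbours.

    Assumption: a post's neighbours are the previous and next post in the
    thread. Neighbours whose label is unknown (None) contribute nothing.
    """
    counts = [0] * len(LABELS)
    for j in (i - 1, i + 1):
        if 0 <= j < len(labels) and labels[j] is not None:
            counts[LABELS.index(labels[j])] += 1
    return counts


def knn_predict(examples, x, k):
    """Majority label among the k training vectors nearest to x (Euclidean)."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def build_examples(threads, use_relational):
    """Turn labelled training threads into (feature vector, label) pairs.

    Each thread is a list of (content_vector, label). Relational features for
    training posts are computed from the true labels of their neighbours.
    """
    examples = []
    for thread in threads:
        labels = [label for _, label in thread]
        for i, (content, label) in enumerate(thread):
            x = list(content)
            if use_relational:
                x += relational_features(labels, i)
            examples.append((x, label))
    return examples


def ica(thread, train_threads, k=3, max_iters=10):
    """Collectively label the posts of one thread (a list of content vectors)."""
    content_only = build_examples(train_threads, use_relational=False)
    with_relational = build_examples(train_threads, use_relational=True)

    # Bootstrap: label every post from its content alone.
    labels = [knn_predict(content_only, list(c), k) for c in thread]

    for _ in range(max_iters):
        new_labels = list(labels)
        for i, content in enumerate(thread):
            # Relational features use the labels as updated so far in this pass.
            x = list(content) + relational_features(new_labels, i)
            new_labels[i] = knn_predict(with_relational, x, k)
        if new_labels == labels:  # no label changed: converged
            break
        labels = new_labels
    return labels
```

For example, with the toy training set `train = [[([0.0], "question"), ([1.0], "answer"), ([1.0], "answer")], [([0.1], "question"), ([0.9], "answer")]]`, calling `ica([[0.02], [0.97]], train, k=1)` returns `["question", "answer"]`. Here the bootstrap labels are already stable after one pass. This sketch updates labels in place within a pass, so each reclassification sees neighbours already relabelled earlier in the same pass. A synchronous variant would instead compute all relational features from the previous pass's labels.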

Author information

Authors and Affiliations

Insight Centre for Data Analytics, School of Computer Science and Information Technology, University College Cork, Ireland
Pádraig Ó Duinn & Derek Bridge

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Université Laval, G1K 7P4, Québec, Canada
Luc Lamontagne
IIIA, Artificial Intelligence Research Institute, CSIC, Spanish Council for Scientific Research, Campus UAB, 08193 Bellaterra, Catalonia, Spain
Enric Plaza

About this paper

Cite this paper

Ó Duinn, P., Bridge, D. (2014). Collective Classification of Posts to Internet Forums. In: Lamontagne, L., Plaza, E. (eds) Case-Based Reasoning Research and Development. ICCBR 2014. Lecture Notes in Computer Science, vol 8765. Springer, Cham. https://doi.org/10.1007/978-3-319-11209-1_24

DOI: https://doi.org/10.1007/978-3-319-11209-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11208-4
Online ISBN: 978-3-319-11209-1
eBook Packages: Computer Science, Computer Science (R0)