Collective Classification of Posts to Internet Forums

  • Conference paper
Case-Based Reasoning Research and Development (ICCBR 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8765)

Abstract

We investigate automatic classification of posts to Internet forums. We use collective classification methods, which classify related objects simultaneously; in our case, the related objects are the posts in a thread. Specifically, we compare the Iterative Classification Algorithm (ICA) with Conditional Random Fields (CRFs) and with conventional classifiers: k-Nearest Neighbours (kNN) and Support Vector Machines (SVMs). ICA invokes a local classifier, for which we use kNN. Our main contributions are twofold. First, we define experimental protocols that we believe are suitable for offline evaluation in this domain. Second, by using these protocols to run experiments on two datasets, we show that ICA with kNN has significantly higher accuracy than the other methods across most of the experimental conditions.
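The general idea of ICA with a kNN local classifier can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of relational feature (a one-hot encoding of the previous post's label), the content-only bootstrap step, and the synchronous update scheme are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote over the k training vectors nearest to query
    (squared Euclidean distance; ties resolved by training order)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def relational(labels, i, classes):
    """Assumed relational feature: one-hot label of the previous post in
    the thread (all zeros for the first post)."""
    prev = labels[i - 1] if i > 0 else None
    return [1.0 if prev == c else 0.0 for c in classes]


def ica_classify(train_threads, test_thread, classes, k=3, max_iters=10):
    """Label every post in test_thread collectively.

    train_threads: list of threads, each a list of (content_vector, label).
    test_thread:   list of content vectors, in thread order.
    """
    # Content-only cases, used to bootstrap the initial labels.
    content_train = [(x, y) for thread in train_threads for x, y in thread]

    # Content + relational cases, built from the true training labels.
    full_train = []
    for thread in train_threads:
        true_labels = [y for _, y in thread]
        for i, (x, y) in enumerate(thread):
            full_train.append((x + relational(true_labels, i, classes), y))

    # Step 1 (bootstrap): classify each post from its content alone.
    labels = [knn_predict(content_train, x, k) for x in test_thread]

    # Step 2 (iterate): recompute relational features from the current
    # predictions and reclassify, until the labels stop changing.
    # All posts are updated synchronously here; other orderings exist.
    for _ in range(max_iters):
        new_labels = [
            knn_predict(full_train, x + relational(labels, i, classes), k)
            for i, x in enumerate(test_thread)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

In this sketch, two posts with identical content vectors can receive different labels when their preceding posts are labelled differently, which is the effect that distinguishes collective classification from classifying each post in isolation.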

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ó Duinn, P., Bridge, D. (2014). Collective Classification of Posts to Internet Forums. In: Lamontagne, L., Plaza, E. (eds) Case-Based Reasoning Research and Development. ICCBR 2014. Lecture Notes in Computer Science, vol 8765. Springer, Cham. https://doi.org/10.1007/978-3-319-11209-1_24

  • DOI: https://doi.org/10.1007/978-3-319-11209-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11208-4

  • Online ISBN: 978-3-319-11209-1

  • eBook Packages: Computer Science, Computer Science (R0)
