The ACODEA framework: Developing segmentation and classification schemes for fully automatic analysis of online discussions

International Journal of Computer-Supported Collaborative Learning

Abstract

Research on online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow this analysis to be automated. However, state-of-the-art machine learning and text mining approaches yield models that do not transfer well between corpora on different topics. Segmentation is also a necessary step, but trained models are frequently very sensitive to the particulars of the segmentation used during training. For this reason, prior published research on text classification in a CSCL context relied on data segmented by hand. We discuss work towards overcoming these challenges. We present a framework for developing coding schemes optimized for automatic segmentation and for context-independent coding that builds on this segmentation. The key idea is to extract the semantic and syntactic features of each word, using part-of-speech tagging and named-entity recognition, before the raw data are segmented and classified. Our results show that coding on the micro-argumentation dimension can be fully automated. Finally, we discuss how fully automated analysis can enable context-sensitive support for collaborative learning.
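The key idea in the abstract, replacing each word with its syntactic or semantic category before segmenting and classifying, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The lexicons, the tag names, the example sentences, and the rule of starting a new segment at a conjunction are all hypothetical assumptions chosen for illustration; a real pipeline would use trained part-of-speech and named-entity taggers.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
# A real system would obtain these labels from trained POS and NER taggers.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "because": "CONJ", "and": "CONJ", "but": "CONJ",
    "is": "VERB", "failed": "VERB", "passed": "VERB",
    "exam": "NOUN", "test": "NOUN",
    "he": "PRON", "she": "PRON",
    "lazy": "ADJ", "clever": "ADJ",
}
ENTITY_LEXICON = {"michael": "PERSON", "anna": "PERSON"}


def abstract_tokens(text):
    """Map every word to an entity label or a POS tag, discarding the word itself."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Entity labels take precedence; words missing from both lexicons become "UNK".
    return [ENTITY_LEXICON.get(t) or POS_LEXICON.get(t, "UNK") for t in tokens]


def segment(tags, boundary_tags=("CONJ",)):
    """Split a tag sequence into segments, opening a new one at each boundary tag.

    Splitting at conjunctions is an assumed rule, not the paper's scheme.
    """
    segments, current = [], []
    for tag in tags:
        if tag in boundary_tags and current:
            segments.append(current)
            current = []
        current.append(tag)
    if current:
        segments.append(current)
    return segments


# Two invented sentences with different surface words share one abstract form.
first = abstract_tokens("Michael failed the exam because he is lazy")
second = abstract_tokens("Anna passed the test because she is clever")
print(first == second)
print(segment(first))
```

Because the classifier would see only category sequences such as `PERSON VERB DET NOUN` rather than topic-specific words, a sketch like this suggests why features of this kind may transfer between corpora on different topics more readily than raw words.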

Corresponding author

Correspondence to Jin Mu.

About this article

Cite this article

Mu, J., Stegmann, K., Mayfield, E. et al. The ACODEA framework: Developing segmentation and classification schemes for fully automatic analysis of online discussions. Computer Supported Learning 7, 285–305 (2012). https://doi.org/10.1007/s11412-012-9147-y

  • DOI: https://doi.org/10.1007/s11412-012-9147-y
