Abstract
Identifying conversational message threads is relevant to a wide spectrum of applications, ranging from social network marketing to virus propagation and digital forensics. Many approaches have been proposed in the literature for identifying conversational threads, but they rely on features that depend strongly on the dataset. In this paper, we introduce a novel method to identify threads in any type of conversational text, removing the need to determine specific features for each dataset in advance. Given a pool of messages, our method extracts the semantic content, the social interactions and the timestamp of each message and maps them into a three-dimensional representation; it then clusters the messages into conversational threads. We extend our previous work by introducing a deep learning approach and by performing new extensive experiments and comparisons with classical learning algorithms.
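To make the idea of combining semantic, social and temporal information concrete, here is a minimal Python sketch that compares each pair of messages along those three dimensions and then groups them into threads. Everything in it is an assumption made for illustration: the `Message` fields, the term-frequency cosine for semantic similarity, the Jaccard overlap of participants for social similarity, the exponential time decay with constant `tau`, and the `weights` and `threshold` values are hypothetical. The grouping step is a deliberately simple swapped-in technique (weighted-sum score plus threshold-based single-linkage clustering via union-find), not the features, classifiers or clustering pipeline evaluated in the paper.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message record; field names are illustrative assumptions.
    author: str
    recipients: frozenset
    text: str
    timestamp: float  # seconds since epoch


def tokens(text):
    # Lower-case alphanumeric tokens; no stemming or stop-word removal here.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(x, y):
    # Overlap between the sets of people involved in each message.
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def pair_features(m1, m2, tau=3600.0):
    """Map a pair of messages to a 3-D vector (semantic, social, temporal)."""
    semantic = cosine(Counter(tokens(m1.text)), Counter(tokens(m2.text)))
    social = jaccard({m1.author} | m1.recipients, {m2.author} | m2.recipients)
    # tau is a hypothetical decay constant: closer timestamps give values nearer 1.
    temporal = math.exp(-abs(m1.timestamp - m2.timestamp) / tau)
    return semantic, social, temporal


def cluster_threads(messages, weights=(0.5, 0.3, 0.2), threshold=0.5):
    """Single-linkage grouping: link two messages when their weighted
    similarity reaches `threshold`, then return the connected components."""
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            f = pair_features(messages[i], messages[j])
            score = sum(w * x for w, x in zip(weights, f))
            if score >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    threads = {}
    for i in range(len(messages)):
        threads.setdefault(find(i), []).append(i)
    return sorted(threads.values())


msgs = [
    Message("alice", frozenset({"bob"}), "Budget meeting moved to Friday", 0),
    Message("bob", frozenset({"alice"}), "Friday works, budget slides ready", 600),
    Message("carol", frozenset({"dave"}), "Server outage in rack 7", 300),
]
print(cluster_threads(msgs))  # → [[0, 1], [2]]
```

The first two messages share terms and participants and are close in time, so they end up in one thread; the third shares neither words nor people with them. The pairwise loop is quadratic in the number of messages, which is acceptable for a sketch but would need blocking or indexing for large pools.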
G. Domeniconi: This work was partially supported by the European project “TOREADOR” (grant agreement no. H2020-688797).
References
Jurczyk, P., Agichtein, E.: Discovering authorities in question answer communities by using link analysis. In: CIKM, Lisbon, Portugal, 6–10 November 2007, pp. 919–922 (2007)
Coussement, K., Van den Poel, D.: Improving customer complaint management by automatic email classification using linguistic style features as predictors. Decis. Support Syst. 44, 870–882 (2008)
Glass, K., Colbaugh, R.: Toward emerging topic detection for business intelligence: predictive analysis of ‘meme’ dynamics. CoRR abs/1012.5994 (2010)
Khan, F.M., Fisher, T.A., Shuler, L., Wu, T., Pottenger, W.M.: Mining chatroom conversations for social and semantic interactions. Technical report LU-CSE-02-011, Lehigh University (2002)
Hofmann, T.: Probabilistic latent semantic indexing. In: ACM SIGIR, pp. 50–57. ACM (1999)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Shen, D., Yang, Q., Sun, J., Chen, Z.: Thread detection in dynamic text message streams. In: SIGIR, Washington, USA, 6–11 August 2006, pp. 35–42 (2006)
Huang, J., Zhou, B., Wu, Q., Wang, X., Jia, Y.: Contextual correlation based thread detection in short text message streams. J. Intell. Inf. Syst. 38, 449–464 (2012)
Adams, P.H., Martell, C.H.: Topic detection and extraction in chat. In: ICSC 2008, pp. 581–588 (2008)
Yeh, J.: Email thread reassembly using similarity matching. In: CEAS, 27–28 July 2006, Mountain View, California, USA (2006)
Domeniconi, G., Semertzidis, K., Lopez, V., Daly, E.M., Kotoulas, S., Moro, G.: A novel method for unsupervised and supervised conversational message thread detection. In: Proceedings of the 5th International Conference on Data Management Technologies and Applications, vol. 1, DATA, pp. 43–54 (2016)
Zhao, Q., Mitra, P.: Event detection and visualization for social text streams. In: ICWSM, Boulder, Colorado, USA, 26–28 March 2007 (2007)
Lena, P., Domeniconi, G., Margara, L., Moro, G.: GOTA: GO term annotation of biomedical literature. BMC Bioinform. 16, 346 (2015)
Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD 1996, Portland, Oregon, USA, pp. 226–231 (1996)
Bouguettaya, A., Yu, Q., Liu, X., Zhou, X., Song, A.: Efficient agglomerative hierarchical clustering. Expert Syst. Appl. 42, 2785–2797 (2015)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)
Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2, 1 (2015)
Zhao, Q., Mitra, P., Chen, B.: Temporal and information flow based event detection from social text streams. In: AAAI, 22–26 July 2007, Vancouver, British Columbia, Canada, pp. 1501–1506 (2007)
Porter, M.F.: An algorithm for suffix stripping. Program 14, 130–137 (1980)
Domeniconi, G., Moro, G., Pasolini, R., Sartori, C.: A comparison of term weighting schemes for text classification and sentiment analysis with a supervised variant of tf.idf. In: Data Management Technologies and Applications (DATA 2015), Revised Selected Papers, vol. 553, pp. 39–58. Springer, Heidelberg (2016)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24, 513–523 (1988)
Singhal, A.: Modern information retrieval: a brief overview. IEEE Data Eng. Bull. 24, 35–43 (2001)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
Aumayr, E., Chan, J., Hayes, C.: Reconstruction of threaded conversations in online discussion forums. In: Weblogs and Social Media (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. Book in preparation for MIT Press (2016)
Sugomori, Y.: Java Deep Learning Essentials. Packt Publishing Ltd., Birmingham (2016)
Ulrich, J., Murray, G., Carenini, G.: A publicly available annotated corpus for supervised email summarization. In: AAAI08 EMAIL Workshop (2008)
Soboroff, I., de Vries, A.P., Craswell, N.: Overview of the TREC 2006 enterprise track. In: TREC, Gaithersburg, Maryland, USA, 14–17 November 2006 (2006)
Dehghani, M., Shakery, A., Asadpour, M., Koushkestani, A.: A learning approach for email conversation thread reconstruction. J. Inf. Sci. 39, 846–863 (2013)
Erera, S., Carmel, D.: Conversation detection in email systems. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds.) ECIR 2008. LNCS, vol. 4956, pp. 498–505. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78646-7_48
Wu, Y., Oard, D.W.: Indexing emails and email threads for retrieval. In: SIGIR, pp. 665–666 (2005)
Hall, M.A., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009)
Raschka, S.: Python Machine Learning. Packt Publishing, Birmingham (2015)
Wang, X., Xu, M., Zheng, N., Chen, M.: Email conversations reconstruction based on messages threading for multi-person. In: ETTANDGRS 2008, vol. 1, pp. 676–680 (2008)
Joshi, S., Contractor, D., Ng, K., Deshpande, P.M., Hampp, T.: Auto-grouping emails for faster e-discovery. PVLDB 4, 1284–1294 (2011)
Wang, H., Wang, C., Zhai, C., Han, J.: Learning online discussion structures by conditional random fields. In: SIGIR 2011, Beijing, China, 25–29 July 2011, pp. 435–444 (2011)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Domeniconi, G., Semertzidis, K., Moro, G., Lopez, V., Kotoulas, S., Daly, E.M. (2017). Identifying Conversational Message Threads by Integrating Classification and Data Clustering. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_2
DOI: https://doi.org/10.1007/978-3-319-62911-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62910-0
Online ISBN: 978-3-319-62911-7
eBook Packages: Computer Science, Computer Science (R0)