Abstract
Mailing lists are widely used in teamwork for discussion and consultation. Identifying important emails in mailing list discussions can significantly benefit content summarization and opinion leader recognition. However, previous studies focus only on evaluating the importance of personal emails, and there is no consensus on the definition of an important email. In this paper, we therefore consider the characteristics of mailing lists and study how to evaluate email importance in mailing list discussions. Our contributions are as follows. First, we propose ER-Match, an email conversation thread reconstruction algorithm that takes nested quotation relationships into account when constructing the email relationship network. Based on this network, we formulate the importance of emails in mailing list discussions. Second, we propose a feature-rich learning method to predict the importance of new emails. Furthermore, we characterize various factors affecting email importance in mailing list discussions. Experiments on publicly available mailing lists show that our prediction model outperforms the baselines by a large margin.
Notes
- 1.
- 2.
- 3. The body here refers to the content without the header, signature, and quotation.
- 4.
- 5.
- 6. For email importance prediction with XGBoost, we set learning_rate = 0.1, n_estimators = 1000, max_depth = 5, min_child_weight = 1, gamma = 0, subsample = 0.8, colsample_bytree = 0.8, objective = 'binary:logistic', scale_pos_weight = 1, seed = 27.
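As an illustration only, the settings in this note could be passed to the scikit-learn style wrapper of the XGBoost Python package roughly as follows. This is a minimal sketch, not the authors' code: the names `XGB_PARAMS` and `build_classifier` are hypothetical, and it assumes the note's `seed` corresponds to the wrapper's `random_state` argument.

```python
# Hyperparameters copied from the note above. The dictionary name is
# illustrative and does not come from the paper.
XGB_PARAMS = {
    "learning_rate": 0.1,
    "n_estimators": 1000,
    "max_depth": 5,
    "min_child_weight": 1,
    "gamma": 0,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "binary:logistic",
    "scale_pos_weight": 1,
    # Assumption: the note's `seed = 27` is expressed here as `random_state`,
    # the name used by the scikit-learn style wrapper.
    "random_state": 27,
}


def build_classifier(params=None):
    """Return an untrained XGBClassifier configured with the given settings.

    The import is done lazily so that the parameter dictionary can be
    inspected even where the xgboost package is not installed.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(**(params or XGB_PARAMS))
```

A model built this way would then be trained with `fit(X, y)` on whatever feature matrix and binary importance labels are available; the paper's actual features and labels are not reproduced here.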
References
Aberdeen, D., Pacovsky, O., Slater, A.: The learning behind gmail priority inbox. In: LCCC: NIPS 2010 Workshop on Learning on Cores, Clusters and Clouds (2010)
Albitar, S., Fournier, S., Espinasse, B.: An effective TF/IDF-based text-to-text semantic similarity measure for text classification. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds.) WISE 2014. LNCS, vol. 8786, pp. 105–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11749-2_8
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)
Dabbish, L.A., Kraut, R.E., Fussell, S., Kiesler, S.: Understanding email use: predicting action on a message. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 691–700. ACM (2005)
Dehghani, M., Shakery, A., Asadpour, M., Koushkestani, A.: A learning approach for email conversation thread reconstruction. J. Inf. Sci. 39(6), 846–863 (2013)
Golbeck, J., Hendler, J.: Inferring binary trust relationships in web-based social networks. ACM Trans. Internet Technol. (TOIT) 6(4), 497–529 (2006)
Jain, A.: XGBoost tuning. https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/. Accessed 24 July 2018
Joshi, S., Contractor, D., Ng, K., Deshpande, P.M., Hampp, T.: Auto-grouping emails for faster e-discovery. Proc. VLDB Endow. 4(12), 1284–1294 (2011)
Lewis, D.D., Knowles, K.A.: Threading electronic mail: a preliminary study. Inf. Process. Manage. 33(2), 209–217 (1997)
Liu, L., Tang, J., Han, J., Jiang, M., Yang, S.: Mining topic-level influence in heterogeneous networks. In: CIKM ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Ontario, Canada, October, pp. 199–208 (2010)
Merton, R.K.: The Matthew effect in science: the reward and communication systems of science are considered. Science 159(3810), 56–63 (1968)
Page, L.: The PageRank citation ranking: bringing order to the web. Stanford Digital Libraries Working Paper 9(1), 1–14 (1999)
Passant, A., Zimmermann, A., Schneider, J., Breslin, J.G.: A semantic framework for modelling quotes in email conversations. In: Proceedings of the 1st International Conference on Intelligent Semantic Web-Services and Applications. ACM (2010)
Sharaff, A., Nagwani, N.K.: Email thread identification using latent Dirichlet allocation and non-negative matrix factorization based clustering techniques. J. Inf. Sci. 42(2), 200–212 (2016)
Tsugawa, S., Ohsaki, H., Imase, M.: Estimating message importance using inferred inter-recipient trust for supporting email triage. Inf. Media Technol. 7(3), 1073–1082 (2012)
Wu, Y., Oard, D.W.: Indexing emails and email threads for retrieval. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665–666. ACM (2005)
Yang, L., Dumais, S.T., Bennett, P.N., Awadallah, A.H.: Characterizing and predicting enterprise email reply behavior. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 235–244. ACM (2017)
Yoo, S., Yang, Y., Lin, F., Moon, I.C.: Mining social networks for personalized email prioritization. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 967–976. ACM (2009)
Zawinski, J.: Message threading. https://www.jwz.org/doc/threading.html/. Accessed 10 May 2018
Zhang, F., Xu, K.: Annotation and classification of an email importance corpus. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), vol. 2, pp. 651–656 (2015)
Acknowledgement
This work is supported by the National Key Research & Development Program (2016YFB1000503).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, K., Hu, C., Sun, J., Shen, Q., Jiang, X. (2019). Email Importance Evaluation in Mailing List Discussions. In: Hacid, H., Sheng, Q., Yoshida, T., Sarkheyli, A., Zhou, R. (eds) Data Quality and Trust in Big Data. QUAT 2018. Lecture Notes in Computer Science, vol 11235. Springer, Cham. https://doi.org/10.1007/978-3-030-19143-6_3
DOI: https://doi.org/10.1007/978-3-030-19143-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19142-9
Online ISBN: 978-3-030-19143-6
eBook Packages: Computer Science, Computer Science (R0)