Abstract
With user-generated content growing explosively, finding valuable posts in the discussion threads of web communities has become a hot topic. Although many learning algorithms have been proposed for mining thread content, two problems have not been effectively addressed. First, the learning algorithms are usually made complex in order to handle the many kinds of threads found in web communities, which weakens their generalization performance and exposes the learned models to the risk of overfitting. Second, when the training data is divided into many isolated groups and each group is trained separately to avoid overfitting, each group suffers from the small sample size problem. In this paper, we propose a metadata-based clustered multi-task learning method, which makes full use of thread metadata and fuses it into multi-task learning through a divide-and-learn strategy. Our method addresses both problems by uncovering the geometric structure, or semantic context, of threads in web communities and by constructing relations among the training thread groups and their corresponding learning tasks. In addition, it employs a soft-assigned clustered multi-task learning model. Experimental results demonstrate the effectiveness of the method.
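The abstract does not give the model's equations, so what follows is only a minimal sketch of the general idea of soft-assigned clustered multi-task learning, not the authors' formulation. It assumes the metadata-based division step has already produced one task per thread group, each with a feature matrix `X_t` and a target vector `y_t`, and it uses a squared loss. Each task's weight vector is shrunk toward a prototype built as a softmax-weighted mixture of cluster centers, so a small thread group borrows strength from related groups instead of being trained in isolation. The function name `soft_clustered_mtl`, the ridge-style loss, the temperature `tau`, the regularization weight `lam`, the farthest-point initialization, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_clustered_mtl(tasks, n_clusters=2, lam=1.0, tau=0.1, n_iter=20):
    """Hypothetical soft-assigned clustered multi-task ridge regression.

    tasks: list of (X_t, y_t) pairs, one per thread group (assumed to come
    from an earlier metadata-based division of the training threads).
    Returns task weights W (T x d), soft assignments P (T x K), centers C (K x d).
    """
    d = tasks[0][0].shape[1]
    eye = np.eye(d)
    # Start from independent per-task ridge solutions (no sharing yet).
    W = np.stack([np.linalg.solve(X.T @ X + lam * eye, X.T @ y) for X, y in tasks])

    # Deterministic farthest-point initialisation of the cluster centers.
    idx = [0]
    for _ in range(1, n_clusters):
        d2 = ((W[:, None, :] - W[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    C = W[idx].copy()

    for _ in range(n_iter):
        # Soft assignment: softmax over negative squared distances to centers.
        dist = ((W[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # T x K
        logits = -dist / tau
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)

        # Center update: assignment-weighted mean of the task weights.
        C = (P.T @ W) / (P.sum(axis=0)[:, None] + 1e-12)

        # Task update: argmin ||X w - y||^2 + lam * ||w - m_t||^2,
        # where m_t = sum_k P[t, k] * C[k] is the task's soft prototype.
        M = P @ C  # T x d
        W = np.stack([
            np.linalg.solve(X.T @ X + lam * eye, X.T @ y + lam * m)
            for (X, y), m in zip(tasks, M)
        ])
    return W, P, C


# Usage on synthetic data: two pairs of tasks share the same true weights.
rng = np.random.default_rng(1)
true_weights = [np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                np.array([0.0, 1.0]), np.array([0.0, 1.0])]
tasks = []
for w in true_weights:
    X = rng.normal(size=(30, 2))
    tasks.append((X, X @ w + 0.1 * rng.normal(size=30)))

W, P, C = soft_clustered_mtl(tasks, n_clusters=2)
print(P.argmax(axis=1))  # cluster with the largest soft weight for each task
```

The temperature `tau` controls how soft the assignments are: a large value spreads every task across all clusters and can let the centers drift toward a single shared mean, while a small value approaches hard clustering. In this sketch it has to be tuned relative to the scale of the weight vectors.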
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
You, Q., Wu, O., Luo, G., Hu, W. (2016). Metadata-Based Clustered Multi-task Learning for Thread Mining in Web Communities. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_33
DOI: https://doi.org/10.1007/978-3-319-41920-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6
eBook Packages: Computer Science, Computer Science (R0)