Collaborative text categorization via exploiting sparse coefficients

Yao, Lina; Sheng, Quan Z.; Wang, Xianzhi; Wang, Shengrui; Li, Xue; Wang, Sen

doi:10.1007/s11280-017-0460-2

Published: 28 April 2017

World Wide Web, Volume 21, pages 373–394 (2018)

Lina Yao¹, Quan Z. Sheng², Xianzhi Wang¹, Shengrui Wang³, Xue Li⁴ & Sen Wang⁵

Abstract

Text categorization is widely characterized as a multi-label classification problem. Robust modeling of the semantic similarity between a query text and the training texts is essential for constructing an effective and accurate classifier. In this paper, we systematically investigate the Web page/text classification problem by integrating sparse representation with random measurements. In particular, we first adopt a very sparse, data-independent random measurement matrix to map the original high-dimensional text feature space to a lower-dimensional space without losing key information. We then propose a generic sparse representation method that obtains the sparse solution by decoding the semantic correlations between the query text and the entire set of training samples. Building on this method, we design and examine a series of rules that exploit the sparse coefficients to propagate multiple labels to the given query texts. We have conducted extensive experiments on real-world datasets to examine the proposed approach, and the results show its effectiveness.
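The abstract outlines a three-stage pipeline: random measurement, sparse coding of the query over the training samples, and label propagation from the sparse coefficients. This page does not give the exact formulations, so the following Python is only a minimal sketch under stated assumptions. The measurement matrix follows the very sparse random projection scheme of Li, Hastie and Church (entries √(s/k)·{+1, 0, −1} with probabilities 1/(2s), 1 − 1/s, 1/(2s); s = 3 gives Achlioptas' database-friendly matrix). The sparse solution is taken to be an ℓ1-regularized least-squares (Lasso) problem solved with ISTA, as a stand-in for whatever solver the authors use. The label rule is a hypothetical coefficient-weighted vote. All function and parameter names (`very_sparse_projection`, `sparse_code`, `propagate_labels`, `lam`, `threshold`) are illustrative and not taken from the paper.

```python
import math
import random


def very_sparse_projection(d, k, s=3.0, seed=0):
    """Assumed scheme: k x d matrix with i.i.d. entries sqrt(s/k) * {+1, 0, -1}
    drawn with probabilities 1/(2s), 1 - 1/s, 1/(2s)."""
    rng = random.Random(seed)
    scale = math.sqrt(s / k)  # chosen so that E[||R x||^2] = ||x||^2
    R = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        R.append(row)
    return R


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def sparse_code(columns, y, lam=0.05, iters=500):
    """ISTA for min_x 0.5*||y - A x||^2 + lam*||x||_1 (Lasso stand-in).
    columns[j] is the j-th column of A, i.e. one projected training sample."""
    n = len(columns)
    # ||A||_F^2 upper-bounds the largest eigenvalue of A^T A, so 1/L is a safe step.
    L = sum(a * a for col in columns for a in col)
    x = [0.0] * n
    for _ in range(iters):
        r = [-yi for yi in y]  # residual r = A x - y
        for j, col in enumerate(columns):
            if x[j] != 0.0:
                for i, a in enumerate(col):
                    r[i] += x[j] * a
        # gradient step (gradient_j = a_j^T r), then shrink toward zero
        x = [soft_threshold(x[j] - sum(a * ri for a, ri in zip(col, r)) / L, lam / L)
             for j, col in enumerate(columns)]
    return x


def propagate_labels(coeffs, train_labels, threshold=0.5):
    """Hypothetical rule: each training sample votes for its labels with its
    positive coefficient; keep labels scoring >= threshold * best score."""
    scores = {}
    for c, labels in zip(coeffs, train_labels):
        for lab in labels:
            scores[lab] = scores.get(lab, 0.0) + max(c, 0.0)
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return set()
    return {lab for lab, sc in scores.items() if sc / top >= threshold}


if __name__ == "__main__":
    d, k = 1000, 100  # toy original and reduced dimensionality

    def bow(idx):  # toy bag-of-words vector with unit counts at the given indices
        v = [0.0] * d
        for i in idx:
            v[i] = 1.0
        return v

    R = very_sparse_projection(d, k, s=3)
    train = [bow(range(0, 30)), bow(range(30, 60)), bow(range(60, 90))]
    train_labels = [{"sports"}, {"politics"}, {"tech"}]
    query = bow(list(range(0, 30)) + list(range(60, 90)))  # mixes two topics
    A = [normalize(matvec(R, t)) for t in train]  # projected training columns
    y = normalize(matvec(R, query))
    x = sparse_code(A, y)
    print([round(c, 3) for c in x], propagate_labels(x, train_labels))
```

Normalizing the projected training columns keeps coefficients comparable across documents of different lengths, and the voting rule ignores negative coefficients, counting only positive contributions as support for a label; both are design choices of this sketch rather than details confirmed by the paper.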

Author information

Authors and Affiliations

School of Computer Science and Engineering, The University of New South Wales, NSW, 2052, Australia
Lina Yao & Xianzhi Wang
Department of Computing, Macquarie University, NSW, 2109, Australia
Quan Z. Sheng
Department of Computer Science, University of Sherbrooke, Sherbrooke, Canada
Shengrui Wang
School of Information Technology and Electrical Engineering, The University of Queensland, QLD, 4072, Australia
Xue Li
School of Information and Communication Technology, Griffith University, QLD, 4125, Australia
Sen Wang

Corresponding author

Correspondence to Lina Yao.

About this article

Cite this article

Yao, L., Sheng, Q.Z., Wang, X. et al. Collaborative text categorization via exploiting sparse coefficients. World Wide Web 21, 373–394 (2018). https://doi.org/10.1007/s11280-017-0460-2

Received: 12 August 2015
Revised: 16 January 2017
Accepted: 19 April 2017
Published: 28 April 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11280-017-0460-2

Similar content being viewed by others

A parameter-free text classification method based on dual compressors

Optimization of Classification Algorithm for Improving Semantic-Based Text Classification

S2-HTC: Hierarchical Text Classification via Fusing the Structural and Semantic Information
