Clustering data with partial background information

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Clustering with partial background information, also known as semi-supervised clustering, learns from a combination of labeled and unlabeled data and has received a lot of attention over the last decade. The supervisory information is usually used as constraints that bias clustering towards a good region of the search space. This paper proposes a semi-supervised algorithm, called constrained non-negative matrix factorization (Constrained-NMF), that uses a few labeled examples as constraints to improve performance. Because the proposed algorithm is a matrix factorization algorithm, its matrices must be initialized at the beginning. Although the benefits of good initialization are well known, randomized seeding of the basis matrix and the coefficient matrix is still the standard approach for many non-negative matrix factorization (NMF) algorithms. This work devises an entropy-based weighted semi-supervised fuzzy c-means (EWSS-FCM) algorithm to initialize the seeds. EWSS-FCM emphasizes the role of labeled examples and automatically weights them during the course of clustering, and the experimental results indicate that the proposed Constrained-NMF can benefit from the initialization it provides. Both algorithms include the labeled examples in their objective functions, so that the labeled information is propagated to unlabeled examples iteratively. We further analyze the proposed Constrained-NMF and give convergence justifications. Experiments on five real data sets indicate that the proposed algorithm generally outperforms the alternatives.
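To make the role of initialization concrete, the following is a minimal sketch of plain NMF with the standard multiplicative updates of Lee and Seung, not the Constrained-NMF or EWSS-FCM algorithms proposed in the paper, whose objective functions are not given in the abstract. The function names `nmf_multiplicative` and `seed_coefficients`, the column-per-example layout of the data matrix, and the one-hot seeding scheme are assumptions made purely for illustration. The sketch shows how the basis matrix `W` and the coefficient matrix `H` can start either from random values or from a seed built from a few labeled examples.

```python
import numpy as np


def seed_coefficients(n_samples, k, labels, seed=0):
    """Hypothetical seeding: random positive H0, one-hot columns for labeled examples.

    labels maps a sample index to its known cluster index.
    """
    rng = np.random.default_rng(seed)
    H0 = rng.random((k, n_samples)) + 0.1
    for i, c in labels.items():
        H0[:, i] = 0.0   # zero entries stay zero under multiplicative updates
        H0[c, i] = 1.0
    return H0


def nmf_multiplicative(X, k, W0=None, H0=None, n_iter=200, eps=1e-10, seed=0):
    """Standard NMF, X (m x n) ~ W (m x k) @ H (k x n), Frobenius objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Randomized seeding is the default; W0 / H0 allow an informed start instead.
    W = rng.random((m, k)) if W0 is None else np.array(W0, dtype=float)
    H = rng.random((k, n)) if H0 is None else np.array(H0, dtype=float)
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors
```

Calling `nmf_multiplicative(X, k)` gives the randomly seeded baseline, while passing `H0=seed_coefficients(n, k, labels)` starts from the labeled examples. In this simplified scheme the zeros in a labeled column remain zero, so a labeled example keeps its cluster throughout the iterations; the paper's approach of weighting labeled examples inside the objective function is more flexible than this hard seeding.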




Acknowledgements

This work was supported in part by Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 106-2221-E-009-100 and MOST 105-2218-E-009-034.

Author information

Corresponding author

Correspondence to Wen-Hoar Hsaio.

About this article

Cite this article

Liu, CL., Hsaio, WH., Chang, TH. et al. Clustering data with partial background information. Int. J. Mach. Learn. & Cyber. 10, 1123–1138 (2019). https://doi.org/10.1007/s13042-018-0790-0

  • DOI: https://doi.org/10.1007/s13042-018-0790-0
