Abstract
In this paper, we propose a real data clustering method based on active learning. Clustering methods are difficult to apply to real data for two reasons. First, real data may include outliers that adversely affect clustering. Second, the clustering parameters such as the number of clusters cannot be made constant because the number of classes of real data may increase as time goes by. To solve the first problem, we focus on labeling outliers. Therefore, we develop a stream-based active learning framework for clustering. The active learning framework enables us to label the outliers intensively. To solve the second problem, we also develop an algorithm to automatically set clustering parameters. This algorithm can automatically set the clustering parameters with some labeled samples. The experimental results show that our method can deal with the problems mentioned above better than the conventional clustering methods.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Halim, Z., Atif, M., Rashid, A.: Profiling players using real-world datasets: clustering the data and correlating the results with the big-five personality traits. IEEE Trans. Affect. Comput., 1–18 (2017)
Bijuraj, L.V.: Clustering and its applications. In: Proceedings of National Conference on New Horizons in IT - NCNHIT 2013, pp. 169–172 (2013)
Tran, N., Vo, B., Phung, D.: Clustering for point pattern data. In: Proceedings of the 2016 23rd International Conference on Pattern Recognition (2013)
Kamishima, T., Motoyoshi, F.: Learning from cluster examples. Mach. Learn. 53(3), 199–233 (2003)
Bair, E.: Semi-supervised clustering methods. Wiley Interdisc. Rev. Comput. Stat. 5(5), 349–361 (2013)
Grira, N., Crucianu, M., Boujemaa, N.: Unsupervised and semi-supervised clustering: a brief survey. In: Proceedings of the Review of Machine Learning Techniques for Processing MUSCLE European Network of Excellence (2004)
Wang, Y., Chen, S., Zhou, Z.: New semi-supervised classification method based on modified cluster assumption. IEEE Trans. Neural Netw. Learn. Syst. 23(5), 689–702 (2012)
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Proceedings of the 9th ICML, pp. 577–584 (2001)
Kohonen, T.: Self-Organizing Maps, vol. 30. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56927-2
Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
Martinez-Uso, A., Pla, F., Sotoca, J.: A semi-supervised Gaussian mixture model for image segmentation. In: Proceedings of 20th International Conference on Pattern Recognition, pp. 2941–2944 (2010)
Grira, N., Crucianu, M., Boujemaa, N.: Active semi-supervised fuzzy clustering. Pattern Recogn. 41(5), 1834–1844 (2008)
Gosselin, P.H., Cord, M.: Active learning methods for interactive image retrieval. IEEE Trans. Image Process. 17(7), 1200–1211 (2008)
Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)
Narr, A., Triebel, R., Cremers, D.: Stream-based active learning for efficient and adaptive classification of 3D objects. In: Proceedings of 2016 IEEE International Conference on Robotics and Automation (2016)
Fujii, K., Kashima, H.: Budgeted stream-based active learning via adaptive submodular maximization. In: Proceedings of Conference and Workshop on Neural Information Processing Systems (2016)
Dua, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml>
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Odate, R., Shinjo, H., Suzuki, Y., Motobayashi, M. (2018). Semi-supervised Clustering Framework Based on Active Learning for Real Data. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science(), vol 11004. Springer, Cham. https://doi.org/10.1007/978-3-319-97785-0_18
Download citation
DOI: https://doi.org/10.1007/978-3-319-97785-0_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97784-3
Online ISBN: 978-3-319-97785-0
eBook Packages: Computer ScienceComputer Science (R0)