Abstract
Data clustering is the process of dividing data elements into clusters so that items in the same cluster are as similar as possible and items in different clusters are as dissimilar as possible. A key issue in clustering is how to define a sensible similarity measure. Such measures usually handle data in a single modality and cannot be used to cluster data from different modalities. Based on the fuzzy set and prototype theory interpretations of label semantics, we propose two (dis)similarity measures with which data and vague concepts represented by logical expressions of linguistic labels can be clustered automatically. Experimental results on a toy problem and on an image classification problem demonstrate the effectiveness of the new clustering algorithms. Since the proposed measures can be extended to measure the distance between any two granularities, the new clustering algorithms can also be extended to cluster data instances and imprecise concepts represented by other granularities.
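To make the role of the (dis)similarity measure concrete, the following is a minimal sketch and not the measures proposed in this article. It assumes, purely for illustration, that an imprecise concept is a crisp interval and a data instance is a degenerate interval, and it compares them with the Hausdorff distance. The names `dissimilarity` and `k_medoids` are hypothetical. Once such a measure is defined over both kinds of item, a standard medoid-based procedure can cluster them together.

```python
from typing import List, Sequence, Tuple, Union

Interval = Tuple[float, float]
Item = Union[float, Interval]  # a data point or an interval "concept" (assumption)


def as_interval(item: Item) -> Interval:
    """Treat a plain number as the degenerate interval [x, x]."""
    if isinstance(item, tuple):
        return (float(item[0]), float(item[1]))
    return (float(item), float(item))


def dissimilarity(a: Item, b: Item) -> float:
    """Hausdorff distance between two closed intervals.

    Illustrative stand-in only; it is not the label-semantics measure.
    """
    (a_lo, a_hi), (b_lo, b_hi) = as_interval(a), as_interval(b)
    return max(abs(a_lo - b_lo), abs(a_hi - b_hi))


def assign(items: Sequence[Item], medoids: Sequence[int]) -> List[int]:
    """Label each item with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dissimilarity(x, items[medoids[c]]))
        for x in items
    ]


def k_medoids(items: Sequence[Item], init_medoids: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(items, medoids)
        updated = []
        for c, current in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                updated.append(current)
                continue
            # New medoid: the member with the smallest total dissimilarity
            # to the other members of its cluster.
            updated.append(min(
                members,
                key=lambda i: sum(dissimilarity(items[i], items[j]) for j in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(items, medoids)


# Mixed input: three points and two interval concepts.
items: List[Item] = [1.0, 1.2, (0.8, 1.4), 5.0, (4.6, 5.4), 5.2]
medoids, labels = k_medoids(items, init_medoids=[0, 3])
print(labels)
```

Swapping `dissimilarity` for another measure defined on both data instances and concepts leaves the clustering loop unchanged, which is the sense in which the choice of measure drives the result.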
Acknowledgements
This work is partially supported by the Natural Science Foundation of China under Grant Nos. 61401012 and 61305047, and by the NUTP for Innovation and Entrepreneurship of China under No. 201510006143.
Cite this article
Qin, Z., Wan, T. & Zhao, H. Hybrid clustering of data and vague concepts based on labels semantics. Ann Oper Res 256, 393–416 (2017). https://doi.org/10.1007/s10479-017-2541-0
DOI: https://doi.org/10.1007/s10479-017-2541-0