A New Representation of Interval Symbolic Data and Its Application in Dynamic Clustering

Journal of Classification

Abstract

In this study, we consider interval data that summarize original samples (individuals) of classical point data. Such data are termed interval symbolic data in the research domain of symbolic data analysis. Most existing representations, such as (centre, radius) and [lower boundary, upper boundary], describe an interval using only its boundaries. However, these representations hold only under the assumption that the individuals contained in the interval follow a uniform distribution. In practice, they may be inconsistent with the facts, since individuals are often not uniformly distributed in many applications, and they lose information because the point data within the intervals are ignored during calculation. In this study, we propose a new representation of interval symbolic data that takes the point data contained in the intervals into account. We then apply the city-block distance metric to the new representation and propose a dynamic clustering approach for interval symbolic data. A simulation experiment is conducted to evaluate the performance of our method. The results show that, when the individuals contained in the interval do not follow a uniform distribution, the proposed method significantly outperforms the Hausdorff and city-block distances based on the traditional representations in the context of dynamic clustering. Finally, we give an application example on the automobile data set.
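The information loss described in the abstract can be illustrated with a minimal sketch of the traditional [lower boundary, upper boundary] representation. It is not the representation proposed in the article, which the abstract does not specify. The function names, the per-variable summation used for the Hausdorff distance, and the sample values are assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def to_interval(points: Sequence[float]) -> Interval:
    """Summarize point data by its [lower, upper] boundaries only."""
    return (min(points), max(points))


def city_block(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """City-block distance between two interval vectors:
    sum over variables of |a1 - a2| + |b1 - b2|."""
    return sum(abs(a1 - a2) + abs(b1 - b2) for (a1, b1), (a2, b2) in zip(x, y))


def hausdorff(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Hausdorff distance per variable, max(|a1 - a2|, |b1 - b2|),
    summed over variables (one common choice for interval vectors)."""
    return sum(max(abs(a1 - a2), abs(b1 - b2)) for (a1, b1), (a2, b2) in zip(x, y))


# Two hypothetical samples with the same range but different inner distributions.
evenly_spread = [0.0, 2.5, 5.0, 7.5, 10.0]
skewed = [0.0, 0.1, 0.2, 0.3, 10.0]

x = [to_interval(evenly_spread)]
y = [to_interval(skewed)]

# Both boundary-based distances are zero, although the samples clearly differ.
print(city_block(x, y))  # 0.0
print(hausdorff(x, y))   # 0.0
print(sum(evenly_spread) / len(evenly_spread), sum(skewed) / len(skewed))
```

Because both samples collapse to the same interval, any clustering criterion built on these distances treats them as identical, which is the situation a representation that accounts for the interior points is meant to avoid.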

Author information

Corresponding author

Correspondence to Junpeng Guo.

Additional information

The authors thank the anonymous reviewers for their helpful comments and valuable suggestions on previous versions of the manuscript. This research was financed by the National Natural Science Foundation of China (Grant No. 71271147), which is gratefully acknowledged.

About this article

Cite this article

Li, W., Guo, J., Chen, Y. et al. A New Representation of Interval Symbolic Data and Its Application in Dynamic Clustering. J Classif 33, 149–165 (2016). https://doi.org/10.1007/s00357-016-9193-7

  • DOI: https://doi.org/10.1007/s00357-016-9193-7
