Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

The values of symbolic variables can take various forms, such as an interval, a set of stochastic measurements of some underlying pattern, or qualitative multi-values. However, the majority of existing work in symbolic data analysis still focuses on interval values. Although some pioneering work has explored stochastic-pattern-based symbolic data and mixtures of symbolic variables, it still lacks the flexibility and computational efficiency needed to make full use of the distinctive individual symbolic variables. We therefore propose a novel hierarchical clustering method for complex symbolic data, which uses a weighted general Jaccard distance and an effective global pruning strategy, and apply it to emitter identification. Extensive experiments indicate that our method outperforms its peers in both computational efficiency and emitter identification accuracy.
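The abstract does not define the weighted general Jaccard distance, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes every symbolic variable is a closed interval, uses the ordinary Jaccard distance on interval lengths per variable, combines variables with hypothetical user-chosen weights, and merges clusters by naive single linkage without any pruning. All function names, feature values, and the threshold are made up for illustration.

```python
from itertools import combinations


def interval_jaccard_distance(a, b):
    """Jaccard distance 1 - |a ∩ b| / |a ∪ b| for closed intervals (lo, hi)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    if union == 0:
        # Both intervals are single points: identical or fully distinct.
        return 0.0 if a == b else 1.0
    return 1.0 - inter / union


def weighted_distance(x, y, weights):
    """Weighted average of per-variable distances (assumed combination rule)."""
    total = sum(weights)
    return sum(
        w * interval_jaccard_distance(a, b) for w, a, b in zip(weights, x, y)
    ) / total


def single_linkage(objects, weights, threshold):
    """Naive agglomerative clustering; stops when the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(objects))]
    while len(clusters) > 1:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = min(
                weighted_distance(objects[p], objects[q], weights)
                for p in ci
                for q in cj
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical emitters described by two interval variables each.
emitters = [
    [(9.0, 9.4), (1.0, 2.0)],
    [(9.1, 9.5), (1.0, 2.0)],
    [(5.0, 5.5), (10.0, 12.0)],
]
clusters = single_linkage(emitters, weights=[0.5, 0.5], threshold=0.5)
```

With these made-up values, the first two emitters overlap heavily on both variables and end up in one cluster, while the third shares no overlap with either and stays separate. Giving a variable a larger weight makes disagreement on that variable count for more in the merge decision.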

Corresponding author

Correspondence to Xin Xu.

Cite this article

Xu, X., Lu, J. & Wang, W. Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification. J. Comput. Sci. Technol. 33, 807–822 (2018). https://doi.org/10.1007/s11390-018-1857-9

  • DOI: https://doi.org/10.1007/s11390-018-1857-9
