Visual analytics of genealogy with attribute-enhanced topological clustering

  • Regular Paper
Journal of Visualization

Abstract

Clustering provides a concise overview of families of interest and significant patterns within large-scale genealogical datasets. Traditional clustering methods mostly rely on topological features to summarize and organize family trees. However, they ignore the rich attributes that are also important for understanding and interpreting genealogical clusters. It is therefore crucial to combine structures and attributes in a single clustering model for exploring genealogy datasets. In this paper, we propose an attribute-enhanced topological clustering method for exploring genealogy datasets based on partial least squares (PLS). First, a graphlet kernel method is used to measure the structural difference between family trees. Then, we leverage PLS to combine the learned vectors with multiple attributes, and a joint dimensionality reduction method projects the high-dimensional vectors into a two-dimensional space. In that space, a distance-based clustering method aggregates similar family trees, taking both topological structures and attribute features into consideration. Further, we implement a visual analysis system with multiple coordinated views, including glyphs, a family tree view and a parallel coordinate view, to represent, evaluate and explore the clustering features. Case studies and quantitative comparisons based on real-world genealogy datasets demonstrate the effectiveness of our method in genealogical clustering and exploration.
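
As a rough illustration of the first step only, the sketch below compares two small family trees by their graphlet counts. It is a minimal sketch under stated assumptions, not the method implemented in the paper: trees are treated as undirected edge lists, only connected 3- and 4-node graphlets are counted (in a tree these are the 3-node path, the 4-node path and the 4-node star), and similarity is taken as the cosine of the two count vectors. The function names `graphlet_counts` and `graphlet_kernel` and the toy trees are hypothetical.

```python
from math import comb, sqrt


def graphlet_counts(edges):
    """Count 3-node paths, 4-node paths and 4-node stars in an undirected tree.

    Assumption: `edges` lists each undirected edge once and contains no cycles.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # 3-node path: a centre node plus any two of its neighbours.
    path3 = sum(comb(d, 2) for d in degree.values())
    # 4-node star: a centre node plus any three of its neighbours.
    star4 = sum(comb(d, 3) for d in degree.values())
    # 4-node path: a middle edge plus one further neighbour at each end
    # (the two end nodes are always distinct because a tree has no triangles).
    path4 = sum((degree[u] - 1) * (degree[v] - 1) for u, v in edges)
    return [path3, path4, star4]


def graphlet_kernel(edges_a, edges_b):
    """Cosine similarity of graphlet count vectors (an assumed normalization)."""
    a = graphlet_counts(edges_a)
    b = graphlet_counts(edges_b)
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0  # at least one tree is too small to contain any graphlet
    return sum(x * y for x, y in zip(a, b)) / norm


# Toy trees: a four-generation single line versus one parent with three children.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
star = [("p", "c1"), ("p", "c2"), ("p", "c3")]

print(graphlet_counts(chain))                  # [2, 1, 0]
print(graphlet_counts(star))                   # [3, 0, 1]
print(round(graphlet_kernel(chain, star), 3))  # 0.849
```

Trees with similar branching patterns yield values close to 1. In the pipeline described above, such structural vectors would then be combined with attributes via PLS and clustered; those steps are not shown here.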

Notes

  1. Toward Better Bus Networks: A Visual Analytics Approach.

  2. Facilitating discourse analysis with interactive visualization.

  3. Pathline: A Tool For Comparative Functional Genomics. Manually linearize a complete network and render attributes next to the linear layout.

  4. Tac-Miner: Visual Tactic Mining for Multiple Table Tennis Matches.

Acknowledgements

We would like to thank the reviewers for their thoughtful comments. The work is supported in part by the National Natural Science Foundation of China (Nos. 61872314 and 61802339), the Open Project Program of the State Key Lab of CADCG of Zhejiang University (No. A2001) and the Natural Science Foundation of Zhejiang Province, China (Nos. LY21F020029 and LY19F020011).

Author information

Corresponding author

Correspondence to Zhiguang Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, L., Zhang, X., Pan, X. et al. Visual analytics of genealogy with attribute-enhanced topological clustering. J Vis 25, 361–377 (2022). https://doi.org/10.1007/s12650-021-00802-x

  • DOI: https://doi.org/10.1007/s12650-021-00802-x
