Abstract
This study proposes a novel Approximately Balanced Tree Partitioning Algorithm (TPA) to address key challenges in genealogical data management: the storage, maintenance, and interpretation of complex familial networks. TPA modularizes the intricate relationships in genealogical graphs into logically succinct tree structures, reducing users' cognitive load and making genealogical data more useful in real applications such as hereditary disease research, forensic investigation, and consanguinity counseling. In addition, TPA prioritizes structural closeness when partitioning, which avoids misleading insights drawn from unrelated data points, and it keeps the node distribution balanced to limit workload and communication overhead in distributed graph data processing systems. Extensive experiments on four real-world genealogical datasets demonstrate the algorithm's effectiveness and its superiority over five state-of-the-art rival models in handling complex and rapidly expanding genealogical data.
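To make the two goals named in the abstract concrete (parts that are structurally close, and parts of roughly equal size), the following is a minimal sketch of a generic greedy bottom-up tree partitioner. It is not the authors' TPA, whose details are not given here. The function name `partition_tree`, the `children` dictionary format, and the `target` size parameter are all assumptions made for illustration.

```python
def partition_tree(children, root, target):
    """Greedy bottom-up split of a rooted tree into connected parts.

    Illustrative only; NOT the TPA described in the abstract.
    children: dict mapping a node to its list of child nodes (assumed format)
    target:   a pending subtree is cut off once it holds at least this many nodes
    """
    # Iterative traversal in which every parent is visited before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    pending = {}  # node -> members of its subtree not yet assigned to a part
    parts = []
    for node in reversed(order):  # children are processed before parents
        members = [node]
        for child in children.get(node, []):
            members.extend(pending.pop(child))
        # Cut when the pending subtree is large enough; the root always
        # closes the last part so that no node is left unassigned.
        if len(members) >= target or node == root:
            parts.append(members)
            members = []
        pending[node] = members
    return parts


# Hypothetical six-person family tree: A is the common ancestor.
family = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
parts = partition_tree(family, "A", target=3)
print(parts)
```

Each part is a connected piece of the tree, so close relatives stay together, and every part except possibly the one containing the root holds at least `target` nodes. The heuristic gives no upper bound tighter than one plus the sum of the children's pending sizes, so it only approximates balance.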
Availability of data and materials
The datasets and code used during this study are available upon reasonable request to the authors.
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under grant 62120106008, and in part by the Fundamental Research Funds for the Central Universities under grant JZ2023HGTB0270.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, validation, and analysis were performed by Shaojing Sheng, Zan Zhang, Peng Zhou and Xindong Wu. The first draft of the manuscript was written by Shaojing Sheng, and all authors commented on previous versions of the manuscript. The second draft was revised by Shaojing Sheng based on the reviewers' comments, and all authors commented on the revisions. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sheng, S., Zhang, Z., Zhou, P. et al. An effective algorithm for genealogical graph partitioning. Appl Intell 54, 1798–1817 (2024). https://doi.org/10.1007/s10489-023-05265-1
DOI: https://doi.org/10.1007/s10489-023-05265-1