An effective algorithm for genealogical graph partitioning

Applied Intelligence

Abstract

This study proposes a novel Approximately Balanced Tree Partitioning Algorithm (TPA) to address key challenges in genealogical data management: the storage, maintenance, and interpretation of complex familial networks. TPA modularizes the intricate relationships in a genealogical graph into logically succinct tree structures, which reduces users' cognitive load and makes genealogical data more useful in applications such as hereditary disease research, forensic investigation, and consanguinity counseling. TPA also prioritizes structural closeness when partitioning, to avoid misleading insights drawn from unrelated data points, and keeps the node distribution balanced, to limit workload imbalance and communication overhead in distributed graph data processing systems. Extensive experiments on four real-world genealogical datasets show that the algorithm outperforms five state-of-the-art competing methods in handling complex and rapidly growing genealogical data.
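
To make the two goals in the abstract concrete (parts that stay structurally close, and a balanced node count per part), the following is a minimal sketch of a generic greedy bottom-up tree split. It is not the paper's TPA, whose details are not given in this excerpt. It assumes the genealogy has already been reduced to a single rooted tree, and the names `partition_tree`, `cut_edges`, `cap`, and the toy `family` data are all hypothetical.

```python
def partition_tree(children, root, cap):
    """Split a rooted tree into connected parts of at most `cap` nodes each.

    Assumption: `children` maps a node to its list of children (leaves may be
    omitted). This is a generic greedy heuristic, not the paper's TPA.
    """
    if cap < 1:
        raise ValueError("cap must be at least 1")

    # Pre-order walk; its reverse visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    pending = {}  # node -> nodes of the still-open subtree rooted at that node
    parts = []
    for node in reversed(order):
        group = [node]
        # Absorb the smallest child subtrees first so fewer edges get cut.
        comps = sorted((pending.pop(c) for c in children.get(node, [])), key=len)
        for comp in comps:
            if len(group) + len(comp) <= cap:
                group.extend(comp)   # stays with the parent: structurally close
            else:
                parts.append(comp)   # closed off as its own part
        pending[node] = group
    parts.append(pending.pop(root))
    return parts


def cut_edges(children, parts):
    """Count parent-child edges whose endpoints land in different parts."""
    part_of = {n: i for i, part in enumerate(parts) for n in part}
    return sum(
        1
        for parent, kids in children.items()
        for kid in kids
        if part_of[parent] != part_of[kid]
    )


# Toy lineage: A has children B and C; B has D and E; C has F.
family = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
parts = partition_tree(family, "A", cap=3)
print(parts, cut_edges(family, parts))
```

Because every part is a connected subtree, relatives in the same part are always linked by a chain of parent-child edges inside that part, and the number of cut edges is always one less than the number of parts. The size cap only bounds parts from above, so small leftover parts are possible; a real balanced partitioner would also need to merge or rebalance those.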

Availability of data and materials

The datasets and code used during this study are available upon reasonable request to the authors.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under grant 62120106008, and in part by the Fundamental Research Funds for the Central Universities under grant JZ2023HGTB0270.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, validation, and analysis were performed by Shaojing Sheng, Zan Zhang, Peng Zhou, and Xindong Wu. The first draft of the manuscript was written by Shaojing Sheng, and all authors commented on previous versions of the manuscript. The second draft was revised by Shaojing Sheng based on the reviewers' comments, and all authors commented on the revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xindong Wu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sheng, S., Zhang, Z., Zhou, P. et al. An effective algorithm for genealogical graph partitioning. Appl Intell 54, 1798–1817 (2024). https://doi.org/10.1007/s10489-023-05265-1

  • DOI: https://doi.org/10.1007/s10489-023-05265-1
