Abstract
Formal concept analysis (FCA) is a tool for extracting natural clusters from objects and attributes represented as a binary table. Several parallel and distributed algorithms have been proposed to speed up concept discovery. Replication-based approaches suffer from memory bottlenecks when dealing with large contexts, whereas partitioning-based approaches incur heavy communication overhead. We propose HyPar-FCA, a distributed framework that uses horizontal partitioning for low-support attributes and vertical partitioning for high-support attributes. Our hybrid partitioning strategy can be tuned according to the machines’ memory constraints. It eliminates inter-machine communication for the horizontal partitions and minimizes it for the vertical partition by using auxiliary structures. We show that HyPar-FCA scales to large contexts and can run on commodity hardware with memory constraints. Compared with state-of-the-art distributed FCA frameworks, HyPar-FCA improves execution time by 22% and reduces memory usage by 27%.
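As a rough illustration of the attribute split described above, the following sketch computes each attribute's support (the number of objects that have it) in a toy binary context and separates the attributes into low- and high-support groups. The toy context, the threshold value, and the function names `attribute_support` and `split_by_support` are illustrative assumptions; the abstract does not say how HyPar-FCA picks the cut-off or lays out its partitions.

```python
from typing import Dict, List, Set, Tuple

# Toy formal context (hypothetical data): object -> attributes it has.
CONTEXT: Dict[str, Set[str]] = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "d"},
    "o4": {"a"},
}


def attribute_support(context: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for every attribute, how many objects have it."""
    support: Dict[str, int] = {}
    for attrs in context.values():
        for attr in attrs:
            support[attr] = support.get(attr, 0) + 1
    return support


def split_by_support(
    support: Dict[str, int], threshold: int
) -> Tuple[List[str], List[str]]:
    """Split attributes into (low-support, high-support) groups.

    The threshold is an assumed tuning knob, not a value from the paper.
    """
    low = sorted(a for a, s in support.items() if s < threshold)
    high = sorted(a for a, s in support.items() if s >= threshold)
    return low, high


support = attribute_support(CONTEXT)
low, high = split_by_support(support, threshold=3)
print("low-support attributes:", low)
print("high-support attributes:", high)
```

In this toy context only attribute `a` reaches the assumed threshold, so it would fall into the high-support group and the remaining attributes into the low-support group.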














Notes
The frequency of the attributes in the dataset is not uniform.
Worst-case input: only the diagonal elements in the context are zero.
All the approaches incur an initial setup cost, either for partitioning or for replicating the context among the workers. Because this is a one-time cost, we exclude it from the reported results.
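The worst-case input mentioned in the notes can be checked by brute force on small sizes. A context with n objects and n attributes in which only the diagonal is zero is known to have 2^n formal concepts, because every subset of attributes is closed. The sketch below is a naive enumeration for illustration only, not the algorithm used in the paper; the function names `worst_case_context` and `count_concepts` are hypothetical.

```python
from typing import FrozenSet, List, Set


def worst_case_context(n: int) -> List[FrozenSet[int]]:
    """Object i has every attribute except attribute i (zero diagonal)."""
    return [frozenset(j for j in range(n) if j != i) for i in range(n)]


def count_concepts(rows: List[FrozenSet[int]], n_attrs: int) -> int:
    """Count formal concepts by enumerating all concept intents.

    Every intent is an intersection of object rows; the empty
    intersection is the full attribute set. Each concept has exactly
    one intent, so counting intents counts concepts.
    """
    intents: Set[FrozenSet[int]] = {frozenset(range(n_attrs))}
    for row in rows:
        # Extend the known intents with their intersections with this row.
        intents |= {intent & row for intent in intents}
    return len(intents)


for n in range(1, 6):
    print(n, count_concepts(worst_case_context(n), n))
```

The count doubles with every added object, which is why this input is a useful stress case even at modest sizes.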
This work was supported by SPARC, a Government of India initiative, under Grant no. SPARC/2018-2019/P682/SL.
Cite this article
Packiaraj, M., Kailasam, S. HyPar-FCA: a distributed framework based on hybrid partitioning for FCA. J Supercomput 78, 12589–12620 (2022). https://doi.org/10.1007/s11227-022-04366-x
DOI: https://doi.org/10.1007/s11227-022-04366-x