Abstract
Enumeration of formal concepts is crucial in formal concept analysis. Algorithms from the Close-by-One family (CbO-based algorithms for short) are particularly efficient for this task. State-of-the-art CbO-based algorithms, e.g. FCbO, In-Close4, and In-Close5, employ several techniques, which we call pruning, to avoid some unnecessary computations. However, the number of formal concepts can be exponential in the dimensions of the input data. Therefore, the algorithms do not scale well, and large datasets become intractable. To overcome this weakness, several parallel and distributed algorithms have been proposed. We propose new CbO-based algorithms intended for Apache Spark or a similar programming model and show how pruning can be incorporated into them. We experimentally evaluate the impact of pruning and demonstrate the scalability of the new algorithm.
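As an illustration only, the following Python sketch shows how a CbO-style enumeration can be organised level by level, so that every pending task is expanded independently, as a `flatMap` over a distributed collection would do. The pruning shown is the FCbO rule of inheriting failed canonicity tests, where each task carries its own table of failed intents. It assumes a context given as a list of per-object attribute sets, and the names `make_context`, `expand`, and `enumerate_concepts` are hypothetical. It does not use Spark and is not the algorithm proposed in this paper.

```python
def make_context(rows, n_attrs):
    """Build derivation helpers for a context given as per-object attribute sets."""
    objects = frozenset(range(len(rows)))
    attrs = frozenset(range(n_attrs))
    # cols[j] is the set of objects having attribute j
    cols = [frozenset(g for g, r in enumerate(rows) if j in r) for j in range(n_attrs)]

    def up(extent):
        # attributes shared by all objects of the extent (all attributes if empty)
        result = attrs
        for g in extent:
            result = result & rows[g]
        return frozenset(result)

    return objects, attrs, cols, up


def expand(task, cols, up, n_attrs):
    """Map step: turn one task into its canonical child tasks."""
    extent, intent, y, failed = task
    new_failed = list(failed)
    children = []
    for j in range(y, n_attrs):
        if j in intent:
            continue
        prefix = frozenset(range(j))  # attributes smaller than j
        # FCbO-style pruning: an inherited failure at j rules out this branch
        if not ((failed[j] & prefix) <= (intent & prefix)):
            continue
        ext = extent & cols[j]
        itt = up(ext)
        if (intent & prefix) == (itt & prefix):  # canonicity test
            children.append((ext, itt, j + 1))
        else:
            new_failed[j] = itt  # remember the failure for descendants
    inherited = tuple(new_failed)
    return [(e, i, nxt, inherited) for e, i, nxt in children]


def enumerate_concepts(rows, n_attrs):
    """Enumerate all formal concepts level by level; returns (extent, intent) pairs."""
    objects, _, cols, up = make_context(rows, n_attrs)
    level = [(objects, up(objects), 0, tuple(frozenset() for _ in range(n_attrs)))]
    concepts = []
    while level:
        concepts.extend((e, i) for e, i, _, _ in level)
        # tasks do not share state, so this acts like a flatMap over the level
        level = [child for task in level for child in expand(task, cols, up, n_attrs)]
    return concepts
```

For example, `enumerate_concepts([{0, 1}, {1, 2}, {0, 2}], 3)` yields 8 concepts, one for each subset of the three attributes. Because the failed-intent table travels with each task, no shared state is needed between workers; the price is extra memory per task.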
Supported by the grant JG 2019 of Palacký University Olomouc, No. JG_2019_008.
Notes
- 1.
- 2.
- 3. IBM Quest Synthetic Data Generator was used.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Konecny, J., Krajča, P. (2020). Pruning in Map-Reduce Style CbO Algorithms. In: Alam, M., Braun, T., Yun, B. (eds) Ontologies and Concepts in Mind and Machine. ICCS 2020. Lecture Notes in Computer Science, vol 12277. Springer, Cham. https://doi.org/10.1007/978-3-030-57855-8_8
DOI: https://doi.org/10.1007/978-3-030-57855-8_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57854-1
Online ISBN: 978-3-030-57855-8
eBook Packages: Computer Science, Computer Science (R0)