ABSTRACT
Finding interesting patterns is a classical problem in data mining. Boolean matrix decomposition is nowadays a standard tool for finding a set of patterns, also called factors, that explain Boolean data well. We describe and experimentally evaluate a probabilistic algorithm for the Boolean matrix decomposition problem. The algorithm is derived from the GreCon algorithm, which uses formal concepts (maximal rectangles, or tiles) as factors to find a decomposition. We change the core of GreCon by replacing its deterministic computation of suitable formal concepts with a sampling procedure. This alleviates the greedy nature of GreCon and makes it possible to bypass some of its pitfalls while preserving its features, e.g., the ability to explain the entire data.
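The loop the abstract describes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The factors are formal concepts, and each step greedily keeps the candidate that covers the most still-uncovered 1s, in the style of GreCon. The candidates, however, are drawn at random instead of being computed exhaustively. The sampler `sample_concept`, its 0.5 acceptance probability, and the `samples` parameter are hypothetical stand-ins for the paper's sampling procedure.

```python
import random


def down(matrix, attrs):
    """Derivation B↓: rows (objects) that have every column (attribute) in attrs."""
    return {i for i, row in enumerate(matrix) if all(row[j] for j in attrs)}


def up(matrix, objs, n_cols):
    """Derivation A↑: columns (attributes) shared by every row in objs."""
    return {j for j in range(n_cols) if all(matrix[i][j] for i in objs)}


def sample_concept(matrix, i, j, rng):
    """HYPOTHETICAL sampler: a random formal concept whose rectangle contains (i, j).

    Start from the closure of {j}, then randomly try to add further attributes
    of row i, re-closing after each addition. Every intent stays inside row i,
    so row i is always in the extent and j is always in the intent.
    """
    n_cols = len(matrix[0])
    intent = up(matrix, down(matrix, {j}), n_cols)  # closure {j}↓↑
    candidates = [y for y in range(n_cols) if matrix[i][y] and y not in intent]
    rng.shuffle(candidates)
    for y in candidates:
        if y not in intent and rng.random() < 0.5:  # assumed acceptance rule
            intent = up(matrix, down(matrix, intent | {y}), n_cols)
    return down(matrix, intent), intent  # (extent, intent) is a formal concept


def decompose(matrix, samples=10, seed=0):
    """Greedy cover with sampled concepts instead of all concepts."""
    rng = random.Random(seed)
    uncovered = {(i, j) for i, row in enumerate(matrix)
                 for j, v in enumerate(row) if v}
    factors = []
    while uncovered:
        best, best_gain = None, -1
        for _ in range(samples):
            i, j = rng.choice(sorted(uncovered))  # seed on an uncovered 1
            ext, itt = sample_concept(matrix, i, j, rng)
            gain = sum((a, b) in uncovered for a in ext for b in itt)
            if gain > best_gain:
                best, best_gain = (ext, itt), gain
        factors.append(best)
        uncovered -= {(a, b) for a in best[0] for b in best[1]}
    return factors


def boolean_product(factors, n_rows, n_cols):
    """Boolean product A ∘ B: cell (i, j) is 1 iff some factor covers it."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for ext, itt in factors:
        for a in ext:
            for b in itt:
                out[a][b] = 1
    return out


I = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
factors = decompose(I, samples=20, seed=1)
assert boolean_product(factors, 4, 4) == I  # exact decomposition
```

Every sampled factor is a formal concept, so its rectangle lies entirely inside the 1s of the input and the Boolean product never overcovers. Each chosen factor covers at least its seed cell, which was uncovered, so the loop terminates with every 1 covered. This is how the sketch keeps the ability to explain the entire data while replacing exhaustive concept computation with sampling.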