Reducing Negative Impact of Noise in Boolean Matrix Factorization with Association Rules

  • Conference paper
Advances in Intelligent Data Analysis XIX (IDA 2021)

Abstract

Boolean matrix factorization (BMF) is a well-established data analytical method whose goal is to decompose a single large matrix into two, preferably smaller, matrices, carrying the same or similar information as the original matrix. In essence, it can be used to reduce data dimensionality and to provide fundamental insight into data. Existing algorithms are often negatively affected by the presence of noise in the data, which is a common case for real-world datasets. We present an initial study on an algorithm for approximate BMF that uses association rules in a novel way to identify possible noise. This allows us to suppress the impact of noise and improve the quality of results. Moreover, we show that association rules provide a suitable framework allowing the handling of noise in BMF in a justified way.
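
To make the terms used in the abstract concrete, here is a minimal sketch in Python. It is not the algorithm studied in the paper, whose details are not given on this page. It only illustrates three standard notions: the Boolean product of two factor matrices, the number of entries in which that product differs from the input matrix, and the usual confidence of an association rule. The function names, the toy matrix `I`, and the factors `A` and `B` are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def boolean_product(A: Matrix, B: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is the OR over l of A[i][l] AND B[l][j]."""
    k = len(B)      # number of factors
    m = len(B[0])   # number of attributes (columns)
    return [
        [int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
        for row in A
    ]


def reconstruction_error(I: Matrix, A: Matrix, B: Matrix) -> int:
    """Count the entries in which I and the Boolean product of A and B differ."""
    P = boolean_product(A, B)
    return sum(x != y for row_i, row_p in zip(I, P) for x, y in zip(row_i, row_p))


def rule_confidence(I: Matrix, antecedent: Sequence[int], consequent: Sequence[int]) -> float:
    """Standard confidence of the rule antecedent => consequent.

    This is the fraction of rows containing every antecedent attribute
    that also contain every consequent attribute.
    """
    covering = [row for row in I if all(row[j] for j in antecedent)]
    if not covering:
        return 0.0
    hits = sum(all(row[j] for j in consequent) for row in covering)
    return hits / len(covering)


# Hypothetical toy data: the last row looks like [0, 0, 1, 1] with one 1 missing.
I = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
# Assumed factors: A assigns rows to factors, B assigns factors to attributes.
A = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]]
B = [[1, 1, 0, 0], [0, 0, 1, 1]]

error = reconstruction_error(I, A, B)
confidence = rule_confidence(I, [2], [3])
print(error, confidence)
```

In this toy example the two factors reproduce `I` except for a single entry in the last row, and the rule "attribute 2 implies attribute 3" holds in two of the three rows that contain attribute 2. A rule that holds in most but not all rows is the kind of signal one could use to flag an entry as possible noise. How the paper actually uses association rules for this purpose is not described here.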

P. Krajča was supported by the grant JG 2019 of Palacký University Olomouc, No. JG_2019_008. Martin Trnecka was supported by the grant JG 2020 of Palacký University Olomouc, No. JG_2020_003. Support by Grant No. IGA_PrF_2020_019 of IGA of Palacký University is also acknowledged.

Author information

Corresponding author

Correspondence to Petr Krajča.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Krajča, P., Trnecka, M. (2021). Reducing Negative Impact of Noise in Boolean Matrix Factorization with Association Rules. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_29

  • DOI: https://doi.org/10.1007/978-3-030-74251-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74250-8

  • Online ISBN: 978-3-030-74251-5

  • eBook Packages: Computer Science, Computer Science (R0)
