DOI: 10.1145/3460620.3460735
research-article

An Effective Algorithm for Extracting Maximal Bipartite Cliques

Published: 04 June 2021

Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the DATA 2021 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

Abstract

Reducing the maximal bipartite clique enumeration problem to a maximal clique enumeration problem is a well-known approach for extracting maximal bipartite cliques. In this approach, graph inflation transforms a bipartite graph into a general graph, to which any maximal clique enumeration algorithm can then be applied. However, the traditional inflation algorithm adds an edge between every two vertices in the same vertex set, which incurs high computational overhead; as a result, the approach is impractical and does not scale to large graphs. This paper proposes a new algorithm for extracting maximal bipartite cliques based on an efficient graph inflation algorithm, which adds only the minimal number of edges required to convert all maximal bipartite cliques into maximal cliques. The proposed algorithm has been evaluated on several real-world benchmark graphs with respect to its correctness, its running time (in both the inflation and enumeration steps), and the overhead that inflation adds to the size of the generated general graph. The empirical evaluation shows that the proposed algorithm is accurate, efficient, and effective, and that it is more applicable to real-world graphs than the traditional algorithm.
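The reduction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it pairs the traditional inflation (connect every same-side pair, i.e. add C(|U|, 2) + C(|V|, 2) edges regardless of density) with Bron–Kerbosch maximal clique enumeration using pivoting, and keeps only the cliques that contain vertices from both sides, since a clique lying entirely within one side is an artifact of inflation rather than a biclique. The function `inflate_common_neighbor` is a hypothetical sparser variant chosen for illustration: it connects two same-side vertices only when they share at least one neighbor. Such a pair is the only kind that can appear together on one side of a biclique whose other side is nonempty, so no maximal biclique is lost; it is not necessarily the minimal inflation the paper proposes. All function names are illustrative.

```python
from itertools import combinations


def build_graph(left, right, edges):
    """Adjacency sets for a bipartite graph; vertex labels must be distinct across sides."""
    adj = {v: set() for v in left | right}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def inflate_full(left, right, edges):
    """Traditional inflation: connect every pair of vertices on the same side."""
    adj = build_graph(left, right, edges)
    for side in (left, right):
        for a, b in combinations(side, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def inflate_common_neighbor(left, right, edges):
    """Hypothetical sparser inflation (assumption, not necessarily the paper's method):
    connect two same-side vertices only if they share at least one neighbor."""
    base = build_graph(left, right, edges)
    adj = {v: set(ns) for v, ns in base.items()}
    for side in (left, right):
        for a, b in combinations(side, 2):
            if base[a] & base[b]:  # common neighbor in the original bipartite graph
                adj[a].add(b)
                adj[b].add(a)
    return adj


def bron_kerbosch(adj):
    """Enumerate all maximal cliques (Bron–Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex of P | X with the most neighbors in P.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def maximal_bicliques(left, right, adj):
    """Keep maximal cliques that touch both sides; one-sided cliques are inflation artifacts."""
    result = set()
    for clique in bron_kerbosch(adj):
        a, b = clique & left, clique & right
        if a and b:
            result.add((frozenset(a), frozenset(b)))
    return result


def edge_count(adj):
    """Number of undirected edges in an adjacency-set graph."""
    return sum(len(ns) for ns in adj.values()) // 2
```

For example, with left = {u1, u2, u3}, right = {v1, v2, v3} and edges u1–v1, u1–v2, u2–v1, u2–v2, u3–v3, both inflations yield the same two maximal bicliques, ({u1, u2}, {v1, v2}) and ({u3}, {v3}); the full inflation adds 6 same-side edges (11 edges in total), whereas the common-neighbor variant adds only 2 (7 in total).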


Cited By

• (2023) New Constant Dimension Subspace Codes From the Mixed Dimension Construction. IEEE Transactions on Information Theory 69, 7 (2023), 4333–4344. DOI: 10.1109/TIT.2023.3255929. Online publication date: Jul-2023.

Published In

ACM Other conferences
DATA'21: International Conference on Data Science, E-learning and Information Systems 2021
April 2021
277 pages
ISBN: 9781450388382
DOI: 10.1145/3460620
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. bi-clique
2. bipartite clique
3. bipartite core
4. bipartite graphs
5. maximal bipartite clique enumeration problem

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

DATA'21

Acceptance Rates

Overall acceptance rate: 74 of 167 submissions (44%)


Article Metrics

• Downloads (last 12 months): 19
• Downloads (last 6 weeks): 0
Reflects downloads up to 02 Mar 2025.
