Abstract
Itemset mining and graph mining have attracted considerable attention in the field of data mining because they have important applications in areas such as biology, marketing, and social network analysis. However, most existing studies focus on either itemset mining or graph mining alone, and only a few have addressed a combination of both. In this paper, we introduce a new problem, which we call itemset-sharing subgraph (ISS) set enumeration: given a large graph in which each vertex has an associated itemset, the task is to find sets of subgraphs that share common itemsets. The problem has interesting potential applications, such as side-effect analysis in drug discovery and the analysis of the influence of word-of-mouth communication in social network marketing. We propose an efficient algorithm, ROBIN, for finding ISS sets in such graphs; the algorithm enumerates connected subgraphs having common itemsets and then finds their combinations. Experiments using a synthetic network verify that our method can efficiently process networks with more than one million edges. Experiments using a real biological network show that our algorithm can find biologically interesting patterns. We also apply ROBIN to a citation network and find successful collaborative research efforts.
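To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not ROBIN and makes no claim about the paper's formal definitions. It assumes that a subgraph's common itemset is the intersection of its vertices' itemsets, and that a "combination" is a pair of vertex-disjoint subgraphs sharing enough items. The function names, the thresholds `min_items`, `max_vertices`, and `min_vertices`, and the toy graph are all hypothetical.

```python
from itertools import combinations

def common_itemset_subgraphs(adj, items, min_items, max_vertices):
    """Map each connected vertex set to the itemset shared by all its vertices."""
    found = {}
    # Start from single vertices and grow one neighbouring vertex at a time.
    stack = [(frozenset([v]), frozenset(items[v])) for v in adj]
    while stack:
        verts, common = stack.pop()
        # Growing a vertex set can only shrink its common itemset,
        # so a set that is already too small can be pruned safely.
        if verts in found or len(common) < min_items:
            continue
        found[verts] = common
        if len(verts) < max_vertices:
            neighbours = set().union(*(adj[v] for v in verts)) - verts
            for u in neighbours:
                stack.append((verts | {u}, common & items[u]))
    return found

def sharing_pairs(found, min_items, min_vertices=2):
    """Pair up vertex-disjoint subgraphs that still share enough items."""
    big = [(v, c) for v, c in found.items() if len(v) >= min_vertices]
    pairs = []
    for (v1, c1), (v2, c2) in combinations(big, 2):
        shared = c1 & c2
        if v1.isdisjoint(v2) and len(shared) >= min_items:
            pairs.append((v1, v2, shared))
    return pairs

# Toy path graph 1-2-3-4-5 (hypothetical data); vertex 3 shares no items.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
items = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"d"},
         4: {"a", "b"}, 5: {"a", "b", "e"}}

found = common_itemset_subgraphs(adj, items, min_items=2, max_vertices=3)
pairs = sharing_pairs(found, min_items=2)
```

On this toy input, the only pair consists of the subgraphs on vertices {1, 2} and {4, 5}, which share the items a and b even though vertex 3 separates them. The enumeration is exponential in the worst case, so it is suitable only for illustration on small graphs.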
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fukuzaki, M., Seki, M., Kashima, H., Sese, J. (2010). Finding Itemset-Sharing Patterns in a Large Itemset-Associated Graph. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science(), vol 6119. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13672-6_15
DOI: https://doi.org/10.1007/978-3-642-13672-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13671-9
Online ISBN: 978-3-642-13672-6
eBook Packages: Computer Science (R0)