Hypergraph Models and Algorithms for Data-Pattern-Based Clustering

Ozdal, Muhammet Mustafa; Aykanat, Cevdet

doi:10.1023/B:DAMI.0000026903.59233.2a


Published: July 2004

Volume 9, pages 29–57, (2004)

Data Mining and Knowledge Discovery


Abstract

In traditional approaches for clustering market-basket-type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns by modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. To this end, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the co-occurrences of these patterns in the overall data. The success of existing hypergraph-partitioning-based algorithms in other domains depends on the sparsity of the hypergraph and on explicit objective metrics. We therefore propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain a large number of high-quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. To enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept and use it to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.
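The contrast between a local, shared-item measure and a global, co-occurrence-based measure can be illustrated with a small sketch. The abstract does not specify the construction, so everything below is an assumption made for illustration only: a pattern is treated as a plain itemset, a transaction "supports" a pattern when it contains all of its items, and each transaction contributes one hyperedge over the patterns it supports. The function names and the toy data are hypothetical and are not taken from the article.

```python
from collections import Counter

def patterns_in(transaction, patterns):
    """Return the indices of the patterns fully contained in a transaction."""
    items = set(transaction)
    return frozenset(i for i, p in enumerate(patterns) if set(p) <= items)

def build_hyperedges(transactions, patterns):
    """Assumed construction: one hyperedge per transaction, connecting the
    patterns (vertices) that the transaction supports. Identical hyperedges
    are merged and weighted by the number of transactions producing them."""
    edges = Counter()
    for t in transactions:
        e = patterns_in(t, patterns)
        if len(e) >= 2:  # an edge with fewer than two vertices relates nothing
            edges[e] += 1
    return edges

def cooccurrence(edges, a, b):
    """Global measure: number of transactions supporting both patterns."""
    return sum(w for e, w in edges.items() if a in e and b in e)

def shared_items(patterns, a, b):
    """Local measure: number of items the two patterns have in common."""
    return len(set(patterns[a]) & set(patterns[b]))

# Toy data (hypothetical): patterns 0 and 1 share no items, yet they
# occur together in two transactions.
patterns = [("bread", "butter"), ("milk", "eggs"), ("beer", "chips")]
transactions = [
    ["bread", "butter", "milk", "eggs"],
    ["bread", "butter", "milk", "eggs", "jam"],
    ["beer", "chips", "bread"],
    ["beer", "chips"],
]
edges = build_hyperedges(transactions, patterns)
```

In this toy data, `shared_items(patterns, 0, 1)` is 0 while `cooccurrence(edges, 0, 1)` is 2, which is the kind of relation a purely item-based measure would miss. With many frequent patterns per transaction, hyperedges built this way grow large, which is consistent with the abstract's expectation that the hypergraph is dense.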



Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, USA
Muhammet Mustafa Ozdal
Computer Engineering Department, Bilkent University, Turkey
Cevdet Aykanat


About this article

Cite this article

Ozdal, M.M., Aykanat, C. Hypergraph Models and Algorithms for Data-Pattern-Based Clustering. Data Min Knowl Disc 9, 29–57 (2004). https://doi.org/10.1023/B:DAMI.0000026903.59233.2a


Issue Date: July 2004
DOI: https://doi.org/10.1023/B:DAMI.0000026903.59233.2a
