Abstract
In this paper, we consider the problem of clustering transaction data. Most existing transactional clustering algorithms encounter difficulties in the presence of overlapping clusters with a large number of outlier items that do not contribute to the formation of clusters. Furthermore, the vast majority of existing approaches depend on multiple parameters that may be difficult to tune, especially in real-life applications. To address these problems, we propose a parameter-free transactional clustering algorithm. Our algorithm first scans the data set sequentially, such that the destination of the next transaction is guided by a novel objective function. Once the first scan of the data set is completed, the algorithm performs a few additional passes over the data set in order to refine the clustering. The proposed algorithm is able to automatically identify clusters in the presence of a large number of outlier items in the data set without any parameter setting by the user. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real data sets.
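The two-phase structure described above (a sequential scan followed by a few refinement passes) can be illustrated with a minimal Python sketch. The abstract does not define the objective function, so the score used here is only a hypothetical stand-in and not the one proposed in the paper: each cluster scores n * S / W, where n is its number of transactions, S its total item occurrences and W its number of distinct items. The names score, best_cluster and cluster_transactions are assumptions made for illustration, and max_passes is merely a safeguard of this sketch, not a parameter of the published algorithm.

```python
from collections import Counter

def score(counts, n):
    """Stand-in cluster score (an assumption, not the paper's objective)."""
    if n == 0 or not counts:
        return 0.0
    return n * sum(counts.values()) / len(counts)

def best_cluster(t, clusters):
    """Index of the destination with the largest score gain.

    Returning len(clusters) means 'open a new cluster'; ties favour a new cluster.
    """
    best_idx, best_gain = len(clusters), score(Counter(t), 1)
    for idx, (counts, n) in enumerate(clusters):
        if n == 0:
            continue  # skip clusters emptied during refinement
        gain = score(counts + Counter(t), n + 1) - score(counts, n)
        if gain > best_gain:
            best_idx, best_gain = idx, gain
    return best_idx

def add(t, idx, clusters):
    if idx == len(clusters):
        clusters.append([Counter(), 0])
    clusters[idx][0].update(t)
    clusters[idx][1] += 1

def remove(t, idx, clusters):
    counts = clusters[idx][0]
    counts.subtract(t)
    for item in t:
        if counts[item] == 0:
            del counts[item]  # keep only items still present in the cluster
    clusters[idx][1] -= 1

def cluster_transactions(transactions, max_passes=5):
    transactions = [frozenset(t) for t in transactions]
    clusters, labels = [], []
    # Phase 1: sequential scan, each transaction goes to its best destination.
    for t in transactions:
        idx = best_cluster(t, clusters)
        add(t, idx, clusters)
        labels.append(idx)
    # Phase 2: a few refinement passes, stopping early once nothing moves.
    for _ in range(max_passes):
        moved = False
        for i, t in enumerate(transactions):
            remove(t, labels[i], clusters)
            idx = best_cluster(t, clusters)
            if idx == len(clusters) and clusters[labels[i]][1] == 0:
                idx = labels[i]  # reuse the transaction's own, now empty, cluster
            add(t, idx, clusters)
            if idx != labels[i]:
                labels[i] = idx
                moved = True
        if not moved:
            break
    return labels

example = [{"a", "b", "c"}, {"a", "b", "c", "x"}, {"d", "e"}, {"d", "e", "y"}, {"a", "b"}]
labels = cluster_transactions(example)
print(labels)
```

In this toy example the items "x" and "y" play the role of outlier items: each appears in a single transaction and does not prevent the transactions sharing "a", "b" or "d", "e" from being grouped. The returned labels are cluster indices and need not be contiguous if a cluster is emptied during refinement.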
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bouguessa, M. (2011). A Practical Approach for Clustering Transaction Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_20
DOI: https://doi.org/10.1007/978-3-642-23199-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)