Abstract
In many applications, data is non-vector in nature. For example, one might have transaction data from a dialup access system, where each customer has an observed time series of dialups whose start times and durations differ from customer to customer. Such data is difficult to convert to vector form, so existing clustering algorithms designed for vector data [5] cannot readily cluster customers by their dialup events. This paper presents an efficient model-based algorithm for clustering individuals whose data is non-vector in nature. We evaluate it on a large data set of dialup transactions to show that the algorithm is fast and scalable for clustering and accurate for prediction. We also compare it with a vector clustering algorithm in terms of prediction accuracy, showing that the former is better suited to non-vector data than the latter.
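To make the idea of clustering individuals (rather than fixed-length vectors) concrete, here is a minimal sketch of a finite mixture model fitted by EM, in the spirit of Dempster et al. [3] and Cadez et al. [2]. It is not the authors' algorithm. The per-cluster model is an assumption chosen for illustration: start times are binned by hour of day and drawn from a smoothed categorical distribution, durations follow an exponential distribution, and a customer's events are independent given the cluster. The names `fit_mixture` and `predict_hour_distribution` and all parameters are hypothetical.

```python
import math
import random

N_HOURS = 24  # assumption: start times are binned by hour of day


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def customer_log_lik(events, log_hour_probs, rate):
    """Log-likelihood of one customer's events under one cluster.

    Assumption: events are independent given the cluster, each scored by
    a categorical over start hour times an exponential density over duration.
    """
    ll = 0.0
    for hour, duration in events:
        ll += log_hour_probs[hour] + math.log(rate) - rate * duration
    return ll


def fit_mixture(customers, k, n_iter=50, alpha=1.0, seed=0):
    """Hypothetical EM fit of a k-component mixture over customers.

    customers: one list per customer of (start_hour, duration) events;
    the lists may have different lengths.
    Returns (weights, hour_probs, rates, resp), where resp[i][c] is the
    posterior probability that customer i belongs to cluster c.
    """
    rng = random.Random(seed)
    # Random soft initialisation of the responsibilities.
    resp = []
    for _ in customers:
        r = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(r)
        resp.append([x / total for x in r])

    for _ in range(n_iter):
        # M-step: re-estimate parameters from weighted sufficient statistics.
        weights, hour_probs, rates = [], [], []
        for c in range(k):
            counts = [alpha] * N_HOURS  # add-alpha (Dirichlet) smoothing
            n_events = total_dur = mass = 0.0
            for r, events in zip(resp, customers):
                w = r[c]
                mass += w
                n_events += w * len(events)
                for hour, duration in events:
                    counts[hour] += w
                    total_dur += w * duration
            weights.append(max(mass / len(customers), 1e-12))  # avoid log(0)
            z = sum(counts)
            hour_probs.append([x / z for x in counts])
            # Weighted MLE of the exponential rate: events / total duration.
            rates.append((n_events + 1e-9) / (total_dur + 1e-9))
        # E-step: posterior cluster membership of each whole customer.
        log_hp = [[math.log(p) for p in hp] for hp in hour_probs]
        resp = []
        for events in customers:
            scores = [math.log(weights[c]) + customer_log_lik(events, log_hp[c], rates[c])
                      for c in range(k)]
            norm = log_sum_exp(scores)
            resp.append([math.exp(s - norm) for s in scores])
    return weights, hour_probs, rates, resp


def predict_hour_distribution(resp_i, hour_probs):
    """Predictive distribution of a customer's next start hour: the
    cluster hour profiles weighted by that customer's posterior."""
    return [sum(r * hp[h] for r, hp in zip(resp_i, hour_probs))
            for h in range(N_HOURS)]
```

Because each customer contributes a product of per-event likelihoods, customers with different numbers of dialups need no padding or fixed-length encoding. For example, fitting `k=2` on synthetic customers who dial up either late at night for long sessions or mid-morning for short ones should place each group in its own cluster, and `predict_hour_distribution` then yields a per-customer forecast that could be scored against held-out dialups.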
References
[1] Agrawal, R., Faloutsos, C., Swami, A.: Efficient Similarity Search in Sequence Databases. In: Proc. of the 4th Int. Conf. on Foundations of Data Organization and Algorithms, Chicago, IL (October 1993)
[2] Cadez, I.V., Heckerman, D., Meek, C., Smyth, P., White, S.: Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering. In: Proc. of KDD 2000, pp. 280–284. ACM Press, New York (2000)
[3] Dempster, A., Laird, N., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
[4] Frank, E., Hall, M., Trigg, L., Kirkby, R., Schmidberger, G., Ware, M., Xu, X., Bouckaert, R., Wang, Y., Inglis, S., Witten, I.H., et al.: Weka 3: Machine Learning Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
[5] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luo, K., Wang, J., Li, D., Sun, J. (2003). Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_57
DOI: https://doi.org/10.1007/978-3-540-45080-1_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive