Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach

Luo, Kedong; Wang, Jianmin; Li, Deyi; Sun, Jiaguang

doi:10.1007/978-3-540-45080-1_57

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Abstract

In many applications, data is non-vector in nature. For example, transaction data from a dialup access system records, for each customer, an observed time series of dialups whose start times and durations differ from customer to customer. Such data is difficult to convert to vector form, so existing algorithms designed for vector data [5] are poorly suited to clustering customers by their dialup events. This paper presents an efficient model-based algorithm for clustering individuals whose data is non-vector in nature. We evaluate it on a large data set of dialup transactions to show that the algorithm is fast and scalable for clustering and accurate for prediction. We also compare it with a vector clustering algorithm in terms of prediction accuracy, to show that the model-based algorithm is better suited to non-vector data.
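The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of what model-based clustering of non-vector data can look like in general, not the authors' method. It assumes each customer is a variable-length list of dialup durations, that durations within a cluster follow an exponential distribution, and that the mixture is fitted with EM. The function names (`fit_exponential_mixture`, `hard_labels`, `expected_duration`) and the sample data are hypothetical.

```python
import math


def fit_exponential_mixture(individuals, k, iterations=50):
    """Fit a k-component mixture of exponential-duration models with EM.

    individuals: list of lists of positive durations; lengths may differ.
    Returns (weights, rates, responsibilities).
    """
    # Each individual is summarised by an event count and a total duration,
    # so no fixed-length feature vector of raw events is ever built.
    counts = [len(d) for d in individuals]
    totals = [sum(d) for d in individuals]
    n_ind = len(individuals)

    # Deterministic start: spread initial rates over the sorted
    # per-individual rates (an assumption made for this sketch).
    per_rate = sorted(n / s for n, s in zip(counts, totals))
    rates = [per_rate[(2 * j + 1) * n_ind // (2 * k)] for j in range(k)]
    weights = [1.0 / k] * k
    resp = []

    for _ in range(iterations):
        # E-step: posterior cluster probabilities for each individual.
        # Log-likelihood of n durations summing to s is n*log(r) - r*s.
        resp = []
        for n, s in zip(counts, totals):
            logp = [math.log(w) + n * math.log(r) - r * s
                    for w, r in zip(weights, rates)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            z = sum(p)
            resp.append([v / z for v in p])

        # M-step: re-estimate mixing weights and exponential rates.
        for j in range(k):
            col = [r[j] for r in resp]
            weights[j] = max(sum(col) / n_ind, 1e-12)
            num = sum(c * n for c, n in zip(col, counts))
            den = sum(c * s for c, s in zip(col, totals))
            if num > 0 and den > 0:
                rates[j] = num / den
    return weights, rates, resp


def hard_labels(resp):
    """Assign each individual to its most probable cluster."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


def expected_duration(resp_row, rates):
    """Predicted mean duration of an individual's next event."""
    return sum(p / r for p, r in zip(resp_row, rates))


# Hypothetical customers: three with short dialups, three with long ones.
customers = [
    [2.0, 1.5, 3.0],
    [1.0, 2.5],
    [2.2, 1.8, 2.0, 1.1],
    [60.0, 45.0],
    [55.0, 70.0, 62.0],
    [48.0, 66.0, 51.0, 58.0],
]
weights, rates, resp = fit_exponential_mixture(customers, k=2)
labels = hard_labels(resp)
```

Because the likelihood is defined directly on each customer's event list, customers with different numbers of dialups can be compared without padding or truncation, which is the property that vector-oriented algorithms lack. A fuller model would presumably also cover start times, which this sketch ignores.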

References

[1] Agrawal, R., Faloutsos, C., Swami, A.: Efficient Similarity Search In Sequence Database. In: Proc. of 4th Int. Conf. of Foundations of Data Organization and Algorithms, Chicago, IL (October 1993)
[2] Cadez, I.V., Heckerman, D., Meek, C., Smyth, P., White, S.: Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering. In: Proc. of KDD 2000, pp. 280–284. ACM Press, New York (2000)
[3] Dempster, A., Laird, N., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
[4] Frank, E., Hall, M., Trigg, L., Kirkby, R., Schmidberger, G., Ware, M., Xu, X., Bouckaert, R., Wang, Y., Inglis, S., Witten, I.H., et al.: Weka 3: Machine Learning Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
[5] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001)

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, China
Kedong Luo, Jianmin Wang, Deyi Li & Jiaguang Sun

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

About this paper

Cite this paper

Luo, K., Wang, J., Li, D., Sun, J. (2003). Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_57

DOI: https://doi.org/10.1007/978-3-540-45080-1_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive
