
Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Abstract

In many applications, data is non-vector in nature. For example, one might have transaction data from a dialup access system, where each customer has an observed time series of dialups whose start times and durations differ from customer to customer. Such data is difficult to convert into vector form, so existing algorithms designed for vector data [5] have difficulty clustering customers by their dialup events. This paper presents an efficient model-based algorithm for clustering individuals whose data is non-vector in nature. We then evaluate it on a large data set of dialup transactions to show that the algorithm is fast and scalable for clustering and accurate for prediction. We also compare its predictive accuracy with that of a vector clustering algorithm, showing that the proposed algorithm is better suited to non-vector data.
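The abstract does not specify the model, so the following is only an illustrative sketch of what model-based clustering of non-vector records can look like, not the authors' algorithm. It assumes a finite mixture fitted with the EM algorithm [3], in the general style of model-based clustering [2]. Everything concrete here is an assumption: each customer is a variable-length list of hypothetical `(start_hour, duration_minutes)` events, each cluster has a categorical distribution over 24 start hours and an exponential distribution over durations, and the names `fit_mixture` and `predict_next_hour` are hypothetical.

```python
import math
import random

# ASSUMED data layout (not from the paper): each customer is a variable-length
# list of (start_hour, duration_minutes) dialup events. Customers may have
# different numbers of events, so no fixed-length vector is ever built.


def fit_mixture(customers, k, n_hours=24, n_iter=50, seed=0):
    """EM for an ASSUMED mixture of per-customer event models.

    Component j has a categorical distribution over start hours (theta[j])
    and an exponential distribution over durations with rate lam[j]. A
    customer's likelihood under component j is the product of the
    likelihoods of all of that customer's events, so all events of one
    customer are attributed to the same cluster.
    """
    rng = random.Random(seed)
    n = len(customers)
    # Random soft initial memberships; each row sums to 1.
    resp = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(w)
        resp.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: smoothed weighted estimates (add-one terms avoid log(0)).
        pi = [(sum(r[j] for r in resp) + 1.0) / (n + k) for j in range(k)]
        theta, lam = [], []
        for j in range(k):
            counts = [1.0] * n_hours
            weighted_events, weighted_dur = 1.0, 1.0
            for r, events in zip(resp, customers):
                for hour, dur in events:
                    counts[hour] += r[j]
                    weighted_events += r[j]
                    weighted_dur += r[j] * dur
            c = sum(counts)
            theta.append([x / c for x in counts])
            lam.append(weighted_events / weighted_dur)

        # E-step: posterior membership of each customer, computed in log space.
        resp = []
        for events in customers:
            logp = []
            for j in range(k):
                lp = math.log(pi[j])
                for hour, dur in events:
                    lp += math.log(theta[j][hour]) + math.log(lam[j]) - lam[j] * dur
                logp.append(lp)
            m = max(logp)
            w = [math.exp(x - m) for x in logp]
            total = sum(w)
            resp.append([x / total for x in w])
    return pi, theta, lam, resp


def predict_next_hour(resp_row, theta):
    """Predictive distribution of a customer's next start hour: the cluster
    hour distributions weighted by that customer's posterior memberships."""
    n_hours = len(theta[0])
    return [sum(r * th[h] for r, th in zip(resp_row, theta)) for h in range(n_hours)]


# Toy usage with made-up records of different lengths.
customers = [
    [(22, 60.0), (23, 55.0), (22, 70.0)],
    [(23, 65.0), (22, 50.0)],
    [(8, 5.0), (9, 4.0)],
    [(8, 6.0), (8, 3.0), (9, 5.0)],
]
pi, theta, lam, resp = fit_mixture(customers, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
next_hour = predict_next_hour(resp[0], theta)
```

The design choice that sidesteps vectorization is that the likelihood is defined per customer as a product over that customer's own events, so records of any length are scored directly. EM only reaches a local optimum that depends on the initialization, so in practice one would run several seeds and keep the fit with the highest likelihood.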


References

  1. Agrawal, R., Faloutsos, C., Swami, A.: Efficient Similarity Search in Sequence Databases. In: Proc. of the 4th Int. Conf. on Foundations of Data Organization and Algorithms, Chicago, IL (October 1993)

  2. Cadez, I.V., Heckerman, D., Meek, C., Smyth, P., White, S.: Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering. In: Proc. of KDD 2000, pp. 280–284. ACM Press, New York (2000)

  3. Dempster, A., Laird, N., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)

  4. Frank, E., Hall, M., Trigg, L., Kirkby, R., Schmidberger, G., Ware, M., Xu, X., Bouckaert, R., Wang, Y., Inglis, S., Witten, I.H., et al.: Weka 3: Machine Learning Software in Java, http://www.cs.waikato.ac.nz/ml/weka/

  5. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luo, K., Wang, J., Li, D., Sun, J. (2003). Clustering Individuals in Non-vector Data and Predicting: A Novel Model-Based Approach. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_57

  • DOI: https://doi.org/10.1007/978-3-540-45080-1_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40550-4

  • Online ISBN: 978-3-540-45080-1

  • eBook Packages: Springer Book Archive