Efficient Clustering for Orders

Kamishima, Toshihiro; Akaho, Shotaro

doi:10.1007/978-3-540-88067-7_15

Part of the book series: Studies in Computational Intelligence (SCI, volume 165)

Abstract

Lists of ordered objects are widely used as representational forms; examples include Web search results and best-seller lists. Clustering is a useful data analysis technique for grouping mutually similar objects. To cluster orders, hierarchical clustering methods have been used together with dissimilarities defined between pairs of orders. However, hierarchical clustering methods cannot be applied to large-scale data because of their computational cost with respect to the number of orders. To avoid this problem, we developed a k-o’means algorithm. This algorithm successfully extracted grouping structures in orders and was computationally efficient with respect to the number of orders. However, it was still inefficient when the number of possible objects was very large. We therefore propose a new method (k-o’means-EBC), grounded in the theory of order statistics. We further propose several techniques for analyzing the acquired clusters of orders.
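
As an illustration of what a dissimilarity between a pair of orders can look like, the sketch below counts the object pairs that two orders rank in opposite directions (the Kendall distance). The abstract does not say which dissimilarity the authors use, so the choice of measure, the function name `kendall_distance`, and the restriction to two complete orders over the same objects are all assumptions made for this example.

```python
from itertools import combinations


def kendall_distance(order_a, order_b):
    """Count object pairs that the two orders rank in opposite directions.

    Hypothetical helper for illustration only; it assumes both orders are
    complete rankings of the same objects, listed from first to last.
    """
    if (
        len(order_a) != len(order_b)
        or set(order_a) != set(order_b)
        or len(set(order_a)) != len(order_a)
    ):
        raise ValueError("orders must rank the same distinct objects")

    # Position of each object in the second order.
    pos_b = {obj: i for i, obj in enumerate(order_b)}

    # combinations() yields (x, y) with x ahead of y in order_a, so the pair
    # is discordant exactly when x comes after y in order_b.
    return sum(1 for x, y in combinations(order_a, 2) if pos_b[x] > pos_b[y])


if __name__ == "__main__":
    print(kendall_distance(["a", "b", "c"], ["b", "a", "c"]))
```

A hierarchical method needs such a value for every pair of orders, so the number of evaluations grows quadratically with the number of orders; this is one way to see the cost concern raised in the abstract.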

Author information

Authors and Affiliations

National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 2, Umezono 1–1–1, Tsukuba, Ibaraki, 305–8568, Japan
Toshihiro Kamishima & Shotaro Akaho

Editor information

Editors and Affiliations

University of Lyon, Lyon, France
Djamel A. Zighed & Hakim Hacid
Shimane University, Shimane, Japan
Shusaku Tsumoto
University of North Carolina, Charlotte, NC, USA
Zbigniew W. Ras

About this chapter

Cite this chapter

Kamishima, T., Akaho, S. (2009). Efficient Clustering for Orders. In: Zighed, D.A., Tsumoto, S., Ras, Z.W., Hacid, H. (eds) Mining Complex Data. Studies in Computational Intelligence, vol 165. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88067-7_15

DOI: https://doi.org/10.1007/978-3-540-88067-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88066-0
Online ISBN: 978-3-540-88067-7
eBook Packages: Engineering, Engineering (R0)