Evaluating distance-based clustering for user (browse and click) sessions in a domain-specific collection

Steinhauer, Jeremy; Delcambre, Lois M. L.; Lykke, Marianne; Ådland, Marit Kristine

doi:10.1007/s00799-014-0117-z

Published: 18 July 2014

Volume 14, pages 167–179 (2014)

International Journal on Digital Libraries

Abstract

We seek to improve information retrieval in a domain-specific collection by clustering user sessions from a click log and then classifying later user sessions in real time. As a preliminary step, we explore the main assumption of this approach: whether user sessions in such a site are related to the question the user is trying to answer. Since a large class of machine learning algorithms use a distance measure at their core, we evaluate how well common machine learning distance measures distinguish sessions of users searching for the answer to the same question from sessions of users searching for answers to different questions. We found that two distance measures work very well for our task and three others do not. As a further step, we then investigate how effective the distance measures are when used in clustering. For our dataset, we conducted a user study in which multiple users answered the same set of questions. This data, grouped by question, was used as our gold standard for evaluating the clusters produced by the clustering algorithms. We found that, as expected, the observed difference between the two classes of distance measures affected the quality of the clusterings. We also found that one of the two distance measures that worked well to differentiate sessions worked significantly better than the other when clustering. Finally, we discuss why some distance measures performed better than others in the two parts of our work.
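The abstract does not name the five distance measures, the session features, or the cluster-quality metric that were used. The Python sketch below illustrates the two evaluation steps it describes, under assumptions that are ours and not the paper's. Each session is modelled as a hypothetical bag of visited page IDs (a `Counter`). Cosine, Jaccard, and Euclidean distance stand in for the measures compared. Purity stands in for the comparison against the question-grouped gold standard. The helper names `same_vs_different` and `purity` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

# Assumption: a session is a Counter mapping page IDs to visit counts
# (browse and click events). The paper's actual representation may differ.

def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two page-visit count vectors."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty session as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def jaccard_distance(a: Counter, b: Counter) -> float:
    """1 - |A & B| / |A | B| over the sets of visited pages (ignores counts)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def euclidean_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between count vectors; grows with session length."""
    pages = a.keys() | b.keys()
    return math.sqrt(sum((a[p] - b[p]) ** 2 for p in pages))

def same_vs_different(sessions, questions, dist):
    """Mean distance over same-question pairs and over different-question pairs.

    Requires at least one pair of each kind. A useful measure should give a
    clearly smaller first value than second value.
    """
    same, diff = [], []
    for i, j in combinations(range(len(sessions)), 2):
        d = dist(sessions[i], sessions[j])
        (same if questions[i] == questions[j] else diff).append(d)
    return sum(same) / len(same), sum(diff) / len(diff)

def purity(clusters, questions):
    """Fraction of sessions whose cluster's majority question is their own."""
    by_cluster = {}
    for c, q in zip(clusters, questions):
        by_cluster.setdefault(c, Counter())[q] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority / len(questions)

if __name__ == "__main__":
    # Toy data, invented for illustration: two sessions per question.
    sessions = [
        Counter({"p1": 2, "p2": 1}),
        Counter({"p1": 1, "p2": 1}),
        Counter({"p7": 3, "p8": 1}),
        Counter({"p8": 2, "p9": 1}),
    ]
    questions = ["q1", "q1", "q2", "q2"]
    for name, dist in [("cosine", cosine_distance),
                       ("jaccard", jaccard_distance),
                       ("euclidean", euclidean_distance)]:
        within, between = same_vs_different(sessions, questions, dist)
        print(name, round(within, 3), round(between, 3))
    print(purity([0, 0, 1, 1], questions))  # 1.0
```

The three stand-in measures behave differently by design: Jaccard ignores how often a page was visited, cosine ignores overall session length, and Euclidean distance is sensitive to both. Properties like these are one plausible reason why measures can separate sessions well in one setting and cluster them poorly in another, but the sketch does not reproduce the paper's analysis.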

Acknowledgments

We acknowledge the support of the Danish Cancer Society and Mr. Tor Øyan, our contact. We also received support from the National Science Foundation, award 0812260. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the NSF. We thank Ms. Tesca Fitzgerald, Ms. Suzanna Kanga, Ms. Flery Decker, and Jonathon Britell, MD, Board Certified Oncologist.

Author information

Authors and Affiliations

Department of Computer Science, Portland State University, Portland, OR, USA
Jeremy Steinhauer & Lois M. L. Delcambre
Department of Communication and Psychology, Aalborg University, Aalborg, Denmark
Marianne Lykke
Department of Library and Information Science, Oslo University College, Oslo, Norway
Marit Kristine Ådland

Corresponding author

Correspondence to Jeremy Steinhauer.

About this article

Cite this article

Steinhauer, J., Delcambre, L.M.L., Lykke, M. et al. Evaluating distance-based clustering for user (browse and click) sessions in a domain-specific collection. Int J Digit Libr 14, 167–179 (2014). https://doi.org/10.1007/s00799-014-0117-z

Received: 31 October 2013
Revised: 16 May 2014
Accepted: 21 May 2014
Published: 18 July 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00799-014-0117-z
