Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem

Bouras, Christos; Tsogkas, Vassilis

doi:10.1007/s13042-014-0264-y

Original Article
Published: 11 May 2014

International Journal of Machine Learning and Cybernetics
Volume 7, pages 171–184 (2016)

Christos Bouras¹,² & Vassilis Tsogkas¹

Abstract

Collaborative filtering systems typically need to acquire some data about a new user before they can start making personalized suggestions, a situation commonly referred to as the “new user problem”. In this work we attempt to address the new user problem via a personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordNet database and converges quickly to the actual user interests based on a minimal set of user ratings, which are provided during the registration process. In addition, we explore whether document clustering results, and in particular the clustering of news articles from the web, can be enhanced by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach that combines keywords with n-grams extracted from the articles at an offline stage when clustering news articles derived from the web. This technique is then compared with the plain “bag-of-words” representation that our clustering algorithm, W-kmeans, previously used. Our experimentation reveals that by fine-tuning the weighting parameters between keywords and n-grams, as well as the value of n itself, a significant improvement in the clustering result metrics can be achieved.
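The idea of weighting keywords against word-based n-grams can be illustrated with a minimal sketch. This is not the W-kmeans implementation or the weighting scheme evaluated in the article. The function names, the whitespace tokenizer, the cosine measure and the single blending parameter `w_ngram` are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def word_ngrams(tokens, n):
    """Return the word-based n-grams of a token list as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(doc_a, doc_b, n=2, w_ngram=0.5):
    """Hypothetical blend of keyword and n-gram similarity.

    w_ngram = 0 reduces to a plain bag-of-words comparison;
    w_ngram = 1 compares the documents on their n-grams only.
    """
    tokens_a = doc_a.lower().split()  # naive tokenizer, an assumption
    tokens_b = doc_b.lower().split()
    keyword_sim = cosine(Counter(tokens_a), Counter(tokens_b))
    ngram_sim = cosine(Counter(word_ngrams(tokens_a, n)),
                       Counter(word_ngrams(tokens_b, n)))
    return (1 - w_ngram) * keyword_sim + w_ngram * ngram_sim
```

Under these assumptions, two texts that share the same words in a different order score high on the keyword term but low on the n-gram term, so varying `w_ngram` and `n` changes how strongly word order influences the similarity passed to a clustering step.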

Acknowledgments

This research has been co-financed by the European Union (European Social Fund–ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund.

Author information

Authors and Affiliations

Computer Engineering and Informatics Department, University of Patras, Patras, Greece
Christos Bouras & Vassilis Tsogkas
Computer Technology Institute and Press “Diophantus”, Rion, 26500, Patras, Greece
Christos Bouras

Corresponding author

Correspondence to Christos Bouras.

About this article

Cite this article

Bouras, C., Tsogkas, V. Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem. Int. J. Mach. Learn. & Cyber. 7, 171–184 (2016). https://doi.org/10.1007/s13042-014-0264-y

Received: 12 December 2013
Accepted: 29 April 2014
Published: 11 May 2014
Issue Date: April 2016
DOI: https://doi.org/10.1007/s13042-014-0264-y
