Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Collaborative filtering systems typically need to acquire some data about a new user before they can start making personalized suggestions, a situation commonly referred to as the “new user problem”. In this work we address the new user problem with a personalized strategy for selecting which articles to prompt the user to rate. Our approach uses hypernyms extracted from the WordNet database and converges quickly to the user's actual interests from the minimal set of ratings provided during registration. In addition, we explore whether document clustering, and in particular the clustering of news articles from the web, can be improved by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach for clustering web news articles that combines keywords with n-grams extracted from the articles at an offline stage. This technique is then compared with the plain “bag-of-words” representation previously used by our clustering algorithm, W-kmeans. Our experiments show that fine-tuning the weighting parameters between keywords and n-grams, as well as the value of n itself, can yield a significant improvement in the clustering evaluation metrics.
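The abstract does not spell out how hypernyms drive the choice of articles shown to a new user, so the sketch below is only one possible illustration under stated assumptions, not the authors' algorithm. A small hand-written table (`HYPERNYMS`, hypothetical) stands in for WordNet. Each article's keywords are generalized to their hypernyms, the user's ratings accumulate into a concept profile, and the next article to prompt is the unrated one that best matches that profile. A real system would query WordNet instead, for example through NLTK's `wordnet` corpus reader and `Synset.hypernyms()`. All function names here are illustrative.

```python
from collections import Counter

# Hypothetical hypernym table standing in for WordNet lookups (assumption).
HYPERNYMS = {
    "football": "sport",
    "tennis": "sport",
    "election": "politics",
    "parliament": "politics",
    "laptop": "technology",
    "smartphone": "technology",
}


def article_concepts(keywords):
    """Generalize each keyword to its hypernym; unknown words are kept as-is."""
    return Counter(HYPERNYMS.get(word, word) for word in keywords)


def update_profile(profile, keywords, rating):
    """Add an article's hypernym counts to the profile, scaled by the rating
    (assumed to be a signed score, e.g. +1 for like and -1 for dislike)."""
    for concept, count in article_concepts(keywords).items():
        profile[concept] += rating * count
    return profile


def next_to_rate(articles, profile, rated):
    """Return the id of the unrated article whose hypernyms best match the
    profile, or None when every article has been rated. Ties go to the
    article that appears first in `articles`."""
    def score(item):
        _, keywords = item
        # Missing concepts read as 0 because `profile` is a Counter.
        return sum(profile[c] * n for c, n in article_concepts(keywords).items())

    candidates = [(aid, kw) for aid, kw in articles.items() if aid not in rated]
    if not candidates:
        return None
    return max(candidates, key=score)[0]


articles = {
    "a1": ["football", "tennis"],
    "a2": ["election", "parliament"],
    "a3": ["tennis", "laptop"],
}
profile = update_profile(Counter(), articles["a1"], rating=1)
print(next_to_rate(articles, profile, rated={"a1"}))  # a3
```

Here a positive rating for the football and tennis article puts weight on “sport”, so the unrated article mentioning tennis is ranked above the one about elections. Because hypernyms pool related keywords, a single rating informs the scores of articles that share no exact keyword with it.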
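The following is a minimal sketch of how keyword and word n-gram representations might be combined when comparing two articles. It assumes cosine similarity over raw term counts and a single mixing weight `alpha` between the bag-of-words similarity and the n-gram similarity. The paper's actual weighting scheme and its integration into W-kmeans are not reproduced here; `tokenize`, `word_ngrams`, `cosine`, `weighted_similarity`, and the defaults `n=2` and `alpha=0.7` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic word tokens only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def word_ngrams(tokens, n):
    # Count contiguous word n-grams joined by spaces, e.g. "stock market".
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_similarity(doc_a, doc_b, n=2, alpha=0.7):
    # Assumed scheme: alpha weights the bag-of-words (keyword) similarity,
    # and 1 - alpha weights the word n-gram similarity.
    ta, tb = tokenize(doc_a), tokenize(doc_b)
    keyword_sim = cosine(Counter(ta), Counter(tb))
    ngram_sim = cosine(word_ngrams(ta, n), word_ngrams(tb, n))
    return alpha * keyword_sim + (1 - alpha) * ngram_sim


print(round(weighted_similarity("stock market falls", "market stock falls"), 3))  # 0.7
```

The two example phrases contain the same words, so their bag-of-words similarity is 1, but they share no bigrams, so with `alpha=0.7` the combined score drops to about 0.7. Adjusting `alpha` and `n` shifts the balance between the two views of the text, analogous to the weighting parameters and the value of n that the abstract reports tuning.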

Acknowledgments

This research has been co-financed by the European Union (European Social Fund–ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund.

Author information

Correspondence to Christos Bouras.

About this article

Cite this article

Bouras, C., Tsogkas, V. Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem. Int. J. Mach. Learn. & Cyber. 7, 171–184 (2016). https://doi.org/10.1007/s13042-014-0264-y

  • DOI: https://doi.org/10.1007/s13042-014-0264-y