Abstract
The results returned by a search, data mining or database engine often contain an overload of potentially interesting information. A daunting challenge for the user is to pick out the useful information. In this paper we propose an interactive framework to efficiently explore and (re)rank the objects retrieved by such an engine, according to feedback provided on part of the initially retrieved objects. In particular, given a set of objects, a similarity measure applicable to the objects and an initial set of objects that are of interest to the user, our algorithm computes the k objects most similar to that initial set. This problem, previously coined “cluster on demand” [10], is solved by transforming the data into a weighted graph. On this weighted graph we compute a relevance score between the initial set of nodes and the remaining nodes, based on random walks with restart. We apply our algorithm, “Tell Me More” (TMM), to text, numerical and zero/one data. In almost every experiment, TMM significantly outperforms a k-nearest neighbor approach.
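The ranking step described above can be illustrated with a minimal sketch. It assumes the pairwise similarities have already been computed into a dense, non-negative n×n matrix that serves as the weighted adjacency matrix of the graph. Several details are illustrative assumptions rather than the paper's exact construction or parameters:

- the function names (`cosine_similarity_matrix`, `rwr_scores`, `tell_me_more`), which are hypothetical;
- cosine similarity as the similarity measure;
- a restart probability of 0.15;
- a uniform restart distribution over the seed objects.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X, clipped at 0.

    Cosine similarity is only one possible similarity measure (an
    assumption for this sketch); negative values are clipped because
    edge weights in the graph must be non-negative.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return np.clip(Xn @ Xn.T, 0.0, None)


def rwr_scores(S, seeds, restart_prob=0.15, tol=1e-10, max_iter=1000):
    """Relevance of every node to the seed set via random walk with restart."""
    if len(seeds) == 0:
        raise ValueError("at least one seed object is required")
    A = np.array(S, dtype=float)  # copy: the caller's matrix is left untouched
    np.fill_diagonal(A, 0.0)      # drop self-loops
    col_sums = A.sum(axis=0)
    # Column-normalise so that W[i, j] is the probability of stepping j -> i.
    # Columns of isolated nodes stay all-zero.
    W = np.divide(A, col_sums, out=np.zeros_like(A), where=col_sums > 0)
    # Restart distribution: uniform over the user-selected seed objects.
    e = np.zeros(A.shape[0])
    e[list(seeds)] = 1.0 / len(seeds)
    r = e.copy()
    # Power iteration on r = (1 - c) W r + c e; converges because (1 - c) < 1.
    for _ in range(max_iter):
        r_next = (1.0 - restart_prob) * (W @ r) + restart_prob * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def tell_me_more(S, seeds, k, restart_prob=0.15):
    """Return the indices of the k non-seed objects with the highest RWR score."""
    r = rwr_scores(S, seeds, restart_prob)
    seed_set = {int(s) for s in seeds}
    ranked = [int(i) for i in np.argsort(-r, kind="stable")]
    return [i for i in ranked if i not in seed_set][:k]


if __name__ == "__main__":
    # Toy data (hypothetical): two tight groups of three 2-D points.
    X = np.array([[1.0, 0.0], [0.95, 0.05], [0.9, 0.1],
                  [0.0, 1.0], [0.05, 0.95], [0.1, 0.9]])
    S = cosine_similarity_matrix(X)
    print(tell_me_more(S, seeds=[0], k=2))
```

The walk keeps returning to the seed set. As a result, objects connected to the seeds through many strong paths score highly even when they are not the direct nearest neighbor of any seed, whereas a plain k-nearest-neighbor ranking looks only at direct similarities. For large collections, the dense power iteration would normally be replaced by a sparse graph and the faster random-walk-with-restart solvers of Tong et al.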
References
[1] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.: Fast discovery of association rules. In: ADMA, pp. 307–328 (1996)
[2] Cilibrasi, R., Vitányi, P., Wolf, R.: Algorithmic clustering of music. In: 4th International Conference on WEB Delivering of Music, pp. 110–117 (2004)
[3] Coenen, F.: The LUCS-KDD discretised/normalised ARM and CARM data library
[4] Craswell, N., Szummer, M.: Random walks on the click graph. In: SIGIR, pp. 239–246 (2007)
[5] Dean, J., Henzinger, M.: Finding related pages in the world wide web. Computer Networks 31(11-16), 1467–1479 (1999)
[6] Denoyer, L., Gallinari, P.: The Wikipedia XML Corpus. SIGIR Forum 40(1), 64–69 (2006)
[7] Denoyer, L., Gallinari, P.: Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents. SIGIR Forum 41(1), 79–90 (2007)
[8] Faloutsos, C., Megalooikonomou, V.: On data mining, compression, and Kolmogorov complexity. Data Mining and Knowledge Discovery 15(1), 3–20 (2007)
[9] Fuxman, A., Tsaparas, P., Achan, K., Agrawal, R.: Using the wisdom of the crowds for keyword generation. In: WWW (2008)
[10] Ghahramani, Z., Heller, K.: Bayesian sets. In: Advances in Neural Information Processing Systems (2005)
[11] Haveliwala, T.: Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)
[12] Haveliwala, T., Gionis, A., Klein, D., Indyk, P.: Evaluating strategies for similarity search on the web. In: WWW, pp. 432–442 (2002)
[13] De Knijf, J.: Mining tree patterns with almost smallest supertrees. In: SIAM International Conference on Data Mining. SIAM, Philadelphia (2008)
[14] Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)
[15] Newman, D., Hettich, S., Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
[16] Onuma, K., Tong, H., Faloutsos, C.: Tangent: a novel, “surprise me”, recommendation algorithm. In: KDD, pp. 657–666 (2009)
[17] Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)
[18] Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: KDD, pp. 653–658 (2004)
[19] Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York (1986)
[20] Siebes, A., Vreeken, J., van Leeuwen, M.: Item sets that compress. In: SIAM International Conference on Data Mining (2006)
[21] Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: IEEE Intl. Conf. on Data Mining, pp. 418–425 (2005)
[22] Tong, H., Faloutsos, C.: Center-piece subgraphs: problem definition and fast solutions. In: KDD, pp. 404–413 (2006)
[23] Tong, H., Faloutsos, C., Pan, J.: Random walk with restart: fast solutions and applications. Knowl. Inf. Syst. 14(3), 327–346 (2008)
[24] Voorhees, E.: Variations in relevance judgments and the measurement of retrieval effectiveness. In: SIGIR, pp. 315–323 (1998)
[25] Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
[26] Xin, D., Han, J., Yan, X., Cheng, H.: On compressing frequent patterns. Data & Knowledge Engineering 60(1), 5–29 (2007)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Knijf, J., Liekens, A., Goethals, B. (2011). “Tell Me More”: Finding Related Items from User Provided Feedback. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_9
DOI: https://doi.org/10.1007/978-3-642-24477-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)