IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization

Claveau, Vincent

doi:10.1007/978-3-642-32115-3_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7413)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

In this article, we report on our participation in the JRS 2012 Data-Mining Challenge. Our system takes a lazy-learning approach based on a simple k-nearest-neighbors technique. We treated the challenge as an opportunity to test techniques inspired by Information Retrieval (IR) in a data-mining framework. In particular, we tested several similarity measures, including one called vectorization, which we previously proposed and tested in IR and Natural Language Processing frameworks. The resulting system is simple and efficient while offering good performance.
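As an illustration of the general idea only, the following minimal sketch shows a k-nearest-neighbors classifier with a pluggable similarity measure. It is not the system described in the paper: the cosine measure, the majority vote, the single-label setting, the toy data, and the names `cosine` and `knn_predict` are all assumptions made for this example, and the vectorization measure itself is not reproduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def knn_predict(query, examples, k=3, similarity=cosine):
    """Return the majority label among the k examples most similar to query.

    `examples` is a list of (vector, label) pairs. Nothing is trained in
    advance (lazy learning): all the work happens at query time.
    """
    ranked = sorted(examples,
                    key=lambda ex: similarity(query, ex[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
train = [
    ([1.0, 0.0, 0.0], "a"),
    ([0.9, 0.1, 0.0], "a"),
    ([0.0, 1.0, 0.0], "b"),
    ([0.0, 0.9, 0.2], "b"),
]

print(knn_predict([1.0, 0.05, 0.0], train, k=3))
```

Because the similarity function is passed as a parameter, a different measure can be swapped in without changing the neighbor search or the vote.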

Author information

Authors and Affiliations

IRISA – CNRS, Campus de Beaulieu, 35042, Rennes, France
Vincent Claveau

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, S4S 0A2, Regina, SK, Canada
JingTao Yao
School of Information Science and Technology, Southwest Jiaotong University, 610031, Chengdu, P.R. China
Yan Yang
Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland
Roman Słowiński
Faculty of Economics, University of Catania, Corso Italia, 55, 95129, Catania, Italy
Salvatore Greco
School of Management and Engineering, Nanjing University, 210093, Nanjing, Jiangsu, P.R. China
Huaxiong Li
Machine Intelligence Unit, Indian Statistical Institute (ISI), 700108, Kolkata, India
Sushmita Mitra
Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Lech Polkowski

About this paper

Cite this paper

Claveau, V. (2012). IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_53

DOI: https://doi.org/10.1007/978-3-642-32115-3_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer Science, Computer Science (R0)
