IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7413)

Abstract

In this article, we report on our participation in the JRS Data-Mining Challenge. Our system takes a lazy-learning approach based on a simple k-nearest-neighbors technique. More specifically, we treated this challenge as an opportunity to test Information Retrieval (IR) inspired techniques in a data-mining framework. In particular, we tested different similarity measures, including one called vectorization, which we have proposed and tested in IR and Natural Language Processing frameworks. The resulting system is simple and efficient while offering good performance.
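
To make the lazy-learning idea concrete, here is a minimal sketch of a k-nearest-neighbors classifier with a pluggable similarity measure. It is not the system described in the paper: the sparse-dictionary representation, the cosine measure, the similarity-weighted vote, and the toy data are all assumptions chosen for illustration, and the vectorization measure itself is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, training, k=3, similarity=cosine):
    """Label `query` by a similarity-weighted vote among its k nearest neighbors.

    `training` is a list of (vector, label) pairs. No model is built in
    advance: all the work happens at query time, hence "lazy" learning.
    """
    scored = [(similarity(query, vec), label) for vec, label in training]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for score, label in neighbors:
        votes[label] += score
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature weights and labels are invented.
training = [
    ({"gene": 2.0, "protein": 1.0}, "biology"),
    ({"protein": 3.0, "cell": 1.0}, "biology"),
    ({"index": 2.0, "query": 2.0}, "retrieval"),
    ({"query": 1.0, "ranking": 2.0}, "retrieval"),
]

print(knn_predict({"query": 1.0, "index": 1.0}, training))
```

Because the similarity function is passed as a parameter, trying a different measure only requires swapping the `similarity` argument; the neighbor search and the vote stay unchanged.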

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Claveau, V. (2012). IRISA Participation in JRS 2012 Data-Mining Challenge: Lazy-Learning with Vectorization. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_53

  • DOI: https://doi.org/10.1007/978-3-642-32115-3_53

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32114-6

  • Online ISBN: 978-3-642-32115-3

  • eBook Packages: Computer Science, Computer Science (R0)
