Vocabulary Reduction in BoW Representing by Topic Modeling

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7887)

Abstract

In this work, a new approach to vocabulary reduction is presented. It is based on filtering words in the topic feature space instead of directly in the original word space. The main goal is to analyze the differences between the application of the Cumulative Count-based word filter (f_cc) in word feature space (BoW: Bag of Words) with respect to its application in topic descriptions (obtained by LDA: Latent Dirichlet Allocation). Three well-known text datasets (Reuters, WebKB and NewsGroup) have been used to show the performance of the proposed approach.
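The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not define the cumulative count-based filter, so the code below assumes one plausible reading: rank words by weight and keep the top-ranked ones until their running sum reaches a chosen fraction of the total. The function names, the `fraction` parameter, the toy vocabulary, and the topic-word counts (which stand in for what a fitted LDA model would provide) are all hypothetical and are not taken from the paper.

```python
def cumulative_count_filter(weights, fraction):
    """Keep the highest-weight indices until their running sum reaches
    `fraction` of the total. Assumed reading of the filter, not the
    paper's definition."""
    total = sum(weights)
    # Rank indices by weight, largest first (ties keep their original order).
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, running = [], 0
    for i in order:
        if running >= fraction * total:
            break
        kept.append(i)
        running += weights[i]
    return sorted(kept)


def filter_in_word_space(doc_term_counts, fraction):
    """Apply the filter to corpus-wide word counts (BoW space)."""
    n_words = len(doc_term_counts[0])
    totals = [sum(doc[j] for doc in doc_term_counts) for j in range(n_words)]
    return cumulative_count_filter(totals, fraction)


def filter_in_topic_space(topic_word_weights, fraction):
    """Apply the filter to each topic description separately and keep
    the union of the surviving words."""
    kept = set()
    for topic in topic_word_weights:
        kept.update(cumulative_count_filter(topic, fraction))
    return sorted(kept)


# Toy data, invented for illustration only.
vocab = ["oil", "price", "market", "goal", "team", "the"]
doc_term_counts = [
    [4, 3, 2, 0, 0, 5],
    [3, 2, 3, 0, 0, 4],
    [0, 0, 0, 4, 3, 5],
    [0, 0, 1, 2, 4, 6],
]
# Hypothetical topic-word counts, as a fitted LDA model might yield.
topic_word_weights = [
    [40, 25, 20, 1, 1, 13],
    [1, 1, 5, 40, 38, 15],
]

word_space_vocab = [vocab[i] for i in filter_in_word_space(doc_term_counts, 0.6)]
topic_space_vocab = [vocab[i] for i in filter_in_topic_space(topic_word_weights, 0.6)]
print(word_space_vocab)
print(topic_space_vocab)
```

On this toy data the word-space filter keeps "oil", "team" and "the", while the topic-space filter keeps "oil", "price", "goal" and "team". The data were constructed to show how the two spaces can select different vocabularies; they say nothing about the results reported in the paper.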

This work was partially supported by FPU-AP-2009-4435 from the Spanish Ministry of Education, PROMETEO/2010/028 project from Generalitat Valenciana and P1-1B2010-27 project from the Plan de Promoció de la Investigació UJI.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fernández-Beltran, R., Montoliu, R., Pla, F. (2013). Vocabulary Reduction in BoW Representing by Topic Modeling. In: Sanches, J.M., Micó, L., Cardoso, J.S. (eds) Pattern Recognition and Image Analysis. IbPRIA 2013. Lecture Notes in Computer Science, vol 7887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38628-2_77

  • DOI: https://doi.org/10.1007/978-3-642-38628-2_77

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38627-5

  • Online ISBN: 978-3-642-38628-2

  • eBook Packages: Computer Science, Computer Science (R0)
