
Mining the Discriminative Word Sets for Bag-of-Words Model Based on Distributional Similarity Graph

  • Conference paper
Web Technologies and Applications (APWeb 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9461)

Abstract

Most previous distributional clustering methods are fundamentally unsupervised, and the discriminative property of words is not well modeled in the clustering procedure. In this paper, we propose a supervised model that incorporates the class-conditional probability into the word-similarity measure and recasts word-set extraction as a supervised graph-partition optimization problem. A greedy algorithm is proposed to solve this model; it combines word selection and word grouping in a unified framework. By grouping related words, the method essentially replaces exact matching between word bins with fuzzy matching between groups of related-word bins, which to some extent alleviates the synonymy problem in the BoW model. Experiments demonstrate that the proposed method is applicable to both text and image data sets, and that it yields better retrieval precision while reducing the lexicon size.
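The idea of selecting and grouping words by their class-conditional behaviour can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the similarity measure or the partition objective, so the sketch assumes a Jensen-Shannon similarity between the distributions P(class | word), an entropy threshold for word selection, and a simple seed-based greedy grouping. All function names, thresholds, and the toy counts are hypothetical.

```python
import math


def class_conditional(counts):
    """Turn {word: {class: count}} into {word: P(class | word)}."""
    dists = {}
    for word, per_class in counts.items():
        total = sum(per_class.values())
        dists[word] = {c: n / total for c, n in per_class.items()}
    return dists


def entropy(p):
    """Shannon entropy in bits; low entropy means a discriminative word."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2), a value in [0, 1]."""
    div = 0.0
    for c in set(p) | set(q):
        pc, qc = p.get(c, 0.0), q.get(c, 0.0)
        m = 0.5 * (pc + qc)
        if pc > 0:
            div += 0.5 * pc * math.log2(pc / m)
        if qc > 0:
            div += 0.5 * qc * math.log2(qc / m)
    return 1.0 - div


def greedy_word_sets(counts, sim_threshold=0.9, max_entropy=0.9):
    """Assumed greedy scheme: select peaked words, then group them by seed."""
    dists = class_conditional(counts)
    # Word selection: drop words whose class distribution is nearly flat.
    kept = [w for w in dists if entropy(dists[w]) <= max_entropy]
    # Most discriminative words first; ties broken alphabetically.
    kept.sort(key=lambda w: (entropy(dists[w]), w))
    groups = []
    for w in kept:
        for group in groups:
            # Word grouping: join the first group whose seed is similar enough.
            if js_similarity(dists[w], dists[group[0]]) >= sim_threshold:
                group.append(w)
                break
        else:
            groups.append([w])
    return groups


def group_histogram(tokens, groups):
    """Bag-of-words histogram with one bin per word group."""
    index = {w: i for i, group in enumerate(groups) for w in group}
    hist = [0] * len(groups)
    for t in tokens:
        if t in index:
            hist[index[t]] += 1
    return hist


# Hypothetical word-by-class occurrence counts.
toy_counts = {
    "car": {"auto": 9, "sport": 1},
    "automobile": {"auto": 8, "sport": 2},
    "goal": {"auto": 1, "sport": 9},
    "the": {"auto": 5, "sport": 5},
}
toy_groups = greedy_word_sets(toy_counts)
toy_hist = group_histogram(["automobile", "car", "goal", "the"], toy_groups)
```

In this toy setting, "car" and "automobile" fall into the same bin, so a query containing one matches a document containing the other, while the non-discriminative word "the" is dropped and the lexicon shrinks from four bins to two.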

Notes

  1. http://research.microsoft.com/en-us/people/xingx/pi100.aspx.

  2. http://www.cs.umd.edu/~sen/lbc-proj/LBC.html.

Author information

Corresponding author

Correspondence to Wen Wen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wen, W., Hao, Z., Cai, R. (2015). Mining the Discriminative Word Sets for Bag-of-Words Model Based on Distributional Similarity Graph. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_6

  • DOI: https://doi.org/10.1007/978-3-319-28121-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28120-9

  • Online ISBN: 978-3-319-28121-6

  • eBook Packages: Computer Science (R0)
