Abstract
Most previous distributional clustering methods are fundamentally unsupervised, and the discriminative property of words is not well modeled in the clustering procedure. In this paper, we propose a supervised model that incorporates the class-conditional probability into the word similarity measure and casts word-set extraction as a supervised graph-partition optimization problem. We propose a greedy algorithm to solve this problem, which combines word selection and word grouping in a unified framework. By grouping related words, the method essentially replaces the exact match between word bins with a fuzzy match between groups of related-word bins, which to some extent alleviates the synonymy problem in the bag-of-words (BoW) model. Experiments demonstrate that the proposed method is applicable to both text and image data sets, and that it improves retrieval precision while reducing the lexicon size.
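The grouping idea can be illustrated with a minimal Python sketch. It is not the graph-partition objective or the greedy algorithm of the paper, whose details the abstract does not give. As assumptions made only for illustration, it measures word similarity by the Jensen-Shannon divergence between class-conditional distributions P(class | word), selects words by an entropy threshold, and grows groups greedily around the most discriminative words. The function names, the thresholds `max_entropy` and `max_div`, and the toy counts are all hypothetical.

```python
import math

def class_conditional(counts):
    """Normalize per-word class counts into P(class | word)."""
    return {w: [c / sum(row) for c in row] for w, row in counts.items()}

def entropy(p):
    """Shannon entropy in bits; low entropy means a discriminative word."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def greedy_word_groups(counts, max_entropy=0.9, max_div=0.05):
    """Select discriminative words, then group them greedily.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    probs = class_conditional(counts)
    # Word selection: drop words whose class distribution is nearly uniform.
    selected = [w for w in probs if entropy(probs[w]) <= max_entropy]
    # Visit the most discriminative words first; break ties alphabetically.
    selected.sort(key=lambda w: (entropy(probs[w]), w))
    groups, assigned = [], set()
    for seed in selected:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        # Word grouping: absorb unassigned words that behave like the seed.
        for w in selected:
            if w not in assigned and js_divergence(probs[seed], probs[w]) <= max_div:
                group.append(w)
                assigned.add(w)
        groups.append(group)
    return groups

def group_histogram(bow, groups):
    """Map a word-level BoW histogram to one bin per word group."""
    return [sum(bow.get(w, 0) for w in g) for g in groups]

# Toy word-by-class counts for two classes (hypothetical data).
counts = {
    "car": [9, 1], "automobile": [8, 2],
    "tree": [1, 9], "forest": [2, 8],
    "the": [5, 5],
}
groups = greedy_word_groups(counts)
doc_a = group_histogram({"car": 1, "the": 3}, groups)
doc_b = group_histogram({"automobile": 1, "the": 2}, groups)
```

In this toy example the two documents share no selected word, so an exact word-bin match finds no overlap, yet they fall into the same group bin, which is the fuzzy match the abstract describes. The non-discriminative word "the" is dropped, which shrinks the lexicon.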
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wen, W., Hao, Z., Cai, R. (2015). Mining the Discriminative Word Sets for Bag-of-Words Model Based on Distributional Similarity Graph. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_6
DOI: https://doi.org/10.1007/978-3-319-28121-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28120-9
Online ISBN: 978-3-319-28121-6
eBook Packages: Computer Science, Computer Science (R0)