Abstract
Feature selection methods are applied to many machine learning problems; one application is selecting the most informative Visual Words for image categorization. In the Bag of Visual Words framework, an image is represented as a vector of Visual Word frequencies, typically hundreds to thousands of elements long. A dictionary of Visual Words is produced from image keypoints detected by the SIFT algorithm and quantized into words by k-means clustering. In this paper we use Mutual Information and Information Gain to select the words that are most important for efficient image classification. We propose four novel methods that extend classic Mutual Information and Information Gain in line with our previous feature selection methods. We consider two basic selection strategies, one-vs-all and one-vs-one, as well as multi-class and multi-attribute-value problems. Our experiments show a positive effect of these modifications when applied to image classification by Support Vector Machines. The results show that visual word selection based on modified Mutual Information outperforms methods based on Information Gain in most cases.
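As a rough illustration of the kind of dictionary pruning described in the abstract, the sketch below scores each visual word by the classic mutual information between word presence and class label, then keeps the top-scoring words. It is not the modified methods proposed in the paper. Binarizing each frequency to present/absent, the function names `mutual_information_scores` and `prune_dictionary`, and the toy data are all assumptions made for this example.

```python
import numpy as np


def mutual_information_scores(X, y):
    """Classic mutual information I(W; C), in bits, for each visual word.

    X: (n_images, n_words) array of visual-word frequencies.
    y: (n_images,) array of class labels.
    Assumption of this sketch: a word is reduced to present (>0) or absent.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    present = X > 0
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_class = (y == c)
        p_c = in_class.mean()                                # P(class)
        for state in (True, False):
            match = (present == state)
            p_w = match.mean(axis=0)                         # P(word state)
            p_wc = (match & in_class[:, None]).mean(axis=0)  # P(word state, class)
            nz = p_wc > 0                                    # skip zero-probability terms
            scores[nz] += p_wc[nz] * np.log2(p_wc[nz] / (p_w[nz] * p_c))
    return scores


def prune_dictionary(X, y, k):
    """Keep the k visual words with the highest mutual information."""
    scores = mutual_information_scores(X, y)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return keep, np.asarray(X)[:, keep]


# Toy data (hypothetical): 4 images, 3 visual words, 2 classes.
X_toy = np.array([[3, 1, 0],
                  [2, 1, 1],
                  [0, 1, 0],
                  [0, 1, 4]])
y_toy = np.array([0, 0, 1, 1])

toy_scores = mutual_information_scores(X_toy, y_toy)
kept_words, X_pruned = prune_dictionary(X_toy, y_toy, k=1)
```

In the toy data only the first word separates the two classes, so it is the one retained. The pruned matrix could then be passed to any classifier, for example a Support Vector Machine as in the paper.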
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Artiemjew, P., Górecki, P. (2014). Visual Dictionary Pruning Using Mutual Information and Information Gain. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_1
DOI: https://doi.org/10.1007/978-3-319-07176-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07175-6
Online ISBN: 978-3-319-07176-3
eBook Packages: Computer Science, Computer Science (R0)