Visual Dictionary Pruning Using Mutual Information and Information Gain

Artiemjew, Piotr; Górecki, Przemysław

doi:10.1007/978-3-319-07176-3_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8468)

Abstract

Feature selection methods are applied to many machine learning problems; one application is selecting the most informative Visual Words for an image categorization task. In the Bag of Visual Words framework, an image is represented as a vector of Visual Word frequencies, typically hundreds to thousands of elements long. A dictionary of Visual Words is produced from image keypoints detected by the SIFT algorithm and quantized into words by k-means clustering. In this paper we use Mutual Information and Information Gain to select the words that are most important for efficient image classification. The paper presents four novel methods that extend the classic Mutual Information and Information Gain measures in line with our previous feature selection methods. We consider two basic selection strategies, one-vs-all and one-vs-one, as well as multi-class and multi-attribute-value problems. Our experiments show a positive effect of these modifications when applied to image classification by Support Vector Machines. The results show that visual word selection based on modified Mutual Information in most cases outperforms methods based on Information Gain.
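
As a rough illustration of the kind of pruning the abstract describes, the sketch below scores each Visual Word with the classic (unmodified) Information Gain or pointwise Mutual Information and keeps the highest-scoring words. It is not the authors' method: the four modified measures are not specified in the abstract. The function names, the binarization of frequencies into "word present / absent", the max-over-classes aggregation of Mutual Information, and the toy data are all assumptions made for this example.

```python
import numpy as np

def _entropy(p):
    """Shannon entropy in bits of a probability vector; zero entries are skipped."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(present, labels):
    """Classic IG: class entropy minus entropy conditioned on word presence."""
    classes = np.unique(labels)
    prior = np.array([(labels == c).mean() for c in classes])
    gain = _entropy(prior)
    for value in (True, False):
        mask = present == value
        if mask.any():
            cond = np.array([(labels[mask] == c).mean() for c in classes])
            gain -= mask.mean() * _entropy(cond)
    return gain

def pointwise_mi(present, labels, eps=1e-12):
    """Pointwise MI, log2 P(word, class) / (P(word) P(class)), maximized over classes.

    The max-over-classes aggregation and the eps smoothing are assumptions.
    """
    p_word = present.mean()
    scores = []
    for c in np.unique(labels):
        in_class = labels == c
        p_joint = (present & in_class).mean()
        scores.append(np.log2((p_joint + eps) / (p_word * in_class.mean() + eps)))
    return max(scores)

def prune_dictionary(histograms, labels, keep, score=information_gain):
    """Return the column indices of the `keep` highest-scoring Visual Words.

    histograms: (n_images, n_words) array of Visual Word frequencies.
    labels:     (n_images,) array of class labels.
    """
    present = histograms > 0  # assumption: binarize frequencies into presence
    scores = np.array([score(present[:, j], labels)
                       for j in range(present.shape[1])])
    return np.argsort(scores)[::-1][:keep]

# Toy data (hypothetical): 4 images, 3 Visual Words, 2 classes.
X = np.array([[3, 1, 0],
              [2, 1, 0],
              [0, 1, 4],
              [0, 1, 5]])
y = np.array([0, 0, 1, 1])

selected = prune_dictionary(X, y, keep=2)
X_pruned = X[:, selected]  # reduced representation passed on to the classifier
```

In the toy data, word 1 occurs in every image and so carries no class information; both scores rank it last, and words 0 and 2 are kept. The pruned histograms would then be used to train a classifier such as a Support Vector Machine.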

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Warmia and Mazury Olsztyn, Poland
Piotr Artiemjew & Przemysław Górecki

Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski & Rafał Scherer
Częstochowa University of Technology, 42-200, Częstochowa, Poland
Marcin Korytkowski
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Artiemjew, P., Górecki, P. (2014). Visual Dictionary Pruning Using Mutual Information and Information Gain. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_1

DOI: https://doi.org/10.1007/978-3-319-07176-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07175-6
Online ISBN: 978-3-319-07176-3
eBook Packages: Computer Science (R0)
