Visual Dictionary Pruning Using Mutual Information and Information Gain

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8468)

Abstract

Feature selection methods are applied to many machine learning problems; one application is selecting the most informative Visual Words for image categorization. In the Bag of Visual Words framework, an image is represented as a vector of Visual Word frequencies, typically hundreds to thousands of elements long. A dictionary of Visual Words is built from image keypoints detected by the SIFT algorithm and quantized into words by k-means clustering. In this paper we use Mutual Information and Information Gain to select the words that are most important for efficient image classification. We propose four novel methods, which extend classic Mutual Information and Information Gain in line with our previous feature selection methods. We consider two basic selection strategies, one-vs-all and one-vs-one, as well as multi-class and multi-attribute-value problems. Our experiments show a positive effect of these modifications when applied to image classification by Support Vector Machines. In most cases, visual word selection based on modified Mutual Information outperforms the methods based on Information Gain.
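To make the selection step concrete, the following is a minimal sketch of ranking visual words by classic Information Gain, IG(C; W) = H(C) - H(C | W). It is not the modified method proposed in the paper, which the abstract does not specify. The binarization of each word into present/absent, the function names, and the toy histograms are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(histograms, labels, word):
    """IG of the binary event 'word occurs in the image' w.r.t. the class.

    Assumption: word counts are binarized (count > 0 means present).
    """
    present = [y for h, y in zip(histograms, labels) if h[word] > 0]
    absent = [y for h, y in zip(histograms, labels) if h[word] == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_words(histograms, labels, k):
    """Indices of the k visual words with the highest information gain."""
    n_words = len(histograms[0])
    scores = [information_gain(histograms, labels, w) for w in range(n_words)]
    ranked = sorted(range(n_words), key=lambda w: scores[w], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 4 images, a dictionary of 3 visual words, 2 classes.
histograms = [[3, 1, 0], [2, 0, 1], [0, 2, 1], [0, 1, 0]]
labels = ["cat", "cat", "dog", "dog"]
top = select_words(histograms, labels, k=1)
```

In this toy example, word 0 occurs only in the "cat" images, so it separates the two classes perfectly and is ranked first, while word 2 occurs equally often in both classes and carries no information. In a pruning pipeline, the image vectors would be restricted to the selected indices before training the classifier.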



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Artiemjew, P., Górecki, P. (2014). Visual Dictionary Pruning Using Mutual Information and Information Gain. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_1

  • DOI: https://doi.org/10.1007/978-3-319-07176-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07175-6

  • Online ISBN: 978-3-319-07176-3

  • eBook Packages: Computer Science (R0)
