
Sensor band selection for multispectral imaging via average normalized information

Journal of Real-Time Image Processing

Abstract

The information-rich scene descriptors created by multispectral sensors can become a bottleneck in further analysis, e.g., in real-time scene capture. Many spectral band selection methods treat the two underlying tasks, feature band selection and redundancy reduction, in isolation. Furthermore, most of this work assumes reflectance data, even though the captured surface radiance varies with scene geometry and illumination. We propose a new band selection method that uses spectral gradient entropy to choose bands that are more stable under such variations. Equally important, our measure, the average normalized information (ANI) of a set of selected bands, combines feature band selection and redundancy reduction. Since feature stability is an important criterion for band selection in ANI, our method favors features whose probability density can be accurately estimated. As a result, our technique selects the most representative feature bands, which can then be used efficiently in classification. In our experiments, ANI performed comparably to mutual information on reflectance data but outperformed it on surface radiance data.



Notes

  1. Note that \(\hat{D}'\) and \(\hat{D}''\) may be estimates of different probability densities.


Acknowledgements

This work was supported in part by the National Science Foundation under Grant number IIS-0133549.

Author information

Corresponding author

Correspondence to Hongzhi Wang.

Appendices

Appendix A

1.1 Properties of ANI attributed to joint entropy

Since ANI is based on joint entropy, we expect it to be more robust to inaccuracies in the estimation of the underlying probability density function. As we prove below, smaller joint entropies are associated with more accurate probability density estimates. By favoring smaller joint entropies, ANI therefore uses feature bands with more accurate probability density estimates.

More explicitly, let \(X_i\) be random samples, independent and identically distributed according to some probability density. In the discrete case, let \(C^p\) be the total number of states that \(X\) can take, and write the probability density as \(P(X \text{ is in the } i\text{th state}) = p_i\). Let \(D = \{d_1, d_2, \ldots, d_{C^p}\}\) be a random vector, where \(d_i\) is a random variable that takes the value 1 with probability \(p_i\) and the value 0 with probability \(1 - p_i\). For any sample from \(D\), exactly one \(d_i = 1\), i.e., \(\sum_{i=1}^{C^p} d_i = 1\). \(D\) is generated from a sample as follows: \(d_i = 1\) if the sample falls in the \(i\)th state, and \(d_j = 0\) for all \(j \ne i\). Thus, \(E(d_i) = p_i\). Since \(E(D) = \{E(d_1), E(d_2), \ldots, E(d_{C^p})\} = \{p_1, p_2, \ldots, p_{C^p}\}\), \(E(D)\) is the true probability density and \(D\) is an estimate of the probability density from a single sample. Now consider \(N\) samples. Let \(D_n\) be the random vector generated from the \(n\)th sample, \(n = 1, 2, \ldots, N\). The probability density estimated from the \(N\) samples is \(\hat{D} = \frac{1}{N}\sum_{n=1}^{N} D_n\).
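
For concreteness, here is a minimal NumPy sketch (our illustration, not code from the paper) of this estimator: each sample is encoded as a one-hot vector \(D_n\), and \(\hat{D}\) is their average, i.e., the vector of relative state frequencies.

```python
import numpy as np

def estimate_density(samples, num_states):
    """Estimate a discrete density as the average of one-hot vectors.

    Each sample n is encoded as the random vector D_n: a one-hot vector
    with d_i = 1 at the sample's state.  The estimate is
    D_hat = (1/N) * sum_n D_n, i.e., the relative frequency of each state.
    """
    one_hot = np.zeros((len(samples), num_states))
    one_hot[np.arange(len(samples)), samples] = 1.0  # build every D_n
    return one_hot.mean(axis=0)                      # D_hat

# Example: N = 8 samples over C^p = 3 states
samples = np.array([0, 1, 1, 2, 0, 1, 0, 1])
print(estimate_density(samples, 3))  # [0.375 0.5 0.125]
```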

Since \(\left\| \hat{D} - E(D) \right\| \le \sum_{n=1}^{N} \left\| D_{n}/N - E(D)/N \right\|\), we have:

$$\begin{aligned} P\left( \left\| \hat{D} - E(D) \right\| \ge \varepsilon \right) & \le P\left( \sum_{n = 1}^{N} \left\| D_{n}/N - E(D)/N \right\| \ge \varepsilon \right)\\ & = P\left( \sum_{n = 1}^{N} \left\| D - E(D) \right\|/N \ge \varepsilon \right)\\ & = P\left( \left\| D - E(D) \right\| \ge \varepsilon \right), \quad \varepsilon > 0 \end{aligned}$$
(13)

We define the error of the estimated probability density \(\hat{D}\) as:

$$A\left( \hat{D} \right) = \int_X \left| P\left( x \mid E(D) \right) - P\left( x \mid \hat{D} \right) \right| \, {\rm d}x$$
(14)

where \(x\) is a state of the class and \(X\) is the support of \(x\). Since the integral runs over the entire support, (14) reduces to:

$$A\left( \hat{D} \right) = \left\| E(D) - \hat{D} \right\|$$

By (13) and Chebyshev’s inequality, for any positive integer \(k\) and any positive real \(\varepsilon\), we can bound the error of the estimated probability density:

$$\begin{aligned} P\left( \left\| \hat{D} - E(D) \right\| \ge \varepsilon \right) & \le P\left( \left\| D - E(D) \right\| \ge \varepsilon \right)\\ & \le \varepsilon^{-k}\, E\left( \left\| D - E(D) \right\|^{k} \right)\\ & \approx \varepsilon^{-k} \sum_{i = 1}^{C^{p}} \frac{N_{i}}{N} \left| \frac{N - N_{i}}{N} + \sum_{j \ne i} \frac{N_{j}}{N} \right|^{k}\\ & = \varepsilon^{-k} \sum_{i = 1}^{C^{p}} \frac{N_{i}}{N} \left( 2\,\frac{N - N_{i}}{N} \right)^{k} \end{aligned}$$
(15)

where \(N_i\) is the number of samples among the \(N\) samples that fall in the \(i\)th state. The approximation comes from using \(\hat{D}\) in place of \(E(D)\).
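
As a quick illustration (our own sketch, not from the paper), the right-hand side of (15) can be evaluated directly from the state counts \(N_i\). A density concentrated in a few states yields a smaller bound than a near-uniform one with the same \(N\), \(\varepsilon\), and \(k\):

```python
import numpy as np

def error_bound(counts, eps, k):
    """Upper bound from (15): eps^{-k} * sum_i (N_i/N) * (2 (N - N_i)/N)^k."""
    counts = np.asarray(counts, dtype=float)  # counts[i] = N_i
    N = counts.sum()
    return eps**-k * np.sum((counts / N) * (2.0 * (N - counts) / N) ** k)

# Same N = 100, eps, and k; only the shape of the density differs.
print(error_bound([90, 5, 5], eps=0.5, k=2))    # concentrated: ~1.59
print(error_bound([34, 33, 33], eps=0.5, k=2))  # near-uniform: ~7.11
```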

Definition Let \(\hat{D}'\) and \(\hat{D}''\) be two estimated probability densities (see Note 1). \(\hat{D}'\) is more accurate than \(\hat{D}''\) if and only if, for every positive integer \(k\) and every positive real \(\varepsilon\), the upper bound for \(\hat{D}'\) given in (15) is smaller than the corresponding upper bound for \(\hat{D}''\).

Note that for \(k = 2\), the weak law of large numbers gives a tighter upper bound than (15):

$$P\left( \left\| \hat{D} - E(D) \right\| \ge \varepsilon \right) \le \frac{E\left( \left\| D - E(D) \right\|^{2} \right)}{N \varepsilon^{2}}$$
(16)

If \(\hat{D}'\) is more accurate than \(\hat{D}''\), then by the above definition \(\hat{D}'\) also has a smaller upper bound in (16) than \(\hat{D}''\). Likewise, by the definition and (15), if \(\hat{D}'\) is more accurate than \(\hat{D}''\), then for any positive integer \(k\) (the common factor \(2^{k}\varepsilon^{-k}\) cancels from both sides):

$$\sum_{i = 1}^{C^{p'}} \frac{N_{i}'}{N} \left( \frac{N - N_{i}'}{N} \right)^{k} < \sum_{i = 1}^{C^{p''}} \frac{N_{i}''}{N} \left( \frac{N - N_{i}''}{N} \right)^{k}$$
(17)

Theorem Let \(\hat{D}'\) and \(\hat{D}''\) be two estimated probability densities, and let \(H(\hat{D}')\) and \(H(\hat{D}'')\) be their entropies. If \(\hat{D}'\) is more accurate than \(\hat{D}''\), then \(H(\hat{D}') < H(\hat{D}'')\).

Proof By the Taylor expansion \(\ln(x) = \sum_{i=1}^{\infty} (-1)^{i-1} i^{-1} (x - 1)^{i}\) and (17), we have:

$$\begin{aligned} H\left( \hat{D}' \right) & = -\sum_{k = 1}^{C^{p'}} p_{k}' \ln\left( p_{k}' \right)\\ & = -\sum_{k = 1}^{C^{p'}} \frac{N_{k}'}{N} \ln\left( \frac{N_{k}'}{N} \right)\\ & = \sum_{k = 1}^{C^{p'}} \frac{N_{k}'}{N} \sum_{i = 1}^{\infty} i^{-1} \left( 1 - \frac{N_{k}'}{N} \right)^{i}\\ & = \sum_{i = 1}^{\infty} i^{-1} \sum_{k = 1}^{C^{p'}} \frac{N_{k}'}{N} \left( \frac{N - N_{k}'}{N} \right)^{i}\\ & < \sum_{i = 1}^{\infty} i^{-1} \sum_{k = 1}^{C^{p''}} \frac{N_{k}''}{N} \left( \frac{N - N_{k}''}{N} \right)^{i}\\ & = H\left( \hat{D}'' \right) \end{aligned}$$

In other words, the more easily a distribution can be estimated, the less information it contains. Given two feature sets with equal appearance entropy, ANI always prefers the one with the smaller joint entropy, and hence the one whose joint probability density can be estimated more accurately. This property is especially important when the training set is small: the estimated joint probability density then usually deviates substantially from the true density, and the classification error caused by this estimation error may dominate. Selecting a feature set with small joint entropy alleviates this effect.
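
The theorem can also be checked numerically. The following sketch (ours, with toy densities chosen for illustration) draws repeated sample sets from a low-entropy and a high-entropy density and compares the mean \(L_1\) error of \(\hat{D}\); the lower-entropy density is estimated more accurately:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy of a discrete density (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mean_l1_error(true_p, n_samples, trials=2000):
    """Average L1 distance between D_hat and the true density E(D)."""
    true_p = np.asarray(true_p)
    errors = []
    for _ in range(trials):
        samples = rng.choice(len(true_p), size=n_samples, p=true_p)
        d_hat = np.bincount(samples, minlength=len(true_p)) / n_samples
        errors.append(np.abs(d_hat - true_p).sum())
    return np.mean(errors)

for p in ([0.90, 0.05, 0.05],   # low entropy, easy to estimate
          [0.34, 0.33, 0.33]):  # high entropy, harder to estimate
    p = np.array(p)
    print(f"H = {entropy(p):.3f}, mean L1 error (N=20) = "
          f"{mean_l1_error(p, 20):.3f}")
```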

Appendix B

1.1 Additional classification tests

We performed a second classification test, in which we tested the Oulu human skin data against paper data (http://spectral.joensu.fi/databases/download/paper.htm) from the University of Joensuu (Fig. 7a). The paper database contains two parts: (1) thin paper data, 216 samples, and (2) cardboard data, 494 samples. All paper data covers the 400–700 nm range at 10 nm spectral resolution. The different types and colors of paper produce widely varying spectral plots.

Fig. 7 a Reflectance spectra of paper samples from the Joensuu database; b classification accuracy on spectra of human skin versus papers

We used all 345 Oulu skin samples for band selection. All further processing was performed only on the selected skin feature bands. For the classification test, we used 216 skin training samples and 216 paper training samples. The spectral derivatives at the pre-selected feature bands were computed for each of the 432 training samples. These samples were then used to train a three-layer feedforward neural network. After training, classification was performed on the remaining human skin and paper samples. To avoid randomness introduced by the initialization of the neural network, training and classification were repeated 50 times. The average performance over the 50 runs is shown in Fig. 7b.
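
The protocol can be sketched with scikit-learn's MLPClassifier standing in for the three-layer feedforward network (a single hidden layer). All arrays below are random placeholders for the real Oulu/Joensuu spectra, and the six-band count and hidden-layer size are our assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Placeholders for spectral derivatives at the selected feature bands
# (the real values come from the Oulu skin and Joensuu paper databases).
skin = rng.random((345, 6))    # 345 skin samples, 6 selected bands
paper = rng.random((710, 6))   # 216 thin paper + 494 cardboard samples

# 216 training samples per class; the rest are held out for classification.
X_train = np.vstack([skin[:216], paper[:216]])
y_train = np.array([1] * 216 + [0] * 216)  # 1 = skin, 0 = paper
X_test = np.vstack([skin[216:], paper[216:]])
y_test = np.array([1] * (345 - 216) + [0] * (710 - 216))

# Repeat training/classification 50 times to average out the randomness
# introduced by the network's weight initialization.
scores = []
for seed in range(50):
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000,
                        random_state=seed)
    net.fit(X_train, y_train)
    scores.append(net.score(X_test, y_test))
print(f"mean accuracy over 50 runs: {np.mean(scores):.3f}")
```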


About this article

Cite this article

Wang, H., Angelopoulou, E. Sensor band selection for multispectral imaging via average normalized information. J Real-Time Image Proc 1, 109–121 (2006). https://doi.org/10.1007/s11554-006-0014-9
