DOI: 10.1145/1601966.1601983
research-article

Supervised clustering via principal component analysis in a retrieval application

Published: 28 June 2009

Abstract

Regression problems in which the number of predictors exceeds the number of observations and the predictors are highly correlated call for dimensionality reduction or variable selection. In this paper we address a real application: retrieving the physical characteristics of a combustion process from measurements obtained with a spectroscopic sensor. This application exhibits multicollinearity and is, moreover, an ill-posed problem.
Guided by this application scenario, we propose a clustering approach that finds homogeneous subsets of the data, each embedded in an arbitrarily oriented linear manifold. The model is developed under assumptions derived from a priori knowledge of the problem. The resulting partition preserves both the a priori assumptions and the homogeneity of the models. We thereby break the whole problem into n subproblems, improving the prediction accuracy of each relative to a global solution. We demonstrate the improvements in a real application scenario: estimating temperature from spectroscopic data in a remote sensing framework.
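
The idea of replacing one global model with several local ones can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes synthetic data, two regimes whose membership is known at training time (standing in for the a priori knowledge), and plain principal component regression as the local model. All function names, the data generator, and the parameter values are hypothetical. Each regime lies on its own one-dimensional linear manifold in a 50-dimensional predictor space, with fewer observations than predictors; new samples are assigned to the manifold with the smallest reconstruction residual.

```python
import numpy as np

def fit_pcr(X, y, q):
    """Principal component regression: regress y on the top-q PCA scores of X."""
    mean_x, mean_y = X.mean(axis=0), y.mean()
    Xc = X - mean_x
    # Rows of Vt are principal directions; the SVD also works when p > n.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:q].T  # (p, q) orthonormal basis of the fitted linear manifold
    beta, *_ = np.linalg.lstsq(Xc @ V, y - mean_y, rcond=None)
    return {"mean_x": mean_x, "mean_y": mean_y, "V": V, "beta": beta}

def predict_pcr(model, X):
    scores = (X - model["mean_x"]) @ model["V"]
    return model["mean_y"] + scores @ model["beta"]

def manifold_residual(model, X):
    """Euclidean distance from each row of X to the model's affine subspace."""
    Xc = X - model["mean_x"]
    return np.linalg.norm(Xc - (Xc @ model["V"]) @ model["V"].T, axis=1)

def make_regime(rng, n, direction, offset, slope, intercept, noise=0.01):
    """Hypothetical generator: points near a line in predictor space."""
    z = rng.uniform(0.0, 1.0, size=n)
    X = offset + np.outer(z, direction)
    X = X + noise * rng.standard_normal((n, direction.size))
    return X, slope * z + intercept

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
p, n = 50, 20  # 40 training observations in total, 50 predictors
u0, u1 = rng.standard_normal(p), rng.standard_normal(p)
regimes = [(u0, np.zeros(p), 3.0, 0.0), (u1, np.full(p, 5.0), -3.0, 5.0)]

train = [make_regime(rng, n, *r) for r in regimes]
test = [make_regime(rng, n, *r) for r in regimes]
X_train = np.vstack([X for X, _ in train])
y_train = np.concatenate([y for _, y in train])
X_test = np.vstack([X for X, _ in test])
y_test = np.concatenate([y for _, y in test])
true_labels = np.repeat([0, 1], n)

# One global model versus one local model per (known) training regime.
global_model = fit_pcr(X_train, y_train, q=1)
local_models = [fit_pcr(X, y, q=1) for X, y in train]

# Assign each test sample to the manifold that reconstructs it best.
residuals = np.column_stack([manifold_residual(m, X_test) for m in local_models])
assigned = residuals.argmin(axis=1)

local_pred = np.empty(len(y_test))
for k, model in enumerate(local_models):
    mask = assigned == k
    local_pred[mask] = predict_pcr(model, X_test[mask])

global_rmse = rmse(predict_pcr(global_model, X_test), y_test)
local_rmse = rmse(local_pred, y_test)
print(f"global RMSE: {global_rmse:.3f}, local RMSE: {local_rmse:.3f}")
```

In this toy setting the single global component is dominated by the offset between the two regimes, so it captures little of the variation within each regime, whereas each local model recovers its own direction. Real spectroscopic data would require choosing the number of components and the partition itself, which this sketch takes as given.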

Published In

SensorKDD '09: Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
June 2009
150 pages
ISBN:9781605586687
DOI:10.1145/1601966
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. clustering
  2. dimensionality reduction
  3. principal components analysis
  4. regression
  5. remote sensing
  6. supervised learning

Conference

KDD09