Abstract
In recent years there has been an explosion in the rate of acquisition of astronomical data. The analysis of these data presents unprecedented opportunities and challenges for data mining tasks such as clustering, object discovery and classification. In this work, we address the feature selection problem in the classification of photometric and spectroscopic data collected by the Sloan Digital Sky Survey (SDSS). We present a comparison of five feature selection algorithms: best first (BF), scatter search (SS), genetic algorithm (GA), best incremental ranked subset (BI) and best agglomerative ranked subset (BA). This paper is the first to apply all of these strategies to the study of relevant features in SDSS data.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Montero, M.Á., Ruíz, R., García-Torres, M., Sarro, L.M. (2010). Feature Selection Applied to Data from the Sloan Digital Sky Survey. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_61
DOI: https://doi.org/10.1007/978-3-642-13022-9_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)