Feature Selection Applied to Data from the Sloan Digital Sky Survey

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6096)

Abstract

In recent years there has been an explosion in the rate of acquisition of astronomical data. The analysis of these data presents unprecedented opportunities and challenges for data mining in tasks such as clustering, object discovery and classification. In this work, we address the feature selection problem in the classification of photometric and spectroscopic data collected by the Sloan Digital Sky Survey (SDSS). We present a comparison of five feature selection algorithms: best first (BF), scatter search (SS), genetic algorithm (GA), best incremental ranked subset (BI) and best agglomerative ranked subset (BA). To our knowledge, this paper is the first to apply all of these strategies to the study of relevant features in SDSS data.
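As a rough illustration of what a ranked-subset search over features can look like, the sketch below ranks features by an individual score and then adds them one at a time, keeping a feature only when it improves the score of the growing subset. This is a generic sketch and not the authors' implementation of BI or any other algorithm named in the abstract: the function name `incremental_ranked_selection`, the `min_gain` threshold and the toy scoring function are all assumptions made for illustration. In practice the score would typically come from a classifier's cross-validated accuracy.

```python
from typing import Callable, List, Sequence


def incremental_ranked_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    min_gain: float = 1e-9,
) -> List[int]:
    """Rank features individually, then grow a subset along the ranking.

    A feature is kept only if adding it raises the subset score by more
    than min_gain (a hypothetical threshold, not taken from the paper).
    """
    # Rank features from best to worst individual score.
    ranking = sorted(range(n_features), key=lambda f: score([f]), reverse=True)

    selected: List[int] = []
    best = float("-inf")
    for f in ranking:
        candidate = selected + [f]
        s = score(candidate)
        if s > best + min_gain:
            selected, best = candidate, s
    return selected


# Hypothetical toy score: features 0 and 2 carry signal, and every
# feature in the subset incurs a small cost.
def toy_score(subset: Sequence[int]) -> float:
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen = incremental_ranked_selection(4, toy_score)
print(chosen)
```

With this toy score the search keeps the two informative features and rejects the others, because adding an uninformative feature only lowers the subset score.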

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Montero, M.Á., Ruíz, R., García-Torres, M., Sarro, L.M. (2010). Feature Selection Applied to Data from the Sloan Digital Sky Survey. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_61

  • DOI: https://doi.org/10.1007/978-3-642-13022-9_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13021-2

  • Online ISBN: 978-3-642-13022-9

  • eBook Packages: Computer Science, Computer Science (R0)
