Abstract
In experimental sciences, diversity in the data makes it difficult for predictive models to generalize properly across data sets provided by different laboratories. Training on a data set produced by one lab and testing on data provided by another lab therefore usually results in low classification accuracy. Even when the same protocols are followed, variability in measurements can introduce unforeseen variations that affect the quality of the model. This paper proposes a Genetic Programming-based approach in which a transformation of the data from the second lab is evolved, driven by classifier performance. A real-world problem, prostate cancer diagnosis, is presented as an example in which the proposed approach was capable of repairing the fracture between the data of two different laboratories.
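The idea of evolving a transformation of the second lab's data against a fixed classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the one-feature synthetic data, the linear distortion applied to the second lab, the nearest-centroid classifier, the function set, and all names (`make_lab`, `evolve`, `fitness`) and parameter values.

```python
import random

random.seed(0)

# Synthetic stand-in data (assumption): one feature, class 1 when the true value is positive.
def make_lab(n, distort):
    rows = []
    for _ in range(n):
        true_x = random.uniform(-1.0, 1.0)
        rows.append((distort(true_x), 1 if true_x > 0 else 0))
    return rows

lab_a = make_lab(200, lambda x: x)              # reference lab
lab_b = make_lab(100, lambda x: 2.0 * x + 3.0)  # hypothetical shifted and scaled readings

# Fixed classifier trained on lab A only: nearest class centroid (a simple stand-in).
def train_centroids(rows):
    cents = {}
    for label in (0, 1):
        vals = [x for x, y in rows if y == label]
        cents[label] = sum(vals) / len(vals)
    return cents

centroids = train_centroids(lab_a)

def predict(x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Expression trees: ("x",), ("c", value), or (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(depth):
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("c", random.uniform(-5.0, 5.0))
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mutate(tree):
    # Replace a randomly chosen subtree with a fresh random one.
    if tree[0] in ("x", "c") or random.random() < 0.3:
        return random_tree(2)
    if random.random() < 0.5:
        return (tree[0], mutate(tree[1]), tree[2])
    return (tree[0], tree[1], mutate(tree[2]))

def fitness(tree):
    # Accuracy of the fixed lab-A classifier on the transformed lab-B data.
    hits = sum(predict(evaluate(tree, x)) == y for x, y in lab_b)
    return hits / len(lab_b)

def evolve(pop_size=40, generations=20):
    # The identity transform is seeded so the result is never worse than no transform.
    population = [("x",)] + [random_tree(3) for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((fitness(t), t) for t in population),
                        key=lambda pair: pair[0], reverse=True)
        children = [scored[0][1]]  # elitism: carry the best tree over unchanged
        while len(children) < pop_size:
            # Tournament of size 3, then mutation.
            winner = max(random.sample(scored, 3), key=lambda pair: pair[0])
            children.append(mutate(winner[1]))
        population = children
    return max(population, key=fitness)

baseline_acc = fitness(("x",))  # lab-B data used as-is
best_tree = evolve()
best_acc = fitness(best_tree)
print("baseline:", baseline_acc, "evolved:", best_acc)
```

The classifier is never retrained here: only the transformation applied to the second lab's data changes, and its fitness is the accuracy the fixed classifier achieves on the transformed data. Seeding the population with the identity transform and keeping the best individual each generation ensures the evolved transform scores at least as well as the untransformed data in this sketch.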
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moreno-Torres, J.G., Llorà, X., Goldberg, D.E., Bhargava, R. (2010). On the Homogenization of Data from Two Laboratories Using Genetic Programming. In: Bacardit, J., Browne, W., Drugowitsch, J., Bernadó-Mansilla, E., Butz, M.V. (eds) Learning Classifier Systems. IWLCS 2008, IWLCS 2009. Lecture Notes in Computer Science, vol 6471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17508-4_12
DOI: https://doi.org/10.1007/978-3-642-17508-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17507-7
Online ISBN: 978-3-642-17508-4
eBook Packages: Computer Science, Computer Science (R0)