Abstract
Almost every dataset contains some abnormal observations. Such outliers may affect the process of data analysis. Although several methods of outlier detection already exist, there is still a need for new, more effective ones. In this paper we propose a set of objectives that allows outliers to be identified efficiently using a multiobjective genetic algorithm. Our research showed that such a method can be used successfully with the most common genetic algorithms designed for multiobjective optimization. The results of tests conducted on a medical dataset from the UCI Machine Learning Repository indicate that our method can be successfully applied to the medical problem considered.
References
Aggarwal, C.C.: Outlier detection in categorical, text and mixed attribute data. In: Outlier Analysis, pp. 199–223. Springer (2013)
Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. ACM SIGMOD Rec. 30, 37–46 (2001)
Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 29–38. ACM (2003)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
Chomatek, L., Duraj, A.: Multiobjective genetic algorithm for outliers detection. In: 2017 IEEE International Conference on Innovations in Intelligent SysTems and Applications (INISTA), pp. 379–384. IEEE (2017)
Corne, D.W., Jerram, N.R., Knowles, J.D., Oates, M.J.: PESA-II: region-based selection in evolutionary multiobjective optimization. In: Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 283–290. Morgan Kaufmann Publishers Inc. (2001)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
Duraj, A., Krawczyk, A.: Finding outliers for large medical datasets. Przeglad Elektrotechniczny 86, 188–191 (2010)
Duraj, A., Chomatek, L.: Supporting breast cancer diagnosis with multi-objective genetic algorithm for outlier detection. In: International Conference on Diagnostics of Processes and Systems, pp. 304–315. Springer (2017)
Duraj, A., Szczepaniak, P.S.: Information outliers and their detection. In: Information Studies and the Quest for Transdisciplinarity, pp. 413–437. World Scientific Publishing Company (2017)
Durillo, J.J., Nebro, A.J.: jMetal: a java framework for multi-objective optimization. Adv. Eng. Softw. 42(10), 760–771 (2011)
Durillo, J.J., Nebro, A.J., Alba, E.: The jMetal framework for multi-objective optimization: design and architecture. In: 2010 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2010)
He, Z., Deng, S., Xu, X.: Outlier detection integrating semantic knowledge. In: International Conference on Web-Age Information Management, pp. 126–131. Springer (2002)
He, Z., Xu, X., Deng, S.: Discovering cluster-based local outliers. Pattern Recogn. Lett. 24(9), 1641–1650 (2003)
Hodge, V.J., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 85–126 (2004)
Jiang, F., Sui, Y., Cao, C.: Outlier detection using rough set theory. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, pp. 79–87 (2005)
Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: VLDB, vol. 99, pp. 211–222 (1999)
Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: algorithms and applications. Int. J. Very Large Data Bases (VLDB) 8(3–4), 237–253 (2000)
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the International Conference on Very Large Data Bases, pp. 392–403. Citeseer (1998)
Konak, A., Coit, D.W., Smith, A.E.: Multi-objective optimization using genetic algorithms: a tutorial. Reliab. Eng. Syst. Saf. 91(9), 992–1007 (2006)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Lilford, R., Mohammed, M.A., Spiegelhalter, D., Thomson, R.: Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet 363(9415), 1147–1154 (2004)
Petrovskiy, M.: A hybrid method for patterns mining and outliers detection in the web usage log. In: Advances in Web Intelligence, pp. 954–954 (2003)
Ren, D., Wang, B., Perrizo, W.: RDF: a density-based outlier detection method using vertical data representation. In: 2004 Fourth IEEE International Conference on Data Mining, ICDM 2004, pp. 503–506. IEEE (2004)
Shaari, F., Bakar, A.A., Hamdan, A.R.: A predictive analysis on medical data based on outlier detection method using non-reduct computation. In: International Conference on Advanced Data Mining and Applications, pp. 603–610. Springer (2009)
Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extraction for breast tumor diagnosis. In: IS & T/SPIE’s Symposium on Electronic Imaging: Science and Technology, pp. 861–870. International Society for Optics and Photonics (1993)
Tang, J., Chen, Z., Fu, A.W.C., Cheung, D.: A robust outlier detection scheme for large data sets. In: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Citeseer (2001)
Theodore, J., Ivy, K., Raymong, T.: Fast computation of 2D depth contours. ACM SIG KDD, pp. 224–228 (1998)
Wolberg, W.H., Street, W.N., Mangasarian, O.: Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates. Cancer Lett. 77(2–3), 163–171 (1994)
Yamanishi, K., Takeuchi, J.i.: Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 389–394. ACM (2001)
Zitzler, E., Laumanns, M., Thiele, L., et al.: SPEA2: improving the strength Pareto evolutionary algorithm (2001)
Acknowledgement
This work was supported by a grant of the Dean of the Faculty of Technical Physics, Information Technology and Applied Mathematics, Lodz University of Technology. The dataset used in our research was taken from the UCI Machine Learning Repository [21].
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Chomatek, L., Duraj, A. (2019). Efficient Genetic Algorithm for Breast Cancer Diagnosis. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds) Information Technology in Biomedicine. ITIB 2018. Advances in Intelligent Systems and Computing, vol 762. Springer, Cham. https://doi.org/10.1007/978-3-319-91211-0_6
DOI: https://doi.org/10.1007/978-3-319-91211-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91210-3
Online ISBN: 978-3-319-91211-0
eBook Packages: Intelligent Technologies and Robotics (R0)