Abstract
Medical professionals need a reliable methodology for predicting the survivability of patients with breast cancer. In this work, a classical association rule mining algorithm, Apriori, was adopted to analyze the associations between the medical attributes in patient records and patient survivability. The SEER dataset was used in this research. After preprocessing, 29,606 records were obtained, each containing 17 breast cancer related attributes. The Apriori algorithm was then applied to these preprocessed records, yielding 326 association rules for ‘survived’ and 22 association rules for ‘not survived’. After analysis and comparison, the discovered rules indicate that the attributes EOD-Lymph Node Involv and SEER historic stage A play important roles in patient survivability.
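To illustrate the kind of rule mining described above, here is a minimal Python sketch of Apriori-style rule discovery with a fixed class label as the consequent. It is not the authors' implementation: the toy records, the attribute values, the helper names (`support`, `frequent_itemsets`, `rules_for`), and the thresholds are all assumptions chosen for illustration and are not taken from SEER or from the paper.

```python
from itertools import combinations

# Hypothetical toy records (NOT SEER data); each record is a set of
# attribute=value items plus a class label.
records = [
    {"stage=localized", "nodes=negative", "survived"},
    {"stage=localized", "nodes=negative", "survived"},
    {"stage=regional", "nodes=positive", "not survived"},
    {"stage=localized", "nodes=positive", "survived"},
    {"stage=distant", "nodes=positive", "not survived"},
    {"stage=regional", "nodes=negative", "survived"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search; returns {frozenset: support}."""
    singles = {frozenset([item]) for r in records for item in r}
    level = {s for s in singles if support(s, records) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, records)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c, records) >= min_support}
    return frequent


def rules_for(target, frequent, min_confidence):
    """Rules of the form antecedent -> target that meet min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            # The antecedent is frequent by downward closure, so it is a key.
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, target, supp, confidence))
    return rules


# Illustrative thresholds; the paper's actual settings are not stated here.
frequent = frequent_itemsets(records, min_support=0.3)
survived_rules = rules_for("survived", frequent, min_confidence=0.9)
for antecedent, target, supp, conf in survived_rules:
    print(sorted(antecedent), "->", target, f"support={supp:.2f} confidence={conf:.2f}")
```

Restricting the consequent to the class label mirrors how the abstract groups rules into ‘survived’ and ‘not survived’; a general-purpose miner would instead enumerate every antecedent/consequent split of each frequent itemset.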
Acknowledgments
This study was supported by the China Postdoctoral Science Foundation (2016M592450) and the Hunan Provincial Natural Science Foundation of China (2016JJ4119). The authors sincerely thank the National Cancer Institute, USA, for making the SEER cancer database publicly available.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, F., Duan, Y. (2016). An Analysis of the Survivability in SEER Breast Cancer Data Using Association Rule Mining. In: Wang, G., Ray, I., Alcaraz Calero, J., Thampi, S. (eds) Security, Privacy and Anonymity in Computation, Communication and Storage. SpaCCS 2016. Lecture Notes in Computer Science, vol 10067. Springer, Cham. https://doi.org/10.1007/978-3-319-49145-5_19
DOI: https://doi.org/10.1007/978-3-319-49145-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49144-8
Online ISBN: 978-3-319-49145-5
eBook Packages: Computer Science, Computer Science (R0)