An Analysis of the Survivability in SEER Breast Cancer Data Using Association Rule Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10067)

Abstract

Medical professionals need a reliable method for predicting the survivability of patients with breast cancer. In this work, the classical association rule mining algorithm Apriori was used to analyze the associations between the medical attributes in patient records and patient survivability. The SEER dataset was used in this research. After preprocessing, 29,606 records were obtained, each containing 17 breast cancer-related attributes. The Apriori algorithm was then applied to these preprocessed records, yielding 326 association rules for ‘survived’ and 22 association rules for ‘not survived’. After analysis and comparison, the discovered rules indicate that the attributes EOD-Lymph Node Involv and SEER historic stage A play important roles in patient survivability.
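The procedure summarized in the abstract (mine frequent itemsets with Apriori, then keep rules whose consequent is the survival outcome) can be illustrated with a minimal sketch. This is not the authors' implementation. The six toy records, the item labels such as `stage=localized`, the thresholds, and the helper names `apriori` and `rules_for` are all assumptions made for illustration; they are not taken from the SEER data or from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many records contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules_for(frequent, target, min_confidence):
    """Return rules 'antecedent -> target' as (antecedent, support, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            # The antecedent is frequent too, because it is a subset of a
            # frequent itemset, so its support is already in the dictionary.
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


# Hypothetical toy records; the attribute values are illustrative only.
records = [
    {"stage=localized", "nodes=negative", "outcome=survived"},
    {"stage=localized", "nodes=negative", "outcome=survived"},
    {"stage=regional", "nodes=positive", "outcome=survived"},
    {"stage=distant", "nodes=positive", "outcome=not survived"},
    {"stage=localized", "nodes=negative", "outcome=survived"},
    {"stage=distant", "nodes=positive", "outcome=not survived"},
]

frequent = apriori(records, min_support=0.3)
survived_rules = rules_for(frequent, "outcome=survived", min_confidence=0.9)
not_survived_rules = rules_for(frequent, "outcome=not survived", min_confidence=0.9)

for antecedent, support, confidence in survived_rules:
    print(sorted(antecedent), "-> outcome=survived", round(support, 2), confidence)
```

Restricting the consequent to the outcome item mirrors the abstract's split into rules about ‘survived’ and rules about ‘not survived’. The support and confidence thresholds used in the study are not stated in the abstract, so the values above are placeholders.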

Acknowledgments

This study was supported by the China Postdoctoral Science Foundation (2016M592450) and the Hunan Provincial Natural Science Foundation of China (2016JJ4119). The authors sincerely thank the National Cancer Institute, USA, for making the SEER cancer database publicly available.

Author information

Corresponding author

Correspondence to Yu Duan.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Li, F., Duan, Y. (2016). An Analysis of the Survivability in SEER Breast Cancer Data Using Association Rule Mining. In: Wang, G., Ray, I., Alcaraz Calero, J., Thampi, S. (eds) Security, Privacy and Anonymity in Computation, Communication and Storage. SpaCCS 2016. Lecture Notes in Computer Science, vol 10067. Springer, Cham. https://doi.org/10.1007/978-3-319-49145-5_19

  • DOI: https://doi.org/10.1007/978-3-319-49145-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49144-8

  • Online ISBN: 978-3-319-49145-5

  • eBook Packages: Computer Science, Computer Science (R0)
