
Feature Dimensionality Reduction for Mammographic Report Classification

Resource Management for Big Data Platforms

Abstract

The amount and variety of medical data available from multiple heterogeneous sources can hinder analysis, manual interpretation, and the use of simple data management applications. This paper gives an in-depth overview of the principal algorithms for dimensionality reduction and applies the most effective techniques to a dataset of 4461 mammographic reports. The most useful medical terms are represented in a TF-IDF matrix to enable data mining and retrieval tasks. A series of queries was run on the raw matrix and on the same matrix after dimensionality reduction with LSI, PCA, and SVD. The query results on the reduced matrices are comparable to those obtained on the raw, unprocessed matrix, even though the reduced matrix contains less than 13% of the raw TF-IDF data with the PCA-LSI techniques and less than 6% with the SVD technique.
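The comparison described in the abstract (querying a raw TF-IDF matrix versus a reduced one) can be illustrated with a minimal sketch. The preview does not show the chapter's actual pipeline, so everything below is an assumption made for illustration: the toy corpus, the whitespace tokenisation, the `log(N/df) + 1` IDF weighting, the rank `k = 2`, and the use of a plain truncated SVD (via NumPy) as the LSI step.

```python
import numpy as np

# Toy stand-in corpus (hypothetical; the chapter's 4461 reports are not reproduced here).
docs = [
    "mass with spiculated margins in left breast",
    "benign calcifications in right breast",
    "spiculated mass suspicious for malignancy",
    "no suspicious calcifications or mass",
]

vocab = sorted({w for d in docs for w in d.split()})
col = {w: j for j, w in enumerate(vocab)}

def term_counts(text):
    """Count occurrences of known vocabulary terms in a text."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in col:
            v[col[w]] += 1.0
    return v

# Documents-by-terms TF-IDF matrix (one assumed weighting among several in use).
tf = np.vstack([term_counts(d) for d in docs])
df = (tf > 0).sum(axis=0)
idf = np.log(len(docs) / df) + 1.0
X = tf * idf

def cosine(q, M):
    """Cosine similarity between vector q and each row of M."""
    denom = np.linalg.norm(M, axis=1) * np.linalg.norm(q)
    return (M @ q) / np.maximum(denom, 1e-12)

# Truncated SVD: keep the k largest singular triplets (the LSI step).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # illustrative rank, not the chapter's setting
X_k = U[:, :k] * s[:k]  # documents in the k-dimensional concept space

# A query is weighted like a document, then projected onto the same basis.
q = term_counts("spiculated mass") * idf
q_k = q @ Vt[:k].T

raw_scores = cosine(q, X)
lsi_scores = cosine(q_k, X_k)
raw_ranking = np.argsort(-raw_scores)
lsi_ranking = np.argsort(-lsi_scores)
print("raw ranking:", raw_ranking)
print("LSI ranking:", lsi_ranking)
```

In this sketch each report is stored as `k` numbers instead of one weight per vocabulary term, which is where the storage saving comes from. How much of the raw data is retained on the chapter's dataset depends on the rank the authors chose, which the abstract does not state.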

Author information

Corresponding author

Correspondence to Salvatore Vitabile.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Agnello, L., Comelli, A., Vitabile, S. (2016). Feature Dimensionality Reduction for Mammographic Report Classification. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_15

  • DOI: https://doi.org/10.1007/978-3-319-44881-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44880-0

  • Online ISBN: 978-3-319-44881-7

  • eBook Packages: Computer Science, Computer Science (R0)
