Abstract
The amount and the variety of available medical data coming from multiple and heterogeneous sources can inhibit analysis, manual interpretation, and the use of simple data management applications. This paper presents an in-depth overview of the principal algorithms for dimensionality reduction and applies the most effective techniques to a dataset of 4461 mammographic reports. The most useful medical terms are converted and represented using a TF-IDF matrix, in order to enable data mining and retrieval tasks. A series of queries was performed on the raw matrix and on the same matrix after dimensionality reduction with the most useful techniques, such as LSI, PCA, and SVD. The query results obtained on the reduced matrix are comparable to those achieved using the raw, unprocessed matrix, even though the reduced matrix contains less than 13% of the raw TF-IDF data with the PCA-LSI techniques and less than 6% of the raw TF-IDF data with the SVD technique.
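The pipeline summarised above (TF-IDF weighting, dimensionality reduction, then querying both the raw and the reduced matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy reports, the whitespace tokenizer, the smoothed IDF formula, the rank k = 2, and all function names are assumptions made for illustration, and NumPy's truncated SVD stands in for the LSI step. PCA would differ mainly in centering the term columns before the decomposition.

```python
import numpy as np

def tfidf(docs):
    """Build a documents x terms TF-IDF matrix from whitespace-tokenized text."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for t in toks:
            counts[i, index[t]] += 1
    tf = counts / counts.sum(axis=1, keepdims=True)   # relative term frequency
    df = (counts > 0).sum(axis=0)                     # document frequency
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0    # smoothed IDF (assumption)
    return vocab, tf * idf, idf

def vectorize(text, vocab, idf):
    """Map a query to the same TF-IDF space; unknown terms are ignored."""
    index = {t: j for j, t in enumerate(vocab)}
    toks = [t for t in text.lower().split() if t in index]
    v = np.zeros(len(vocab))
    for t in toks:
        v[index[t]] += 1
    if toks:
        v /= len(toks)
    return v * idf

def fit_lsi(X, k):
    """Return the top-k right singular vectors (k x n_terms) of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def project(X, Vk):
    """Project TF-IDF rows (or a single vector) onto the k-dimensional subspace."""
    return X @ Vk.T

def cosine_scores(q, D):
    """Cosine similarity between query vector q and every row of D."""
    denom = np.linalg.norm(q) * np.linalg.norm(D, axis=1)
    return np.divide(D @ q, denom, out=np.zeros_like(denom), where=denom > 0)

# Hypothetical toy reports; the chapter's dataset has 4461 real reports.
docs = [
    "nodule with spiculated margins in left breast",
    "benign calcification in right breast",
    "spiculated mass with microcalcification cluster",
    "no suspicious findings normal breast tissue",
]
vocab, X, idf = tfidf(docs)
q = vectorize("spiculated nodule", vocab, idf)

raw_scores = cosine_scores(q, X)                      # query on the raw matrix

Vk = fit_lsi(X, k=2)                                  # k is an arbitrary choice
reduced_docs = project(X, Vk)
reduced_scores = cosine_scores(project(q, Vk), reduced_docs)

print("raw ranking:    ", np.argsort(-raw_scores))
print("reduced ranking:", np.argsort(-reduced_scores))
```

Comparing the two rankings for a set of queries is one simple way to check whether a reduced representation retrieves the same reports as the raw matrix. On a real corpus the rank k would be tuned rather than fixed in advance.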
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Agnello, L., Comelli, A., Vitabile, S. (2016). Feature Dimensionality Reduction for Mammographic Report Classification. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_15
DOI: https://doi.org/10.1007/978-3-319-44881-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)