Abstract
The paper presents a methodology for bioinformatics data analysis. It first reviews how data analysis is used in bioinformatics: data preprocessing approaches, approaches to handling missing data, dimensionality reduction, and classification algorithms. It then identifies the data analysis methods best suited to the methodology for solving a diagnostic classification task. The methodology was validated in experiments using the WEKA software and real-world bioinformatics data sets. These experiments made it possible to determine the specific method implementations that give the best classification results; all intermediate results are recorded. Finally, the best sequence of preprocessing methods for the methodology is determined.
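The stages named in the abstract (missing-data handling, dimensionality reduction, classification, with intermediate results recorded) can be illustrated with a minimal Python sketch. It is not the authors' WEKA setup: mean imputation, a variance-based feature filter, a nearest-centroid classifier, leave-one-out evaluation, the function names, and the toy data are all assumptions chosen only to keep the example self-contained.

```python
from statistics import mean

MISSING = None  # assumed marker for a missing measurement


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not MISSING) for col in cols]
    return [[col_means[j] if v is MISSING else v for j, v in enumerate(row)]
            for row in rows]


def select_top_variance(rows, k):
    """Keep the k columns with the largest variance (a stand-in filter method)."""
    cols = list(zip(*rows))

    def variance(col):
        m = mean(col)
        return mean((v - m) ** 2 for v in col)

    ranked = sorted(range(len(cols)), key=lambda j: variance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    return [[row[j] for j in keep] for row in rows]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def leave_one_out_accuracy(rows, labels):
    """Hold out each sample once and report the fraction classified correctly."""
    hits = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, rows[i]) == labels[i]
    return hits / len(rows)


def run_pipeline(rows, labels, k):
    """Run impute -> reduce -> classify and record every intermediate result."""
    imputed = impute_mean(rows)
    reduced = select_top_variance(imputed, k)
    accuracy = leave_one_out_accuracy(reduced, labels)
    return {"imputed": imputed, "reduced": reduced, "accuracy": accuracy}


# Toy data (hypothetical): six samples, three features, one missing value.
rows = [
    [1.0, 5.0, 0.2],
    [1.2, MISSING, 0.1],
    [0.9, 5.1, 0.3],
    [3.0, 5.0, 0.2],
    [3.2, 4.9, 0.1],
    [2.9, 5.2, 0.3],
]
labels = ["healthy"] * 3 + ["disease"] * 3

result = run_pipeline(rows, labels, k=1)
```

Keeping each stage as a separate function and returning every intermediate table makes it straightforward to swap one method for another and compare the outcomes, which is the kind of comparison the abstract describes.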
Additional information
The article is published in the original.
Gasparovica-Asīte, M., Aleksejeva, L. Classification Methodology for Bioinformatics Data Analysis. Aut. Control Comp. Sci. 53, 28–38 (2019). https://doi.org/10.3103/S0146411619010073
DOI: https://doi.org/10.3103/S0146411619010073