Classification Methodology for Bioinformatics Data Analysis

Automatic Control and Computer Sciences

Abstract

The paper presents a methodology for bioinformatics data analysis. First, it describes how data analysis is used in bioinformatics: data preprocessing, missing-data handling, dimensionality reduction, and classification algorithms. It then identifies the data analysis methods best suited to the methodology for solving a diagnostic classification task. The methodology was validated in experiments using the WEKA software and real-world bioinformatics data sets, with all intermediate results recorded. These experiments identified the specific method implementations that give the best classification results. Finally, the best sequence of preprocessing methods for the methodology is determined.
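The order of stages named in the abstract (missing-data handling, dimensionality reduction, classification, evaluation) can be illustrated with a minimal sketch. It is not the authors' methodology and does not use WEKA: mean imputation, a class-mean-difference feature ranking, and a nearest-centroid classifier are stand-ins chosen only for brevity, and every function name and data value below is hypothetical.

```python
from statistics import mean


def impute_mean(rows):
    """Replace missing values (None) with the mean of the observed column values."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in rows]


def select_top_features(rows, labels, k):
    """Keep the k features whose class means differ most (a simple filter, assumed here)."""
    def score(j):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        return abs(mean(pos) - mean(neg))

    ranked = sorted(range(len(rows[0])), key=score, reverse=True)
    return sorted(ranked[:k])


def project(rows, feature_idx):
    """Reduce each row to the selected feature columns."""
    return [[row[j] for j in feature_idx] for row in rows]


def nearest_centroid_predict(train, labels, sample):
    """Assign the sample to the class whose centroid is closest (squared Euclidean)."""
    centroids = {}
    for c in set(labels):
        members = [r for r, y in zip(train, labels) if y == c]
        centroids[c] = [mean(col) for col in zip(*members)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(sample, centroids[c])),
    )


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 samples, 3 features, None marks a missing value.
raw = [
    [5.0, 1.0, None],
    [6.0, 2.0, 9.0],
    [1.0, 1.5, 2.0],
    [2.0, None, 1.0],
]
labels = [1, 1, 0, 0]  # 1 = positive diagnosis, 0 = negative

imputed = impute_mean(raw)
selected = select_top_features(imputed, labels, k=2)
reduced = project(imputed, selected)

# Resubstitution (predicting the training samples) is optimistic;
# a real experiment would use held-out data or cross-validation.
predicted = [nearest_centroid_predict(reduced, labels, s) for s in reduced]
sens, spec = sensitivity_specificity(labels, predicted)
```

Changing the order of the first two stages, or swapping in a different imputation or selection rule, changes what the classifier sees, which is why the sequence of preprocessing methods is itself something to be compared.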

Author information

Correspondence to M. Gasparovica-Asīte or L. Aleksejeva.

Additional information

The article is published in the original.

About this article

Cite this article

Gasparovica-Asīte, M., Aleksejeva, L. Classification Methodology for Bioinformatics Data Analysis. Aut. Control Comp. Sci. 53, 28–38 (2019). https://doi.org/10.3103/S0146411619010073
