
ProPythia: A Python Automated Platform for the Classification of Proteins Using Machine Learning

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1240)

Abstract

A challenging problem in Bioinformatics is predicting protein structure, properties, activities or interactions from amino acid sequences. Sequence-derived physicochemical features of proteins have been used to support the development of Machine Learning (ML) models. However, tools and platforms that calculate features from protein sequences and train ML models are scarce and have limitations in performance, user-friendliness and domains of application.
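To make "sequence-derived physicochemical features" concrete, the sketch below turns a peptide into a fixed-length numeric vector using two classic descriptors: amino acid composition (AAC, the fraction of each of the 20 standard residues) and GRAVY (the mean Kyte-Doolittle hydropathy). This is a minimal illustration in plain Python, not ProPythia's API: the function names are hypothetical, and the platform itself computes many more descriptor families.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("expected a non-empty sequence of the 20 standard residues")
    return seq


def amino_acid_composition(seq: str) -> dict:
    """Hypothetical helper: fraction of each standard amino acid (20 values summing to 1)."""
    seq = _clean(seq)
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = _clean(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def descriptor_vector(seq: str) -> list:
    """Concatenate AAC (20 values) and GRAVY (1 value) into one 21-dimensional vector."""
    aac = amino_acid_composition(seq)
    return [aac[aa] for aa in AMINO_ACIDS] + [gravy(seq)]


if __name__ == "__main__":
    vec = descriptor_vector("KLAKLAKKLAKLAK")
    print(len(vec))  # 21
```

Because every sequence, whatever its length, maps to a vector of the same size, standard ML algorithms can consume peptides of different lengths side by side.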

Here, we propose a generic, modular, semi-automated platform for the classification of proteins based on their physicochemical properties using ML. The tool, developed as a Python package, facilitates the major ML tasks and includes modules to read and alter sequences, calculate several types of protein descriptors, pre-process datasets, perform feature selection and dimensionality reduction, perform clustering, train and optimize ML models, and make predictions with 8 different algorithms. Its adaptable, modular architecture makes ProPythia a versatile and easy-to-use tool for applying ML analysis to protein sequences. The platform was tested on the classification of membrane active anticancer and antimicrobial peptides. The package, its source code and documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia; it can also be installed with ‘pip install propythia’.
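The workflow described above (descriptors, then pre-processing, then model training and prediction) can be sketched end to end with the standard library alone. This is a hedged illustration, not ProPythia's code: the two-feature descriptor, the function names and the toy sequences are all hypothetical, and a nearest-centroid classifier stands in for the 8 algorithms the platform offers, chosen only so the sketch runs without third-party packages.

```python
from statistics import mean, pstdev

# Hypothetical two-feature descriptor: fraction of cationic residues (K, R)
# and fraction of hydrophobic residues. A stand-in for real protein descriptors.
CATIONIC = set("KR")
HYDROPHOBIC = set("AILMFVW")


def features(seq: str) -> list:
    n = len(seq)
    return [sum(aa in CATIONIC for aa in seq) / n,
            sum(aa in HYDROPHOBIC for aa in seq) / n]


def fit_scaler(X):
    """Per-column mean and population std; a std of 0 is replaced by 1 to avoid dividing by zero."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def transform(X, means, stds):
    """Z-score standardization using statistics learned from the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_centroids(X, y):
    """Nearest-centroid 'training': the average feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(c) for c in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each sample the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(x, centroids[lab])) for x in X]


def run_pipeline(train_seqs, train_labels, test_seqs):
    X_train = [features(s) for s in train_seqs]
    means, stds = fit_scaler(X_train)  # fit pre-processing on training data only
    model = fit_centroids(transform(X_train, means, stds), train_labels)
    X_test = transform([features(s) for s in test_seqs], means, stds)
    return predict(model, X_test)


if __name__ == "__main__":
    # Toy, made-up sequences with illustrative labels (1 = active, 0 = inactive).
    train = ["KLAKLAKKLAKLAK", "RLLRKLLRKWLR", "DEGSQNTDEGSQ", "SGDNQETSGDQE"]
    labels = [1, 1, 0, 0]
    print(run_pipeline(train, labels, ["KKLLKLLKKL", "DQSGENTDQS"]))  # [1, 0]
```

Fitting the scaler on the training sequences only, and then reusing those statistics on new sequences, keeps information from the prediction set from leaking into the model.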

Acknowledgments

This study was supported by FCT through project PTDC/CCI-BIO/28200/2017 and the strategic funding of UID/BIO/04469/2020, and by the European Regional Development Fund under the scope of Norte2020, through the project DeepBio (ref. NORTE-01-0247-FEDER-039831). This work was also financially supported by Project LISBOA-01-0145-FEDER-007660 (Microbiologia Molecular, Estrutural e Celular) funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT - Fundação para a Ciência e a Tecnologia.

Author information

Corresponding author

Correspondence to Ana Marta Sequeira.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sequeira, A.M., Lousa, D., Rocha, M. (2021). ProPythia: A Python Automated Platform for the Classification of Proteins Using Machine Learning. In: Panuccio, G., Rocha, M., Fdez-Riverola, F., Mohamad, M., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 14th International Conference (PACBB 2020). PACBB 2020. Advances in Intelligent Systems and Computing, vol 1240. Springer, Cham. https://doi.org/10.1007/978-3-030-54568-0_4