ProPythia: A Python Automated Platform for the Classification of Proteins Using Machine Learning

Sequeira, Ana Marta; Lousa, Diana; Rocha, Miguel

doi:10.1007/978-3-030-54568-0_4

Ana Marta Sequeira, Diana Lousa & Miguel Rocha

Conference paper
First Online: 23 July 2020

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1240)

Abstract

A challenging problem in Bioinformatics is to predict protein structure, properties, activities or interactions from amino acid sequences. Sequence-derived physicochemical features of proteins have been used to support the development of Machine Learning (ML) models. However, tools and platforms to calculate features from protein sequences and train ML models are scarce and have limitations in terms of performance, user-friendliness and domains of application.

Here, a generic, modular, semi-automated platform for the classification of proteins based on their physicochemical properties using ML is proposed. The tool, developed as a Python package, facilitates the major tasks of ML and includes modules to read and alter sequences, calculate several types of protein descriptors, pre-process datasets, execute feature selection and dimensionality reduction, perform clustering, train and optimize ML models, and make predictions with 8 different algorithms. ProPythia has an adaptable modular architecture, making it a versatile and easy-to-use tool for applying ML analysis to protein sequences. The platform was tested in the classification of membrane-active anticancer and antimicrobial peptides. The package, its source code and its documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia. It can also be installed through ‘pip install propythia’.
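The general idea of turning sequences into descriptors and then classifying them can be illustrated with a minimal sketch. It does not use the ProPythia API: every function name below (`aa_composition`, `charge_per_residue`, `descriptors`, `fit_centroids`, `predict`) is hypothetical, the peptide strings are made-up toy sequences rather than experimental data, and a simple nearest-centroid rule stands in for the algorithms the package actually offers.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def charge_per_residue(sequence):
    """Crude charge proxy: (K + R - D - E) divided by sequence length."""
    counts = Counter(sequence.upper())
    net = counts["K"] + counts["R"] - counts["D"] - counts["E"]
    return net / len(sequence)


def descriptors(sequence):
    """Concatenate composition and charge into one feature vector (21 values)."""
    return aa_composition(sequence) + [charge_per_residue(sequence)]


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    grouped = {}
    for vector, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Toy, made-up sequences labelled only by their obvious charge character.
train_sequences = ["KKLLKKLLKK", "RRWWRRWW", "DDEEGGSSDD", "EEDDAAEE"]
train_labels = ["cationic", "cationic", "anionic", "anionic"]

centroids = fit_centroids([descriptors(s) for s in train_sequences], train_labels)
prediction = predict(centroids, descriptors("KLKKLLRK"))
```

In practice the descriptor step would cover many more feature families, and the classifier would be trained and tuned on curated datasets with proper validation; the sketch only shows how the stages connect.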

Acknowledgments

This study was supported by FCT through project PTDC/CCI-BIO/28200/2017 and the strategic funding of UID/BIO/04469/2020, and also by the European Regional Development Fund under the scope of Norte2020, through the project DeepBio (ref. NORTE-01-0247-FEDER-039831). This work was also financially supported by Project LISBOA-01-0145-FEDER-007660 (Microbiologia Molecular, Estrutural e Celular), funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT - Fundação para a Ciência e a Tecnologia.

Author information

Authors and Affiliations

CEB - Centre of Biological Engineering, University of Minho, 4710-057, Braga, Portugal
Ana Marta Sequeira & Miguel Rocha
ITQB NOVA, Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, 2780-157, Oeiras, Portugal
Diana Lousa

Corresponding author

Correspondence to Ana Marta Sequeira.

Editor information

Editors and Affiliations

Enhanced Regenerative Medicine, Istituto Italiano di Tecnologia, Genoa, Genova, Italy
Gabriella Panuccio
Departamento de Informática, Universidade do Minho, Braga, Portugal
Miguel Rocha
Computer Science Department, University of Vigo, Vigo, Spain
Florentino Fdez-Riverola
Institute for Artificial Intelligence and Big Data (AIBIG), Universiti Malaysia Kelantan, Kampus Kota, Kota Bharu, Malaysia
Mohd Saberi Mohamad
Biotechnology, Intelligent Systems and Educational Technology (BISITE) Research Group, University of Salamanca, Salamanca, Salamanca, Spain
Roberto Casado-Vara

About this paper

Cite this paper

Sequeira, A.M., Lousa, D., Rocha, M. (2021). ProPythia: A Python Automated Platform for the Classification of Proteins Using Machine Learning. In: Panuccio, G., Rocha, M., Fdez-Riverola, F., Mohamad, M., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 14th International Conference (PACBB 2020). PACBB 2020. Advances in Intelligent Systems and Computing, vol 1240. Springer, Cham. https://doi.org/10.1007/978-3-030-54568-0_4

DOI: https://doi.org/10.1007/978-3-030-54568-0_4
Published: 23 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54567-3
Online ISBN: 978-3-030-54568-0
eBook Packages: Intelligent Technologies and Robotics (R0)
