Authors: Bishnu Sarker; David W. Ritchie and Sabeur Aridhi

Affiliation: University of Lorraine, Inria, Loria, CNRS, F-54000, Nancy, France

Keyword(s): Machine Learning, Representation Learning, Protein Function Annotation, Bioinformatics, Domain Embedding.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; BioInformatics & Pattern Discovery ; Computational Intelligence ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Machine Learning ; Soft Computing ; Symbolic Systems

Abstract: Due to recent advances in genomic sequencing technologies, the number of protein sequences in public databases is growing exponentially. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. The May 2019 release of UniProtKB contains around 158 million protein sequences. To fully exploit this huge knowledge base, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. However, only about half a million sequences (UniProtKB/Swiss-Prot) are reviewed and functionally annotated by expert curators using information extracted from the published literature and computational analyses. Manual annotation by experts is expensive, slow and insufficient to fill the gap between annotated and unannotated protein sequences. In this paper, we present an automatic functional annotation technique using neural network based word embedding that exploits domain and family information of proteins. Domains are the most conserved regions in protein sequences and constitute the building blocks of 3D protein structures. For the experiments, we used fastText, a library for learning word embeddings and text classification developed by Facebook's AI Research lab. The experimental results show that domain embeddings perform much better than k-mer based word embeddings.
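The contrast the abstract draws between domain-based and k-mer based "words" can be illustrated with a minimal sketch. It only shows how one protein might be turned into a line of text for a fastText-style classifier, where each training line starts with a `__label__` prefix (fastText's default label marker). The helper names, the domain identifiers `DOM_A` and `DOM_B`, the label `EC_X` and the toy sequence are all hypothetical placeholders; the paper's actual preprocessing is not described on this page.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split an amino-acid sequence into overlapping k-mers (the baseline 'words')."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def fasttext_line(label: str, tokens: List[str]) -> str:
    """Format one example as '__label__<label> tok1 tok2 ...'."""
    return f"__label__{label} " + " ".join(tokens)


# Hypothetical protein: every value below is a placeholder, not real data.
domains = ["DOM_A", "DOM_B"]   # ordered domain/family identifiers (assumed input)
sequence = "MKVLAAGIV"         # toy amino-acid sequence
label = "EC_X"                 # placeholder functional class

# Domain representation: one token per domain, so the "sentence" is short.
domain_line = fasttext_line(label, domains)

# k-mer representation: one token per overlapping window of the raw sequence.
kmer_line = fasttext_line(label, kmer_tokens(sequence, k=3))

print(domain_line)
print(kmer_line)
```

Lines in this shape could be written to a text file, one protein per line, and passed to a supervised text classifier; which tokenization works better is an empirical question that the paper addresses.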

CC BY-NC-ND 4.0

Paper citation in several formats:
Sarker, B.; Ritchie, D. and Aridhi, S. (2019). Functional Annotation of Proteins using Domain Embedding based Sequence Classification. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR; ISBN 978-989-758-382-7; ISSN 2184-3228, SciTePress, pages 163-170. DOI: 10.5220/0008353401630170

@conference{kdir19,
author={Bishnu Sarker and David W. Ritchie and Sabeur Aridhi},
title={Functional Annotation of Proteins using Domain Embedding based Sequence Classification},
booktitle={Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR},
year={2019},
pages={163-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008353401630170},
isbn={978-989-758-382-7},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR
TI - Functional Annotation of Proteins using Domain Embedding based Sequence Classification
SN - 978-989-758-382-7
IS - 2184-3228
AU - Sarker, B.
AU - Ritchie, D.
AU - Aridhi, S.
PY - 2019
SP - 163
EP - 170
DO - 10.5220/0008353401630170
PB - SciTePress