Authors:
Abdur Rahman M. A. Basher 1 and Steven J. Hallam 1,2
Affiliations:
1 Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC V5Z 4S6, Canada
2 Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
Keyword(s):
Pathway Group, Relabeling, Data Augmentation, Correlated Models, Metabolic Pathway Prediction, MetaCyc.
Abstract:
Metabolic pathway prediction from genomic sequence information is an essential step in determining the capacity of living things to transform matter and energy at different levels of biological organization. A detailed
and accurate pathway map enables researchers to interpret and engineer the flow of biological information
from genotype to phenotype in both organismal and multi-organismal contexts. In this paper, we propose
two novel hierarchical mixture models, SOAP (sparse correlated pathway group) and SPREAT (distributed
sparse correlated pathway group), to improve pathway prediction outcomes. Both models leverage pathway
abundance to represent an organismal genome as a mixed distribution of groups, and each group, in turn, is
a mixture of pathways. Moreover, both models deal with missing potential pathways in the training set by
provisioning supplementary pathways into the learning framework as part of noise reduction efforts. Because
the introduction of supplementary pathways may lead to overestimation of some pathways, dual sparseness is
applied. The resulting pathway group dataset is then used to train multi-label learning algorithms. Model effectiveness was evaluated on metabolic pathway prediction, where the correlated models, SOAP in particular, were
able to equal or exceed the performance of previous pathway prediction algorithms on organismal genomes.