Abstract
Because manual curation is labor-intensive and time-consuming, it is increasingly being replaced by text mining (TM) for metabolic information extraction. Although TM has been applied to the metabolic domain before, it remains challenging because of the variety of domain-specific terms and their context-dependent meanings. Named Entity Recognition (NER) is generally used to identify keywords of interest (protein and metabolite terms) in a sentence, so this preliminary task strongly influences the performance of a metabolic TM framework. NER based on Conditional Random Fields (CRFs) has been widely used over the last decade because it clearly outperforms other approaches. However, an effective CRF-based NER depends largely on the quality of its corpus, which is nontrivial to produce. This paper introduces a hybrid solution that combines CRF-based NER, dictionary usage, and complementary modules (constructed from an existing corpus) to improve the performance of NER in the metabolic domain and in other similar domains.
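The general idea of pairing a statistical tagger with a dictionary can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe how the modules are combined, so the merge rule here (keep every model span, add dictionary hits that do not overlap one) is an assumption, as are the names `METABOLITE_DICT`, `dictionary_spans`, and `merge_spans`. The CRF output is replaced by a hard-coded stand-in.

```python
import re

# Hypothetical mini-dictionary; a real one would come from a curated resource.
METABOLITE_DICT = {"glucose", "pyruvate", "citric acid"}


def dictionary_spans(text, terms, label="METABOLITE"):
    """Return (start, end, label) for each case-insensitive whole-word hit."""
    spans = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return spans


def merge_spans(model_spans, dict_spans):
    """Keep all model spans; add dictionary spans that overlap none kept so far.

    Assumed merge rule for illustration only. Longer dictionary hits are
    tried first when two hits start at the same position.
    """
    merged = list(model_spans)
    ordered = sorted(dict_spans, key=lambda s: (s[0], -(s[1] - s[0])))
    for start, end, label in ordered:
        if all(end <= m_start or start >= m_end for m_start, m_end, _ in merged):
            merged.append((start, end, label))
    return sorted(merged)


text = "Hexokinase converts glucose; pyruvate enters the citric acid cycle."
# Stand-in for CRF output: assume the tagger found only the protein mention.
model_spans = [(0, 10, "PROTEIN")]
result = merge_spans(model_spans, dictionary_spans(text, METABOLITE_DICT))
for start, end, label in result:
    print(label, text[start:end])
```

In this sketch the dictionary only fills gaps the model left; a system that trusted the dictionary more than the model would reverse the priority in `merge_spans`.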
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kongburan, W., Padungweang, P., Krathu, W., Chan, J.H. (2016). Metabolite Named Entity Recognition: A Hybrid Approach. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_50
DOI: https://doi.org/10.1007/978-3-319-46687-3_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46686-6
Online ISBN: 978-3-319-46687-3
eBook Packages: Computer Science, Computer Science (R0)