Abstract
We propose a novel bio-inspired solution for biomedical article classification. Our method draws on an existing model of T-cell cross-regulation in the vertebrate immune system (IS), a complex adaptive system of millions of cells that interact to distinguish between harmless and harmful intruders. Analogously, automatic biomedical article classification assumes that the interaction and co-occurrence of thousands of words in text can be used to identify conceptually related classes of articles: at a minimum, two classes containing the relevant and irrelevant articles for a given concept (e.g. articles with protein-protein interaction information). Our agent-based method for document classification expands the existing analytical model of Carneiro et al. [1] by allowing us to deal simultaneously with many distinct T-cell features (epitopes) and their collective dynamics using agent-based modeling. We previously extended this model to develop a bio-inspired spam-detection system [2, 3]. Here we develop our agent-based model further and test it on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge [4]. We study several new parameter configurations, which lead to encouraging results comparable to those of state-of-the-art classifiers. These results help us understand both T-cell cross-regulation and its applicability to document classification in general. We therefore show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.
References
Carneiro, J., Leon, K., Caramalho, Í., van den Dool, C., Gardner, R., Oliveira, V., Bergman, M., Sepúlveda, N., Paixão, T., Faro, J., et al.: When three is not a crowd: a Crossregulation Model of the dynamics and repertoire selection of regulatory CD4 T cells. Immunological Reviews 216(1), 48–68 (2007)
Abi-Haidar, A., Rocha, L.: Adaptive Spam Detection Inspired by a Cross-Regulation Model of Immune Dynamics: A Study of Concept Drift. In: Bentley, P.J., Lee, D., Jung, S. (eds.) ICARIS 2008. LNCS, vol. 5132, p. 36. Springer, Heidelberg (2008)
Abi-Haidar, A., Rocha, L.: Adaptive spam detection inspired by the immune system. In: Bullock, S., Noble, J., Watson, R., Bedau, M.A. (eds.) Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, pp. 1–8. MIT Press, Cambridge (2008)
Krallinger, M., et al.: The BioCreative II.5 challenge overview. In: Proc. the BioCreative II.5 Workshop 2009 on Digital Annotations, pp. 7–9 (2009)
Myers, G.: Whole-genome DNA sequencing. Computing in Science & Engineering [see also IEEE Computational Science and Engineering] 1(3), 33–43 (1999)
Schena, M., Shalon, D., Davis, R., Brown, P., et al.: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (Washington) 270(5235), 467–470 (1995)
Hunter, L., Cohen, K.: Biomedical Language Processing: What’s Beyond PubMed? Molecular Cell 21(5), 589–594 (2006)
Pubmed
Jensen, L.J., Saric, J., Bork, P.: Literature mining for the biologist: from information retrieval to biological discovery. Nat. Rev. Genet. 7(2), 119–129 (2006)
Feldman, R., Sanger, J.: The Text Mining Handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge (2006)
Abi-Haidar, A., Kaur, J., Maguitman, A., Radivojac, P., Rechtsteiner, A., Verspoor, K., Wang, Z., Rocha, L.: Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. Genome Biology 9(2), S11 (2008)
Krallinger, M., Valencia, A.: Evaluating the detection and ranking of protein interaction relevant articles: the BioCreative challenge interaction article sub-task (IAS). In: Proceedings of the Second Biocreative Challenge Evaluation Workshop (2007)
Kolchinsky, A., Abi-Haidar, A., Kaur, J., Hamed, A., Rocha, L.: Classification of protein-protein interaction documents using text and citation network features (in press)
Hofmeyr, S.: An Interpretative Introduction to the Immune System. In: Design Principles for the Immune System and Other Distributed Autonomous Systems (2001)
Timmis, J.: Artificial immune systems today and tomorrow. Natural Computing 6(1), 1–18 (2007)
Twycross, J., Cayzer, S.: An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK (2002)
Metsis, V., Androutsopoulos, I., Paliouras, G.: Spam Filtering with Naive Bayes–Which Naive Bayes? In: Third Conference on Email and Anti-Spam, CEAS (2006)
Joachims, T.: Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer Academic Publishers, Dordrecht (2002)
Abi-Haidar, A., Kaur, J., Maguitman, A., Radivojac, P., Rechtsteiner, A., Verspoor, K., Wang, Z., Rocha, L.: Uncovering protein-protein interactions in the bibliome. In: Proceedings of the Second BioCreative Challenge Evaluation Workshop, pp. 247–255 (2007) ISBN 84-933255-6-2
Kolchinsky, A., Abi-Haidar, A., Kaur, J., Hamed, A., Rocha, L.: Classification of protein-protein interaction documents using text and citation network features. In: BioCreative II.5 Workshop 2009: Special Session on Digital Annotations, Madrid, Spain, October 7-9, p. 34 (2009)
de Sepulveda, N.H.S.: How is the T-cell repertoire shaped (2009)
Porter, M.: An algorithm for suffix stripping. In: Program 1966-2006: Celebrating 40 Years of ICT in Libraries, Museums and Archives (2006)
Sokolova, M., Japkowicz, N., Szpakowicz, S.: Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Sattar, A., Kang, B.-H. (eds.) AI 2006. LNCS (LNAI), vol. 4304, pp. 1015–1021. Springer, Heidelberg (2006)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abi-Haidar, A., Rocha, L.M. (2010). Biomedical Article Classification Using an Agent-Based Model of T-Cell Cross-Regulation. In: Hart, E., McEwan, C., Timmis, J., Hone, A. (eds) Artificial Immune Systems. ICARIS 2010. Lecture Notes in Computer Science, vol 6209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14547-6_19
DOI: https://doi.org/10.1007/978-3-642-14547-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14546-9
Online ISBN: 978-3-642-14547-6
eBook Packages: Computer Science (R0)