DOI: 10.1145/2492517.2500281

Combining information extraction and text mining for cancer biomarker detection

Published: 25 August 2013

ABSTRACT

Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is a continuing need for new, sophisticated techniques that can extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology are among the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncover much of the still unknown information related to the various diseases threatening the human body. Biomarker detection is one of the areas that has received considerable attention in the research community. Two sources of data can be analyzed for biomarker detection: gene expression data and the rich literature related to the domain. Our research group has reported achievements in analyzing both sources. In this paper, we concentrate on the latter by describing a powerful tool that is capable of (1) extracting from the content of a repository (such as PubMed) the parts related to a given specific domain, such as cancer; (2) analyzing the retrieved text to extract the key terms with high frequency; (3) presenting the extracted terms to domain experts, who select those most relevant to the investigated domain; (4) retrieving from the analyzed text the molecules related to the domain by considering the relevant terms; and (5) deriving the network that will be analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.
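
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the abstracts are invented toy sentences rather than PubMed records, the stopword list and the expert-selected molecule set are hypothetical, the network is built from simple within-abstract co-occurrence, and ranking nodes by weighted degree is an assumed stand-in for whatever network analysis the paper actually applies.

```python
import re
from collections import Counter
from itertools import combinations

# Toy stand-ins for retrieved abstracts (hypothetical text, not real PubMed records).
ABSTRACTS = [
    "BRCA1 and TP53 mutations are frequent in breast cancer tumours.",
    "TP53 interacts with MDM2 in prostate cancer progression.",
    "BRCA1 expression and TP53 status predict breast cancer outcome.",
]

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"and", "are", "in", "with", "the", "of"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[A-Za-z0-9]+", text.lower()) if t not in STOPWORDS]


def top_terms(abstracts, k):
    """Return the k most frequent terms across all abstracts."""
    counts = Counter(tok for a in abstracts for tok in tokenize(a))
    return counts.most_common(k)


def cooccurrence_network(abstracts, molecules):
    """Weight each molecule pair by the number of abstracts mentioning both."""
    edges = Counter()
    for a in abstracts:
        present = sorted(set(tokenize(a)) & molecules)
        for u, v in combinations(present, 2):
            edges[(u, v)] += 1
    return edges


def rank_by_degree(edges):
    """Rank molecules by weighted degree (assumed proxy for biomarker potential)."""
    degree = Counter()
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    return degree.most_common()


if __name__ == "__main__":
    print(top_terms(ABSTRACTS, 4))
    # Stand-in for the expert selection step: a hand-picked set of molecule names.
    selected = {"brca1", "tp53", "mdm2"}
    network = cooccurrence_network(ABSTRACTS, selected)
    print(rank_by_degree(network))
```

In this sketch the expert review step is reduced to a hard-coded set; in practice the high-frequency terms would be shown to domain experts, and their selections would drive which molecules enter the network.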

Published in

ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 116 of 549 submissions, 21%
