research-article

Combining information extraction and text mining for cancer biomarker detection

Authors:
Khaled Dawoud (University of Calgary, Calgary, Alberta, Canada)
Shang Gao (University of Calgary, Calgary, Alberta, Canada)
Ala Qabaja (University of Calgary, Calgary, Alberta, Canada)
Panagiotis Karampelas (Hellenic American University, Manchester, NH)
Reda Alhajj (University of Calgary, Calgary, Alberta, Canada; Hellenic American University, Manchester, NH; Global University, Beirut, Lebanon)

ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2013, Pages 948–955. https://doi.org/10.1145/2492517.2500281

Published: 25 August 2013

ABSTRACT

Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is a continuing need for new, sophisticated techniques that can extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology form one of the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncover much of the still unknown information related to the various diseases threatening the human body. Biomarker detection is one of the areas that has received considerable attention in the research community. Two sources of data can be analyzed for biomarker detection: gene expression data and the rich literature related to the domain. Our research group has reported achievements in analyzing both sources. In this paper, we concentrate on the latter by describing a powerful tool that is capable of (1) extracting from the content of a repository (like PubMed) the parts related to a given specific domain, such as cancer; (2) analyzing the retrieved text to extract the key terms with high frequency; (3) presenting the extracted terms to domain experts, who select those most relevant to the investigated domain; (4) retrieving from the analyzed text the molecules related to the domain by considering the relevant terms; and (5) deriving the network that is then analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.
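The workflow outlined in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how terms are counted, how molecules are recognized, or how the network is analyzed. The sketch substitutes simple token counting, dictionary lookup of molecule names, abstract-level co-occurrence, and degree ranking. The function names, the stopword list, and the toy abstracts are hypothetical and are not taken from the paper or from PubMed.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list and toy inputs; not data from the paper.
STOPWORDS = {"the", "of", "in", "and", "is", "with", "a", "to", "for"}

def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def top_terms(abstracts, k=5):
    """Return the k most frequent non-stopword tokens across all abstracts."""
    counts = Counter()
    for text in abstracts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

def cooccurrence_network(abstracts, molecules, relevant_terms):
    """Link two molecules when both occur in one abstract that also
    mentions at least one expert-selected term (assumed linking rule)."""
    edges = Counter()
    for text in abstracts:
        tokens = set(tokenize(text))
        if not tokens & relevant_terms:
            continue  # abstract is outside the investigated domain
        present = sorted(m for m in molecules if m.lower() in tokens)
        edges.update(combinations(present, 2))
    return edges

def rank_by_degree(edges):
    """Rank molecules by the number of distinct neighbours in the network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()

abstracts = [
    "BRCA1 and TP53 mutation in breast tumour progression",
    "TP53 and ERBB2 expression in breast tumour tissue",
    "AR signalling and TP53 in prostate tumour growth",
    "Weather patterns in Alberta",
]
molecules = {"BRCA1", "TP53", "ERBB2", "AR"}  # stand-in for a molecule dictionary
candidate_terms = top_terms(abstracts, k=3)   # shown to domain experts
relevant_terms = {"tumour"}                   # stand-in for the experts' selection
edges = cooccurrence_network(abstracts, molecules, relevant_terms)
ranking = rank_by_degree(edges)
print(candidate_terms)
print(ranking[0])
```

In this toy run the expert selection is hard-coded as a set, whereas the described tool presents the frequent terms to domain experts interactively. Degree is only one of many possible network measures; the abstract does not state which measure the authors use.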

Published in
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517
General Chairs:
Jon Rokne (University of Calgary, Calgary, AB, Canada)
Christos Faloutsos (Carnegie Mellon University, Pittsburgh, PA)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cancer biomarkers
information extraction
knowledge discovery
network analysis
text analysis
text mining
Conference

Acceptance Rates
Overall Acceptance Rate: 116 of 549 submissions, 21%