DOI:10.1145/3440943.3444718

MiningBreastCancer: Selection of Candidate Gene Associated with Breast Cancer via Comparison between Data Mining of TCGA and Text Mining of PubMed

Published: 27 September 2021

Abstract

In 2016, 12,676 new cases of breast cancer were diagnosed among women in Taiwan. In 2018, the standardized death rate of breast cancer was 12.5 per 100,000 persons. Previous studies have integrated data mining and text mining to yield fusion genes, identify genetic factors for breast cancer, and select single-gene feature sets for colon cancer discrimination. However, our study is the first to select genes with significantly different expression between normal breast tissue and breast cancer using TCGA data and biostatistics, and then to exclude known genes using abstracts from PubMed and natural language processing. The top twenty genes for research potential selected by MiningBreastCancer are EML3, ABCB9, GRASP, KANK3, GPR146, ZNF623, CCDC9, ADCY4, DLL1, ADAM33, GRRP1, LRRN4CL, C14orf180, ABCD4, ABCC6P1, PEAR1, FAM43A, C20orf160, KIF21A and PPFIA3. Few studies of these genes exist, but they show significantly different expression between breast cancer and normal tissue, across pathologic tumor and lymph node stages, or across pathologic metastasis stages. These results show that MiningBreastCancer can help scientists select genes with research potential. MiningBreastCancer is available at http://bio.yungyun.com.tw/MiningBreastCancer.aspx.
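
The two-stage selection described in the abstract (keep genes whose expression differs between tissues, then drop genes already well covered in the literature) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical test or the NLP pipeline, so Welch's t statistic, the cutoffs `t_cutoff` and `max_hits`, the function names, and the toy gene names and values below are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_candidates(expression, pubmed_hits, t_cutoff=3.0, max_hits=1):
    """Return genes that differ strongly between tissues but are rarely
    mentioned with breast cancer in abstracts, strongest difference first.

    expression:  gene -> (normal_values, tumor_values)   (hypothetical layout)
    pubmed_hits: gene -> number of abstracts co-mentioning the gene and the
                 disease (assumed to come from a separate text-mining step)
    """
    scored = []
    for gene, (normal, tumor) in expression.items():
        t = welch_t(tumor, normal)
        # Stage 1: expression differs; Stage 2: gene is not already well studied.
        if abs(t) >= t_cutoff and pubmed_hits.get(gene, 0) <= max_hits:
            scored.append((abs(t), gene))
    return [gene for _, gene in sorted(scored, reverse=True)]


# Toy data invented for illustration; not taken from TCGA or PubMed.
expression = {
    "GENE_A": ([5.0, 5.2, 4.9, 5.1], [2.0, 2.1, 1.9, 2.2]),  # differs, unstudied
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.2, 4.9]),  # no difference
    "GENE_C": ([3.0, 3.1, 2.9, 3.0], [6.0, 6.2, 5.9, 6.1]),  # differs, well studied
}
pubmed_hits = {"GENE_A": 0, "GENE_B": 0, "GENE_C": 250}

candidates = select_candidates(expression, pubmed_hits)
print(candidates)
```

On this toy input only `GENE_A` survives both filters: `GENE_B` shows no expression difference, and `GENE_C` is excluded because it already appears in many abstracts. A real analysis would also need a proper significance test with multiple-testing correction, which this sketch omits.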



    Published In

    ACM ICEA '20: Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications
    December 2020
    219 pages
    ISBN:9781450383042
    DOI:10.1145/3440943
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. PubMed
    2. TCGA
    3. biostatistics
    4. data mining
    5. natural language processing (NLP)
    6. research potential
    7. text mining

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACM ICEA '20