ABSTRACT
Clustering functionally related genes is an important task in biology. Recent progress in machine learning gives researchers more powerful tools for identifying structure in large volumes of DNA sequencing data, which allows research on genes to be conducted efficiently and at scale. This paper uses machine learning to study the clustering of functionally related genes and their impact on the development and prognosis of lung cancer. We analyze data from 218 patients and focus on two extreme cases: patients who survived less than 1 year and patients who survived longer than 5 years. We investigate how different clustering methods can assist in visualizing the DNA sequence data of these patients and how such methods can help identify the underlying patterns in that data.
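As a rough illustration of the kind of analysis described above, the following sketch runs a minimal k-means clustering on synthetic binary mutation profiles for two survival groups. Everything in it is an assumption made for illustration: the abstract does not say which clustering methods or features the paper uses, and the helper names (`kmeans`, `synthetic_profiles`), the per-gene mutation rates, and the group sizes are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def kmeans(rows, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each row to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def synthetic_profiles(n_patients, mutation_rates, rng):
    """Hypothetical data: one 0/1 mutation flag per gene per patient."""
    return [
        [1 if rng.random() < rate else 0 for rate in mutation_rates]
        for _ in range(n_patients)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed (not from the paper) per-gene mutation rates for each group.
    short_rates = [0.8] * 5 + [0.1] * 5  # survived less than 1 year
    long_rates = [0.1] * 5 + [0.8] * 5   # survived longer than 5 years
    profiles = (synthetic_profiles(10, short_rates, rng)
                + synthetic_profiles(10, long_rates, rng))
    labels, _ = kmeans(profiles, k=2)
    print("short-survival group labels:", labels[:10])
    print("long-survival group labels: ", labels[10:])
```

If the two groups carry distinct mutation patterns, most patients in each group should receive the same cluster label. On real sequencing data the separation would likely be far less clean, and a dimension-reduction step would typically be needed to visualize the result.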