ABSTRACT
Breast cancer patients face many difficulties during treatment, and many of these are personal, social, or professional rather than medical. Because such issues are not directly medical, they are not discussed with healthcare providers and are therefore left out of the treatment plan, even though they are vital to the patient's complete recovery. We present a novel approach that serves as a first step toward including the personal and social issues that result from breast cancer treatment in a patient's treatment plan. Patients share their experiences and post questions about their treatments and subsequent side effects on numerous online forums. We collected data from one such forum, the "Online Breast Cancer Forum", where users (patients) have created threads on many related topics. We use these message threads to identify critical issues that patients face and how those issues relate to their treatment. We convert the forum data into a bipartite network and map the network nodes into a high-dimensional feature space, in which we perform community detection to uncover latent connections between patients and topics. We claim that these latent connections, together with the known ones, will help create a new knowledge base that will eventually help physicians estimate the non-medical issues associated with a prescribed treatment. This knowledge will help physicians plan more adaptive and personalized treatment and anticipate potential problems in advance. We evaluated our method against two baseline methods and show that it outperforms them by 25% on a manually labeled reference dataset.
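The pipeline outlined in the abstract (bipartite network, node features, community detection, latent connections) can be illustrated with a minimal sketch. The abstract does not name the embedding or the clustering algorithm, so this sketch substitutes simple stand-ins: one-step plus two-step random-walk probabilities as node features, and a plain k-means with farthest-point initialisation as the community detector. The user names, topics, and all function names are hypothetical and are not taken from the forum data or the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy data: (user, thread topic) pairs, not real forum posts.
posts = [
    ("alice", "hair_loss"), ("alice", "wigs"), ("beth", "hair_loss"),
    ("cara", "return_to_work"), ("cara", "insurance"), ("dana", "insurance"),
]

def build_bipartite(pairs):
    """Adjacency sets for an undirected user-topic graph."""
    adj = defaultdict(set)
    for user, topic in pairs:
        adj[("user", user)].add(("topic", topic))
        adj[("topic", topic)].add(("user", user))
    return dict(adj)

def embed(adj):
    """Feature vector per node: one-step plus two-step random-walk
    probabilities over all nodes (a stand-in for a learned embedding)."""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {}
    for n in nodes:
        vec = [0.0] * len(nodes)
        p1 = 1.0 / len(adj[n])
        for nb in adj[n]:
            vec[index[nb]] += p1
            for nb2 in adj[nb]:
                vec[index[nb2]] += p1 / len(adj[nb])
        vectors[n] = vec
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    names = list(vectors)
    centroids = [vectors[names[0]]]
    while len(centroids) < k:
        far = max(names, key=lambda n: min(sq_dist(vectors[n], c) for c in centroids))
        centroids.append(vectors[far])
    labels = {}
    for _ in range(iterations):
        labels = {
            n: min(range(k), key=lambda j: sq_dist(vectors[n], centroids[j]))
            for n in names
        }
        for j in range(k):
            members = [vectors[n] for n in names if labels[n] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def latent_links(adj, labels):
    """User-topic pairs in the same community that have no observed edge."""
    users = [n for n in labels if n[0] == "user"]
    topics = [n for n in labels if n[0] == "topic"]
    return sorted(
        (u[1], t[1]) for u in users for t in topics
        if labels[u] == labels[t] and t not in adj[u]
    )

adj = build_bipartite(posts)
labels = kmeans(embed(adj), k=2)
latent = latent_links(adj, labels)
print(latent)
```

On this toy input the two communities are fully separated, so the candidate latent connections are simply the unobserved user-topic pairs inside each community. Real forum data would be far noisier and would need a learned embedding and a principled choice of the number of communities.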
Index Terms
- Extracting Features from Online Forums to Meet Social Needs of Breast Cancer Patients