DOI: 10.1145/3378393.3403652
research-article

Extracting Features from Online Forums to Meet Social Needs of Breast Cancer Patients

Published: 01 July 2020

ABSTRACT

Breast cancer patients face many ordeals during treatment, and many of these are personal, social, or professional rather than medical. Because such issues are not directly medical in nature, they are not discussed with healthcare providers and hence are not included in the treatment plan. Yet addressing them is vital for a patient's complete recovery. We present a novel approach that serves as a first step toward incorporating the personal and social issues arising from breast cancer treatment into a patient's treatment plan. Patients share their experiences and post questions about their treatments and subsequent side effects on numerous online forums. We collected data from one such forum, called "Online Breast Cancer Forum", where users (patients) have created threads across many related topics. We use these message threads to identify critical issues faced by patients and how those issues relate to their treatment. We convert the forum data into a bipartite network and map the network nodes into a high-dimensional feature space. In this feature space, we perform community detection to unearth latent connections between patients and topics. We claim that these latent connections, together with the known ones, will help create a new knowledge base that will eventually help physicians anticipate the non-medical issues associated with a prescribed treatment. This knowledge will help physicians plan more adaptive and personalized treatment and prepare for potential problems in advance. We evaluate our method against two baseline methods and show that it outperforms them by 25% on a manually labeled reference dataset.
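The pipeline named in the abstract (bipartite network, node features, community detection, latent connections) can be illustrated with a small sketch. The abstract does not state which embedding or clustering algorithms are used, so everything below is an assumption made for illustration: the toy posts, the function names, a simple walk-count vector standing in for a learned node embedding, and spherical k-means standing in for the community detection step. It is not the authors' implementation.

```python
from collections import defaultdict
from math import sqrt

def build_bipartite(posts):
    """Adjacency map linking each user node to the thread nodes they posted in."""
    adj = defaultdict(set)
    for user, thread in posts:
        u, t = ("user", user), ("thread", thread)
        adj[u].add(t)
        adj[t].add(u)
    return adj

def embed(adj):
    """One unit-length vector per node: counts of 1- and 2-step walks to every node.
    (Assumed stand-in for a learned embedding; the source does not specify one.)"""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {}
    for n in nodes:
        v = [0.0] * len(nodes)
        for nb in adj[n]:
            v[index[nb]] += 1.0           # one-step walk
            for nb2 in adj[nb]:
                v[index[nb2]] += 1.0      # two-step walk (may return to n)
        norm = sqrt(sum(x * x for x in v))
        vectors[n] = [x / norm for x in v]
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cluster(vectors, k, iters=20):
    """Spherical k-means (cosine similarity) with deterministic farthest-point seeding."""
    nodes = sorted(vectors)
    centroids = [vectors[nodes[0]]]
    while len(centroids) < k:
        # Seed with the node least similar to the centroids chosen so far.
        far = min(nodes, key=lambda n: max(dot(vectors[n], c) for c in centroids))
        centroids.append(vectors[far])
    labels = {}
    for _ in range(iters):
        new = {n: max(range(k), key=lambda j: dot(vectors[n], centroids[j]))
               for n in nodes}
        if new == labels:                 # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [vectors[n] for n in nodes if labels[n] == j]
            if members:                   # keep the old centroid if a cluster empties
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = sqrt(sum(x * x for x in mean))
                centroids[j] = [x / norm for x in mean]
    return labels

def latent_links(adj, labels):
    """User-thread pairs that share a community but have no direct edge."""
    users = [n for n in labels if n[0] == "user"]
    threads = [n for n in labels if n[0] == "thread"]
    return sorted((u[1], t[1]) for u in users for t in threads
                  if labels[u] == labels[t] and t not in adj[u])

# Hypothetical (user, thread) pairs, invented for this example.
posts = [("ana", "fatigue"), ("ana", "work-leave"), ("bea", "fatigue"),
         ("bea", "childcare"), ("cy", "insurance"), ("dee", "insurance"),
         ("dee", "wigs")]
adj = build_bipartite(posts)
labels = cluster(embed(adj), k=2)
print(latent_links(adj, labels))
# [('ana', 'childcare'), ('bea', 'work-leave'), ('cy', 'wigs')]
```

In this toy data, ana never posted in the childcare thread, but she shares a community with bea, who did. A pair like that is the kind of latent patient-topic connection the abstract describes.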


Published in

COMPASS '20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies
June 2020
359 pages
ISBN: 9781450371292
DOI: 10.1145/3378393

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Overall Acceptance Rate: 25 of 50 submissions, 50%