
Frontier knowledge discovery and visualization in cancer field based on KOS and LDA

Scientometrics

Abstract

Scientific journals report the latest research developments in every field, yet interpreting and using biomedical information remains difficult. A central problem is how to convert biomedical literature into structured data and analyze it in a form that readers can understand. This paper proposes a frontier knowledge discovery model based on a knowledge organization system (KOS) and latent Dirichlet allocation (LDA), and applies it to detecting burst topics and their semantic relationships in the cancer field. Experiments show that the model is useful for topic recognition, evolution recognition, and visualization. Furthermore, combining KOS with LDA effectively removes noisy concepts at the semantic layer, with good results.



Acknowledgements

This project is supported by the National Natural Science Foundation of China (Grant No. 61502402), the Fundamental Research Funds for the Central Universities (Grant No. 20720180073), the State Key Laboratory of Virtual Reality Technology and Systems of China (Grant No. BUAA-VR-15 KF-09), and Xiamen University (Grant No. 20720150081).

Author information

Corresponding author

Correspondence to Yingying She.

About this article

Cite this article

Wu, Q., Kuang, Y., Hong, Q. et al. Frontier knowledge discovery and visualization in cancer field based on KOS and LDA. Scientometrics 118, 979–1010 (2019). https://doi.org/10.1007/s11192-018-2989-y

  • DOI: https://doi.org/10.1007/s11192-018-2989-y
