Abstract
In recent years, the number of published scientific papers has increased considerably. The large volume of text in these papers carries relevant information that can open significant opportunities for various industries and organizations. Researchers and decision-makers need to analyze published papers to access this information, so automatic techniques such as topic modeling have become necessary to capture the hidden semantic structure of a collection of documents. However, the literature lacks surveys that identify which topic modeling techniques are appropriate for analyzing a corpus of scientific papers. The aim of this research is to compare and discuss three topic modeling techniques, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA) and Correlated Topic Model (CTM), applied to a corpus of scientific papers in the field of marketing. Both objective and subjective evaluations are performed: the objective evaluation is based on machine learning metrics, while the subjective evaluation relies on expert opinion to assess the quality of the best topic models retrieved by LSA, LDA and CTM. The results are presented and discussed according to several quality criteria.
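LSA is the simplest of the three techniques compared. As a rough illustration of what it computes, the sketch below factors a small term-document count matrix with a truncated singular value decomposition and lists the highest-loading terms for each latent dimension. The toy corpus, the term list, k = 2 and the helper names `lsa` and `top_terms` are hypothetical. The authors' actual preprocessing, term weighting (for example TF-IDF) and choice of the number of topics are not described here, so this is a sketch under those assumptions and not the study's implementation.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are four short "abstracts".
# Counts are illustrative only, not data from the study.
TERMS = ["brand", "loyalty", "customer", "price", "promotion", "retail"]
X = np.array([
    [3, 2, 0, 0],   # brand
    [2, 3, 0, 0],   # loyalty
    [1, 1, 1, 1],   # customer
    [0, 0, 3, 2],   # price
    [0, 0, 2, 3],   # promotion
    [0, 1, 1, 2],   # retail
], dtype=float)


def lsa(X, k):
    """Rank-k latent semantic analysis via truncated SVD: X ~ U_k diag(s_k) Vt_k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def top_terms(U_k, terms, n=3):
    """For each latent dimension, list the n terms with the largest absolute loading."""
    result = []
    for j in range(U_k.shape[1]):
        order = np.argsort(-np.abs(U_k[:, j]))[:n]
        result.append([terms[i] for i in order])
    return result


if __name__ == "__main__":
    U_k, s_k, Vt_k = lsa(X, k=2)
    for j, words in enumerate(top_terms(U_k, TERMS)):
        print(f"dimension {j}: {words}")
    # Rows of (diag(s_k) @ Vt_k).T are the documents' coordinates in the latent space.
    print(np.round((np.diag(s_k) @ Vt_k).T, 2))
```

SVD loadings can be negative and their signs are arbitrary, which is one reason LSA dimensions are often harder to read as topics than the word probability distributions produced by LDA and CTM.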
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chebil, M., Jallouli, R., Bach Tobji, M.A., Ben Ncir, C.E. (2021). Topic Modeling of Marketing Scientific Papers: An Experimental Survey. In: Jallouli, R., Bach Tobji, M.A., Mcheick, H., Piho, G. (eds) Digital Economy. Emerging Technologies and Business Innovation. ICDEc 2021. Lecture Notes in Business Information Processing, vol 431. Springer, Cham. https://doi.org/10.1007/978-3-030-92909-1_10
DOI: https://doi.org/10.1007/978-3-030-92909-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92908-4
Online ISBN: 978-3-030-92909-1
eBook Packages: Computer Science, Computer Science (R0)