Abstract
Dynamic topic analysis examines data from different perspectives and reveals how the data are distributed across different degrees of correlation. Performing dynamic topic analysis on domain text data is challenging because the semantic differences among subtopics are small. This paper proposes a method for dynamically constructing a topic hierarchy. The method uses formal concept analysis (FCA)-based information retrieval (IR) as its technical basis and sememes as its semantic basis, and it processes Chinese domain text data hierarchically, from fine-grained to coarse-grained, according to the topics of the user's query. It meets the user's need for query results at different scales, supports multi-angle inspection of the whole dataset, and provides high-precision retrieval for the query. Taking sememes as formal attributes reduces the size of the concept lattice and extends the application of FCA to large-scale text data. The sememe-based word meaning identification (WMI) algorithm and the semantic similarity measure for long text make the topic hierarchy fine-grained, and the coarse-and-fine filtering strategy makes the FCA-based method more efficient. Experimental results on an open dataset show that the proposed method is an efficient and flexible topic-based hierarchical approach.
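To illustrate what taking sememes as formal attributes means, the following is a minimal sketch rather than the algorithm of this paper. It assumes a toy formal context in which documents are objects and sememes are attributes; the data and the names `CONTEXT`, `extent`, `intent`, `formal_concepts`, and `concepts_for_query` are hypothetical. Concepts are enumerated naively by closing every attribute subset, which is only practical for tiny contexts.

```python
from itertools import combinations

# Hypothetical toy context: objects are documents, attributes are sememes.
CONTEXT = {
    "doc1": {"law", "contract", "money"},
    "doc2": {"law", "contract"},
    "doc3": {"law", "crime"},
}


def extent(attrs, context):
    """Return the objects that carry every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context):
    """Return the attributes shared by all objects in objs.

    For an empty object set this is the full attribute set.
    """
    shared = set(frozenset().union(*context.values()))
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each attribute subset."""
    all_attrs = sorted(frozenset().union(*context.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, r):
            ext = extent(frozenset(combo), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


def concepts_for_query(query, context):
    """Concepts whose intent covers the query sememes and whose extent is
    non-empty, ordered from the smallest extent (fine) to the largest (coarse)."""
    matches = [
        (ext, itt)
        for ext, itt in formal_concepts(context)
        if query <= itt and ext
    ]
    return sorted(matches, key=lambda c: len(c[0]))


if __name__ == "__main__":
    for ext, itt in concepts_for_query(frozenset({"law"}), CONTEXT):
        print(sorted(ext), sorted(itt))
```

In this sketch, a query on the sememe `law` returns concepts ranging from single documents up to the whole collection, which mirrors the fine-to-coarse reading of a topic hierarchy described above.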
Availability of data and material
The data sets supporting the paper’s results and the tools used in the experiments are included within the paper and its additional files.
Code availability
Not applicable.
Funding
This work is supported by the National Natural Science Foundation of China (61772152), the Basic Research Project (JCKY2019604C004), and in part by the Youth Fund Project of Humanities and Social Sciences Research of the Ministry of Education of China (20YJCZH172).
Author information
Contributions
All authors conceived the idea of the study; Fugang W. and Wulin Z. collected the data and designed the experiments; Nianbin W. and Fugang W. analyzed the data; Shaobin C. and Fugang W. interpreted the results; Fugang W. wrote the paper; all authors discussed the results and revised the manuscript.
Ethics declarations
Conflicts of interest/competing interests
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
The Author confirms:
- that the work described has not been published before;
- that it is not under consideration for publication elsewhere;
- that its publication has been approved by all co-authors;
- that its publication has been approved (tacitly or explicitly) by the responsible authorities at the institution where the work is carried out.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, F., Wang, N., Cai, S. et al. Dynamically constructing semantic topic hierarchy through formal concept analysis. Multimed Tools Appl 82, 7267–7292 (2023). https://doi.org/10.1007/s11042-022-13640-2
DOI: https://doi.org/10.1007/s11042-022-13640-2