Dynamically constructing semantic topic hierarchy through formal concept analysis

Multimedia Tools and Applications

Abstract

Dynamic topic analysis examines data from different perspectives and reveals how the data are distributed at different degrees of correlation. Performing it on domain text data is challenging because the semantic differences among subtopics are small. This paper proposes a method for dynamically constructing a topic hierarchy. The method uses formal concept analysis (FCA)-based information retrieval (IR) as its technical basis and sememes as its semantic basis, and it organizes Chinese domain text data hierarchically, from fine-grained to coarse-grained, according to the topics of the user’s query. It serves users who need query results at different scales, supporting both multi-angle inspection of the whole dataset and high-precision retrieval for the query. Taking sememes as formal attributes reduces the size of the concept lattice and extends FCA to large-scale text data. A sememe-based word meaning identification (WMI) algorithm and a semantic similarity measure for long text make the topic hierarchy fine-grained, and a coarse-and-fine filtering strategy makes the FCA-based method more efficient. Experimental results on an open dataset show that the proposed method is an efficient and flexible topic-based hierarchical approach.
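
To make the idea of taking sememes as formal attributes concrete, the following is a minimal sketch rather than the method proposed in the paper. It assumes a hypothetical toy context in which four documents have already been reduced to sets of sememes (the document IDs and English sememe labels are invented for illustration; word segmentation and sememe lookup are omitted). It enumerates formal concepts by naive closure of every attribute subset, which is exponential and suitable only for toy data, and then lists the concepts matching a query from fine-grained (small extent) to coarse-grained (large extent). The function names are likewise assumptions.

```python
from itertools import combinations

# Hypothetical toy formal context: document (object) -> sememes (attributes).
context = {
    "d1": {"law", "contract", "money"},
    "d2": {"law", "contract"},
    "d3": {"law", "crime"},
    "d4": {"law", "crime", "money"},
}


def extent(attrs, ctx):
    """Return the objects that carry every attribute in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def intent(objs, ctx):
    """Return the attributes shared by all objs (all attributes if objs is empty)."""
    shared = set().union(*ctx.values())
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)


def concepts(ctx):
    """Naively enumerate formal concepts by closing every attribute subset."""
    attrs = sorted(set().union(*ctx.values()))
    found = set()
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            e = extent(set(combo), ctx)
            found.add((e, intent(e, ctx)))
    return found


def hierarchy_for_query(query, ctx):
    """List concepts whose intent contains the query, finest (smallest extent) first."""
    q = frozenset(query)
    hits = [(e, i) for e, i in concepts(ctx) if q <= i and e]
    return sorted(hits, key=lambda c: (len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    for ext, itt in hierarchy_for_query({"money"}, context):
        print(sorted(ext), sorted(itt))
```

In this toy context the query `{"money"}` first yields the two single-document concepts and then the coarser concept that groups both documents under the shared sememes `law` and `money`. A practical system would replace the naive enumeration with a dedicated concept-mining algorithm.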

Notes

  1. https://pypi.org/project/jieba/

  2. https://sourceforge.net/projects/inclose/

  3. https://radimrehurek.com/gensim/

Availability of data and material

The data sets supporting the paper’s results and the tools used in the experiments are included within the paper and its additional files.

Code availability

Not applicable.

Funding

This work is supported by the National Natural Science Foundation of China (61772152) and the Basic Research Project (JCKY2019604C004), and in part by the Youth Fund Project of Humanities and Social Sciences Research of the Ministry of Education of China (20YJCZH172).

Author information

Contributions

All authors conceived the idea of the study; Fugang W. and Wulin Z. collected the data and designed the experiments; Nianbin W. and Fugang W. analyzed the data; Shaobin C. and Fugang W. interpreted the results; Fugang W. wrote the paper; all authors discussed the results and revised the manuscript.

Corresponding author

Correspondence to Nianbin Wang.

Ethics declarations

Conflicts of interest/competing interests

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

The Author confirms:

  • that the work described has not been published before;

  • that it is not under consideration for publication elsewhere;

  • that its publication has been approved by all co-authors;

  • that its publication has been approved (tacitly or explicitly) by the responsible authorities at the institution where the work is carried out.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, F., Wang, N., Cai, S. et al. Dynamically constructing semantic topic hierarchy through formal concept analysis. Multimed Tools Appl 82, 7267–7292 (2023). https://doi.org/10.1007/s11042-022-13640-2

  • DOI: https://doi.org/10.1007/s11042-022-13640-2
