Abstract
This paper proposes an approach, and an associated system, based on pattern structures for classifying documents represented as graphs. Documents are represented as Abstract Meaning Representation (AMR) document graphs. Given a set of AMR document graphs, the system learns characteristic graph patterns that an aggregate rule classifier can reuse to predict the class of a document. The most stable graph patterns are selected with the gSOFIA algorithm and the \(\varDelta\)-stability measure. The approach is validated experimentally on two document datasets: the first includes documents belonging to 10 different newsgroups, and the second contains sports news articles belonging to 5 topical areas. The results, in terms of macro-averaged \(F_1\) scores, are quite satisfactory and show that the approach is well-founded and useful.
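To make the \(\varDelta\)-stability measure more concrete, the following minimal sketch computes it on a toy binary context. It is an illustration under simplifying assumptions, not the system described in the paper: each document is reduced to a plain set of labels instead of an AMR graph, patterns are label sets instead of graph patterns, and the names `DOCS`, `extent`, `closure`, and `delta_stability` are hypothetical. The sketch uses the usual reading of \(\varDelta\)-stability as the smallest loss in extent size when a closed pattern is specialized by one more feature.

```python
from typing import Dict, FrozenSet, Set

# Toy data (hypothetical): each document is a set of labels,
# used here as a stand-in for an AMR document graph.
DOCS: Dict[str, FrozenSet[str]] = {
    "d1": frozenset({"play", "team", "win"}),
    "d2": frozenset({"play", "team", "score"}),
    "d3": frozenset({"play", "team", "win", "score"}),
    "d4": frozenset({"play", "cpu"}),
    "d5": frozenset({"play", "cpu"}),
}


def extent(pattern: FrozenSet[str], docs: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Return the ids of the documents that contain every label of the pattern."""
    return {doc_id for doc_id, labels in docs.items() if pattern <= labels}


def closure(pattern: FrozenSet[str], docs: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Return the largest pattern shared by all documents covered by `pattern`."""
    covered = extent(pattern, docs)
    if not covered:
        # No document matches: by convention the closure is the full label set.
        return frozenset().union(*docs.values())
    return frozenset.intersection(*(docs[doc_id] for doc_id in covered))


def delta_stability(pattern: FrozenSet[str], docs: Dict[str, FrozenSet[str]]) -> int:
    """Smallest drop in extent size when the closed pattern gains one more label."""
    closed = closure(pattern, docs)
    covered = extent(closed, docs)
    remaining = frozenset().union(*docs.values()) - closed
    if not remaining:
        # The pattern cannot be specialized any further.
        return len(covered)
    return min(
        len(covered) - len(extent(closed | {label}, docs))
        for label in remaining
    )


if __name__ == "__main__":
    for labels in ({"play"}, {"team"}, {"cpu"}):
        pattern = frozenset(labels)
        print(sorted(closure(pattern, DOCS)), delta_stability(pattern, DOCS))
```

In this toy setting a pattern with a larger \(\varDelta\) value keeps its extent well separated from all of its specializations, which is the intuition behind preferring stable patterns; the graph-based selection performed by gSOFIA is considerably more involved than this set-based sketch.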
Acknowledgments
The work of Sergei O. Kuznetsov was supported by the Russian Science Foundation under grant 22-11-00323 and performed at HSE University, Moscow, Russia. Egor Dudyrev and Amedeo Napoli are carrying out this research work as part of the French ANR-21-CE23-0023 SmartFCA Research Project.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Parakal, E.G., Dudyrev, E., Kuznetsov, S.O., Napoli, A. (2024). Document Classification via Stable Graph Patterns and Conceptual AMR Graphs. In: Cabrera, I.P., Ferré, S., Obiedkov, S. (eds) Conceptual Knowledge Structures. CONCEPTS 2024. Lecture Notes in Computer Science, vol 14914. Springer, Cham. https://doi.org/10.1007/978-3-031-67868-4_19
DOI: https://doi.org/10.1007/978-3-031-67868-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-67867-7
Online ISBN: 978-3-031-67868-4
eBook Packages: Computer Science (R0)