Document Classification via Stable Graph Patterns and Conceptual AMR Graphs

  • Conference paper
Conceptual Knowledge Structures (CONCEPTS 2024)

Abstract

This paper proposes an approach and an associated system, based on pattern structures, for classifying documents represented as graphs. Documents are represented as Abstract Meaning Representation (AMR) document graphs. Given a set of AMR document graphs, the system learns characteristic graph patterns that an aggregate rule classifier can then reuse to predict the class of a document. The most stable graph patterns are selected using the gSOFIA algorithm and the \(\varDelta\)-stability measure. The approach is validated experimentally on two document datasets: the first includes documents belonging to 10 different newsgroups, and the second contains sports news articles belonging to 5 topical areas. The results, in terms of macro-averaged \(F_1\) scores, are quite satisfactory and show that the approach is well-founded and useful.
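To make the \(\varDelta\)-stability measure mentioned above more concrete, the following minimal sketch computes it on a toy binary context, where each document is a plain set of features rather than an AMR graph. It is an illustration under simplifying assumptions, not the system described in the paper: the data, the names `CONTEXT`, `extent` and `delta_stability`, and the restriction to one-feature specialisations of set patterns are all hypothetical. The idea it shows is that the \(\varDelta\)-stability of a closed pattern is the smallest drop in support observed when the pattern is made more specific.

```python
from typing import Dict, FrozenSet, Set

# Toy binary context (hypothetical data): document id -> features it contains.
CONTEXT: Dict[str, Set[str]] = {
    "d1": {"a", "b"},
    "d2": {"a", "b"},
    "d3": {"a", "b", "c"},
    "d4": {"a", "c"},
    "d5": {"d"},
}
ALL_FEATURES: Set[str] = set().union(*CONTEXT.values())


def extent(pattern: FrozenSet[str]) -> FrozenSet[str]:
    """Return the documents whose feature set contains every feature of the pattern."""
    return frozenset(doc for doc, feats in CONTEXT.items() if pattern <= feats)


def delta_stability(pattern: FrozenSet[str]) -> int:
    """Smallest loss of support when the pattern is specialised by one extra feature.

    For set patterns, every more specific pattern extends some one-feature
    specialisation, so checking single features is sufficient here.
    """
    support = len(extent(pattern))
    losses = [
        support - len(extent(pattern | {feature}))
        for feature in ALL_FEATURES - pattern
    ]
    # Assumed convention: a pattern with no possible specialisation keeps its full support.
    return min(losses) if losses else support


if __name__ == "__main__":
    for features in (set(), {"a"}, {"a", "b"}, {"a", "c"}):
        pattern = frozenset(features)
        print(sorted(pattern), "support:", len(extent(pattern)),
              "delta:", delta_stability(pattern))
```

In this toy context the pattern `{a, b}` loses at least two supporting documents under any specialisation, whereas `{a}` can lose as few as one, so `{a, b}` would be ranked as the more stable of the two.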

Notes

  1. https://github.com/bjascob/amrlib

  2. https://github.com/bjascob/amrlib-models

  3. https://github.com/AlekseyBuzmakov/FCAPS

  4. https://github.com/ericparakal/stable-AMR-graphs-document-classifier

Acknowledgments

The work of Sergei O. Kuznetsov was supported by the Russian Science Foundation under grant 22-11-00323 and performed at HSE University, Moscow, Russia. Egor Dudyrev and Amedeo Napoli are carrying out this research work as part of the French ANR-21-CE23-0023 SmartFCA Research Project.

Author information

Corresponding author

Correspondence to Egor Dudyrev.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Parakal, E.G., Dudyrev, E., Kuznetsov, S.O., Napoli, A. (2024). Document Classification via Stable Graph Patterns and Conceptual AMR Graphs. In: Cabrera, I.P., Ferré, S., Obiedkov, S. (eds) Conceptual Knowledge Structures. CONCEPTS 2024. Lecture Notes in Computer Science, vol 14914. Springer, Cham. https://doi.org/10.1007/978-3-031-67868-4_19

  • DOI: https://doi.org/10.1007/978-3-031-67868-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-67867-7

  • Online ISBN: 978-3-031-67868-4

  • eBook Packages: Computer Science, Computer Science (R0)
