DaFNeGE: Dataset of French Newsletters with Graph Representation and Embedding

Blandin, Alexis; Saïd, Farida; Villaneau, Jeanne; Marteau, Pierre-François

doi:10.1007/978-3-031-16270-1_2

Alexis Blandin (ORCID: orcid.org/0000-0003-0886-9598), Farida Saïd, Jeanne Villaneau & Pierre-François Marteau

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13502)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Natural language resources are essential for integrating linguistic engineering components into information processing suites. However, the resources available in French are scarce and do not cover all possible tasks, especially for specific business applications. In this context, we present a dataset of French newsletters and show how it can be used to predict a newsletter’s impact, good or bad, on readers. We propose an original representation of newsletters as graphs that take the newsletters’ layout into account. We then evaluate the usefulness of this representation for predicting a newsletter’s performance, measured by open and click rates, using graph convolutional network models.
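
To make the idea of a layout graph fed to a graph convolution network more concrete, here is a minimal sketch in Python with NumPy. It is not the model evaluated in the paper: the four-block toy newsletter, the three-dimensional node features, the random weights, the mean-pooling readout and the two output classes (good or bad performance) are all assumptions chosen for illustration. Only the propagation rule, the normalized adjacency with self-loops from Kipf and Welling, is taken from the cited literature.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the propagation matrix of Kipf and Welling."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def predict_scores(adj, features, w1, w2):
    """Two-layer GCN followed by mean pooling and a softmax over graph classes."""
    a_norm = normalized_adjacency(adj)
    h = gcn_layer(a_norm, features, w1)
    node_logits = a_norm @ h @ w2               # second layer, no activation
    graph_logits = node_logits.mean(axis=0)     # assumed readout: mean over nodes
    exp = np.exp(graph_logits - graph_logits.max())
    return exp / exp.sum()


# Hypothetical newsletter with 4 layout blocks (e.g. header, image, text, button).
# An edge links two blocks that are adjacent in the layout.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # placeholder node features (assumption)
w1 = rng.normal(size=(3, 8))         # untrained weights, for shape checking only
w2 = rng.normal(size=(8, 2))         # 2 classes: good / bad performance

probs = predict_scores(adj, features, w1, w2)
print(probs)                          # two class probabilities that sum to 1
```

With trained weights, the two probabilities would be read as the predicted chance that a newsletter performs well or poorly on a target such as open rate. In practice a library such as the Deep Graph Library, cited in the references, would replace the hand-written layers.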

Notes

1. Kosmopolead is a UNEEK trademark offering services such as CRM (https://www.kosmopolead.com/).
2. K is set to 5, which here represents the number of colors to detect. It is rare to find more than 5 colors in the same portion of the image; when that happens, we focus only on the dominant color.
3. As defined in [25].
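
Note 2 can be illustrated with a short sketch. The excerpt does not name the clustering algorithm, so the use of plain K-means below is an assumption, as are the function name `dominant_color`, the iteration count and the toy pixel values. The sketch clusters the RGB pixels of an image portion into K = 5 groups and returns the center of the largest group as the dominant color.

```python
import numpy as np


def dominant_color(pixels, k=5, n_iter=20, seed=0):
    """Cluster RGB pixels with K-means and return the center of the largest cluster.

    Assumes `pixels` has shape (n, 3) with n >= k.
    """
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, dtype=float)
    # Initialize the k centers from randomly chosen pixels (copy, not a view).
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # Assign every pixel to its nearest center (Euclidean distance in RGB).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centers[counts.argmax()]


# Toy image portion: mostly red, with a few pixels of four other colors.
pixels = np.array(
    [[200, 30, 30]] * 80
    + [[30, 200, 30]] * 5
    + [[30, 30, 200]] * 5
    + [[250, 250, 250]] * 5
    + [[5, 5, 5]] * 5
)

color = dominant_color(pixels)
print(color)  # an RGB triple close to the red that covers most of the portion
```

Because the initial centers are drawn at random, minority colors can be merged into the largest cluster, so the returned value approximates the dominant color and is not guaranteed to match it exactly.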

References

1. Abdaoui, A., Azé, J., Bringay, S., Poncelet, P.: FEEL: a French expanded emotion Lexicon. Lang. Resources Eval. 51(3), 833–855 (2017). https://doi.org/10.1007/s10579-016-9364-5. https://hal-lirmm.ccsd.cnrs.fr/lirmm-01348016
2. Blandin, A., Saïd, F., Villaneau, J., Marteau, P.F.: Automatic emotions analysis for French email campaigns optimization. In: CENTRIC 2021, Barcelona, Spain, October 2021. https://hal.archives-ouvertes.fr/hal-03424725
3. Bonfrer, A., Drèze, X.: Real-time evaluation of e-mail campaign performance. Marketing Science (2009)
4. d’Hoffschmidt, M., Belblidia, W., Brendlé, T., Heinrich, Q., Vidal, M.: FQuAD: French question answering dataset (2020)
5. Duvenaud, D., et al.: Convolutional networks on graphs for learning molecular fingerprints. arXiv preprint arXiv:1509.09292 (2015)
6. Ekman, P.: Basic Emotions, chap. 3, pp. 45–60. John Wiley and Sons, Ltd (1999). https://doi.org/10.1002/0470013494.ch3. https://onlinelibrary.wiley.com/doi/abs/10.1002/0470013494.ch3
7. Guenoune, H., Cousot, K., Lafourcade, M., Mekaoui, M., Lopez, C.: A dataset for anaphora analysis in French emails. In: Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 165–175. Association for Computational Linguistics, Barcelona, Spain (online), December 2020. https://aclanthology.org/2020.crac-1.17
8. Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017), to appear
9. Ipsen, N., Mattei, P.A., Frellsen, J.: How to deal with missing data in supervised deep learning? In: ICML Workshop on the Art of Learning with Missing Values (Artemiss) (2020)
10. Kalitvianski, R.: Traitements formels et sémantiques des échanges et des documents textuels liés à des activités collaboratives. Theses, Université Grenoble Alpes, March 2018. https://tel.archives-ouvertes.fr/tel-01893348
11. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
12. Klimt, B., Yang, Y.: The Enron corpus: a new dataset for email classification research. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 217–226. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30115-8_22
13. Kumar, A.: An empirical examination of the effects of design elements of email newsletters on consumers’ email responses and their purchase. J. Retailing Consumer Serv. 58, 102349 (2021). https://doi.org/10.1016/j.jretconser.2020.102349. https://www.sciencedirect.com/science/article/pii/S0969698920313576
14. Loria, S.: textblob documentation. Release 0.15 2 (2018)
15. Mandivarapu, J.K., Bunch, E., You, Q., Fung, G.: Efficient document image classification using region-based graph neural network. CoRR abs/2106.13802 (2021). https://arxiv.org/abs/2106.13802
16. Miller, R., Charles, E.: A psychological based analysis of marketing email subject lines. In: 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer), pp. 58–65 (2016). https://doi.org/10.1109/ICTER.2016.7829899
17. Mohammad, S.M., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Comput. Intell. 29(3), 436–465 (2013)
18. Olive, T., Barbier, M.L.: Processing time and cognitive effort of longhand note taking when reading and summarizing a structured or linear text. Writ. Commun. 34(2), 224–246 (2017)
19. Oono, K., Suzuki, T.: Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947 (2019)
20. Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: unanswerable questions for SQuAD (2018)
21. Salloum, S., Gaber, T., Vadera, S., Shaalan, K.: Phishing email detection using natural language processing techniques: a literature survey. Procedia Comput. Sci. 189, 19–28 (2021). https://doi.org/10.1016/j.procs.2021.05.077
22. Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., Navigli, R., Vidal, M.-E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 593–607. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_38
23. Seth, S., Biswas, S.: Multimodal spam classification using deep learning techniques. In: 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 346–349. IEEE (2017)
24. Shen, Z., Zhang, R., Dell, M., Lee, B.C.G., Carlson, J., Li, W.: LayoutParser: a unified toolkit for deep learning based document image analysis. arXiv preprint arXiv:2103.15348 (2021)
25. Wang, M., et al.: Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315 (2019)
26. Wright, P.: The psychology of layout: Consequences of the visual structure of documents. American Association for Artificial Intelligence Technical Report FS-99-04, pp. 1–9 (1999)
27. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)
28. Yang, H., Liu, Q., Zhou, S., Luo, Y.: A spam filtering method based on multi-modal fusion. Appl. Sci. 9(6), 1152 (2019)
29. Yesilada, Y., Jay, C., Stevens, R., Harper, S.: Validating the use and role of visual elements of web pages in navigation with an eye-tracking study. In: Proceedings of the 17th International Conference on World Wide Web, pp. 11–20 (2008)

Author information

Authors and Affiliations

UNEEK-Kosmopolead, 44300, Nantes, France
Alexis Blandin
Université Bretagne Sud, IRISA, 56000, Vannes, France
Alexis Blandin, Jeanne Villaneau & Pierre-François Marteau
Université Bretagne Sud, LMBA, Vannes, France
Farida Saïd

Corresponding author

Correspondence to Alexis Blandin.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

Cite this paper

Blandin, A., Saïd, F., Villaneau, J., Marteau, P.F. (2022). DaFNeGE: Dataset of French Newsletters with Graph Representation and Embedding. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_2

DOI: https://doi.org/10.1007/978-3-031-16270-1_2
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1
eBook Packages: Computer Science, Computer Science (R0)
