DOI: 10.1145/3558100.3563856
research-article

Graphical document representation for French newsletters analysis

Published: 18 November 2022

Abstract

Document analysis is essential in many industrial applications. However, engineering natural language resources to represent entire documents is still challenging. Besides, available resources in French are scarce and do not cover all possible tasks, especially in specific business applications. In this context, we present a French newsletter dataset and its use to predict the good or bad impact of newsletters on readers. We propose a new representation of newsletters in the form of graphs that consider the newsletters' layout. We evaluate the relevance of the proposed representation to predict a newsletter's performance in terms of open and click rates using graph analysis methods.
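
The abstract does not specify how the layout graph is built, so the following is only a minimal sketch of one plausible construction, not the authors' method. It assumes each newsletter has already been segmented into rectangular zones with known coordinates; the `Block` schema, the `layout_graph` function, and the `max_gap` threshold are all hypothetical names introduced for illustration. Zones become nodes, and an edge links two zones that are spatial neighbours.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A rectangular newsletter zone (hypothetical schema, not from the paper)."""
    name: str
    kind: str   # e.g. "title", "text", "image", "button"
    x: float    # left edge
    y: float    # top edge (y grows downward)
    w: float    # width
    h: float    # height


def overlap(a_start, a_len, b_start, b_len):
    """Length of the overlap between two 1-D intervals (0 if disjoint)."""
    return max(0.0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def layout_graph(blocks, max_gap=20.0):
    """Link blocks that are spatial neighbours (assumed rule, for illustration).

    Two blocks are connected when they share some horizontal extent and are
    separated vertically by at most max_gap, or share some vertical extent
    and are separated horizontally by at most max_gap.
    Returns an adjacency dict: block name -> sorted list of neighbour names.
    """
    adj = {b.name: set() for b in blocks}
    for a, b in combinations(blocks, 2):
        h_overlap = overlap(a.x, a.w, b.x, b.w)
        v_overlap = overlap(a.y, a.h, b.y, b.h)
        # Gap between the nearest edges; negative when the blocks overlap.
        v_gap = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
        h_gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
        stacked = h_overlap > 0 and 0 <= v_gap <= max_gap
        side_by_side = v_overlap > 0 and 0 <= h_gap <= max_gap
        if stacked or side_by_side:
            adj[a.name].add(b.name)
            adj[b.name].add(a.name)
    return {name: sorted(neigh) for name, neigh in adj.items()}


# Toy newsletter: a header, a hero image, two text columns, and a button.
blocks = [
    Block("header", "title", 0, 0, 600, 80),
    Block("hero", "image", 0, 90, 600, 200),
    Block("left", "text", 0, 300, 290, 150),
    Block("right", "text", 310, 300, 290, 150),
    Block("cta", "button", 200, 460, 200, 40),
]
graph = layout_graph(blocks)
for name, neighbours in graph.items():
    print(name, "->", neighbours)
```

A graph of this kind could then carry per-node features (zone type, text statistics) and be passed to a graph learning library, but which features and models are appropriate depends on details the abstract does not give.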


Published In

DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering
September 2022
118 pages
ISBN: 9781450395441
DOI: 10.1145/3558100
Sponsors

In-Cooperation

  • SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataset
  2. graph convolutional network
  3. graph embedding
  4. newsletter

Qualifiers

  • Research-article

Conference

DocEng '22: ACM Symposium on Document Engineering 2022
September 20 - 23, 2022
San Jose, California

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 60
  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025