Graphical document representation for French newsletters analysis

Authors: Alexis Blandin, Jeanne Villaneau, Pierre-François Marteau

DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering

Article No.: 3, Pages 1 - 8

https://doi.org/10.1145/3558100.3563856

Published: 18 November 2022

Abstract

Document analysis is essential in many industrial applications. However, engineering natural language resources that represent entire documents remains challenging. Moreover, the resources available in French are scarce and do not cover all possible tasks, especially in specific business applications. In this context, we present a French newsletter dataset and use it to predict whether a newsletter has a good or bad impact on readers. We propose a new representation of newsletters as graphs that take the newsletters' layout into account. We evaluate how relevant this representation is for predicting a newsletter's performance, measured by open and click rates, using graph analysis methods.
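
The abstract does not specify how nodes and edges are defined, so the following is only a minimal sketch of what a layout-aware graph of a newsletter could look like, not the authors' construction. It assumes each newsletter has already been segmented into rectangular blocks, and it links two blocks when they are stacked or side by side within a small gap. The `Block` fields, the `adjacent` rule, the `gap` threshold, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A hypothetical layout block: one text or image zone of a newsletter."""
    name: str
    kind: str      # illustrative labels such as "text", "image", "button"
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def overlaps(a_start, a_len, b_start, b_len):
    """True when two 1-D intervals share a segment of positive length."""
    return min(a_start + a_len, b_start + b_len) > max(a_start, b_start)


def adjacent(a, b, gap=10.0):
    """Assumed rule: link blocks that are stacked or side by side,
    separated by at most `gap` layout units."""
    stacked = overlaps(a.x, a.width, b.x, b.width) and (
        0 <= b.y - (a.y + a.height) <= gap
        or 0 <= a.y - (b.y + b.height) <= gap
    )
    side_by_side = overlaps(a.y, a.height, b.y, b.height) and (
        0 <= b.x - (a.x + a.width) <= gap
        or 0 <= a.x - (b.x + b.width) <= gap
    )
    return stacked or side_by_side


def build_layout_graph(blocks, gap=10.0):
    """Return an undirected adjacency mapping {block name: neighbour names}."""
    graph = {block.name: set() for block in blocks}
    for a, b in combinations(blocks, 2):
        if adjacent(a, b, gap):
            graph[a.name].add(b.name)
            graph[b.name].add(a.name)
    return graph


# Made-up layout of a five-block newsletter, for illustration only.
blocks = [
    Block("header", "image", x=0, y=0, width=600, height=100),
    Block("intro", "text", x=0, y=105, width=600, height=80),
    Block("offer_left", "text", x=0, y=190, width=295, height=120),
    Block("offer_right", "image", x=305, y=190, width=295, height=120),
    Block("cta", "button", x=200, y=315, width=200, height=40),
]
graph = build_layout_graph(blocks)
for name, neighbours in graph.items():
    print(name, "->", sorted(neighbours))
```

In a setting like the one the abstract describes, each node would presumably also carry textual or visual features, and the whole graph would be labelled with the newsletter's open or click rate before being passed to a graph analysis method.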

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
      2. Language resources
  2. Machine learning
    1. Machine learning algorithms
2. Information systems
  1. Information systems applications

Published In

DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering

September 2022

118 pages

ISBN: 9781450395441

DOI: 10.1145/3558100

General Chairs: Curtis Wigington (Adobe Systems Incorporated), Matthew Hardy (Adobe Systems Incorporated)

Program Chairs: Steven R. Bagley (University of Nottingham, United Kingdom), Steven Simske (Colorado State University, Fort Collins, CO)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

In-Cooperation

SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

DocEng '22

Sponsor: SIGWEB

DocEng '22: ACM Symposium on Document Engineering 2022

September 20-23, 2022

San Jose, California

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
