Template Trees: Extracting Actionable Information from Machine Generated Emails

Agarwal, Manoj K.; Singh, Jitendra

doi:10.1007/978-3-319-98812-2_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11030)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Many machine-generated emails carry important information that the recipient must act upon at a scheduled time. It is therefore natural to extract such actionable information automatically and communicate it to users. These emails are generated in many different domains and cover different types of services. However, because such emails carry personal information, it is difficult to obtain a large corpus of labeled data for supervised information extraction methods.

In this paper, we propose a novel method to automatically identify the part of an email that contains actionable information, called the core region of the email, with the aid of a domain dictionary. The domain dictionary is generated from public information about the domain. The core regions are stored as template trees; a template tree is a sub-tree embedded in the email’s HTML DOM tree.
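The idea of a core region as a sub-tree of the DOM can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes, for illustration only, that the core region is the smallest DOM sub-tree whose text mentions every dictionary term found anywhere in the email. The names Node, TreeBuilder, parse, core_region, EMAIL and DICTIONARY are hypothetical, and the sample email and dictionary are made up.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a small, non-exhaustive set).
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def full_text(self):
        # Text of this element plus the text of all its descendants.
        return self.text + " " + " ".join(c.full_text() for c in self.children)

    def size(self):
        # Number of elements in the sub-tree rooted at this node.
        return 1 + sum(c.size() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closes.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += " " + data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def core_region(root, dictionary):
    """Assumed heuristic: descend while exactly one child still covers
    every dictionary term that occurs in the whole email."""

    def terms_in(node):
        text = node.full_text().lower()
        return {term for term in dictionary if term in text}

    target = terms_in(root)
    node = root
    while True:
        covering = [c for c in node.children if terms_in(c) == target]
        if len(covering) != 1:
            return node
        node = covering[0]


# Made-up email and domain dictionary, for illustration only.
EMAIL = """
<html><body>
  <div><img src="logo.png"><p>Great deals this week!</p></div>
  <table>
    <tr><td>Flight</td><td>AB123</td></tr>
    <tr><td>Departure</td><td>10:45</td></tr>
  </table>
  <div><p>Unsubscribe</p><p>Privacy policy</p></div>
</body></html>
"""
DICTIONARY = {"flight", "departure"}

root = parse(EMAIL)
core = core_region(root, DICTIONARY)
reduction = 1 - core.size() / root.size()
print(core.tag, f"{reduction:.0%} fewer elements than the full email")
```

In this toy email the selected sub-tree is the table holding the flight details, and the promotional header and footer are left out. Counting elements is only one possible size measure; the abstract does not say how the reported reduction is measured.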

Our experiments over real data show that the structure of the core region, which contains all the information of interest, is very simple and that the core region is 85%–98% smaller than the original email. Our experiments also show that template trees are highly repetitive across a diverse set of emails from a given service provider.



Author information

Authors and Affiliations

AI & Research, Microsoft – India, Hyderabad, 500032, India
Manoj K. Agarwal & Jitendra Singh


Corresponding author

Correspondence to Manoj K. Agarwal.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Regensburg, Regensburg, Germany
Günther Pernul
Johannes Kepler University, Linz, Austria
Roland R. Wagner


About this paper

Cite this paper

Agarwal, M.K., Singh, J. (2018). Template Trees: Extracting Actionable Information from Machine Generated Emails. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_1


DOI: https://doi.org/10.1007/978-3-319-98812-2_1
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98811-5
Online ISBN: 978-3-319-98812-2
eBook Packages: Computer Science (R0)
