Template Trees: Extracting Actionable Information from Machine Generated Emails

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11030)

Abstract

Many machine-generated emails carry important information that the recipient must act upon at a scheduled time. It is therefore a natural goal to automatically extract such actionable information from these emails and communicate it to users. These emails are generated in many different domains by services of many different types. However, because such emails carry personal information, it is difficult to obtain a large corpus of labeled data for supervised information extraction methods.

In this paper, we propose a novel method to automatically identify the part of an email that contains actionable information, called the core region of the email, with the aid of a domain dictionary. The domain dictionary is generated from public information about the domain. The core regions are stored as template trees: a template tree is a subtree embedded in the email's HTML DOM tree.
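The abstract does not spell out how the core region is located, so the following is only a hypothetical sketch, not the authors' algorithm. It assumes the core region can be approximated as the lowest common ancestor of every DOM node whose text contains a domain-dictionary term, and that the template tree is the tag skeleton of that subtree with all text removed. All names (`DomBuilder`, `core_region`, `template`, `DOMAIN_DICTIONARY`) and the sample email are invented for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag; they become leaf nodes.
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link", "col", "area", "base", "wbr", "source"}


class Node:
    """A minimal DOM node: tag name, parent, children and direct text."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unmatched end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def build_tree(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def path_from_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def core_region(root, dictionary):
    """Lowest common ancestor of all nodes whose text mentions a dictionary term."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if any(term.lower() in node.text.lower() for term in dictionary):
            hits.append(node)
        stack.extend(node.children)
    if not hits:
        return None
    lca = root
    for level in zip(*(path_from_root(h) for h in hits)):
        if any(n is not level[0] for n in level):
            break
        lca = level[0]
    return lca


def template(node):
    """Tag skeleton of a subtree with text stripped, e.g. 'tr(td,td)'."""
    if not node.children:
        return node.tag
    return node.tag + "(" + ",".join(template(c) for c in node.children) + ")"


EMAIL = """<html><body>
  <div class="banner"><img src="logo.png"><p>Thanks for flying with us!</p></div>
  <table>
    <tr><td>Flight</td><td>XY 123</td></tr>
    <tr><td>Departure</td><td>10:45</td></tr>
  </table>
  <p>Unsubscribe | Privacy policy</p>
</body></html>"""
DOMAIN_DICTIONARY = {"flight", "departure"}  # hypothetical airline-domain terms

core = core_region(build_tree(EMAIL), DOMAIN_DICTIONARY)
print(template(core))  # table(tr(td,td),tr(td,td))
```

Taking the lowest common ancestor keeps the banner and footer out of the extracted subtree while still covering every dictionary match; a real system would likely need more care, for example when dictionary terms also occur in boilerplate text.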

Our experiments on real data show that the structure of the core region, which contains all the information of interest, is very simple, and that the core region is 85%–98% smaller than the original email. Our experiments also show that template trees are highly repetitive across a diverse set of emails from a given service provider.
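The abstract does not say how size or repetition is measured. As an illustration only, the hypothetical sketch here assumes size is a count of DOM element nodes and that two template trees are repeats when their tag skeletons serialize identically. Trees are plain `(tag, children)` tuples, and the toy emails are invented, so the printed numbers are illustrative and are not the paper's results.

```python
from collections import Counter

# A DOM (sub)tree is modelled as a (tag, children) tuple; text is assumed stripped.


def size(tree):
    """Number of element nodes in the tree (assumed size measure)."""
    _tag, children = tree
    return 1 + sum(size(child) for child in children)


def signature(tree):
    """Canonical string of the tag skeleton; equal strings mean the same template."""
    tag, children = tree
    if not children:
        return tag
    return tag + "(" + ",".join(signature(child) for child in children) + ")"


def reduction(core, full):
    """Fraction of the email's nodes that lie outside the core region."""
    return 1 - size(core) / size(full)


# Toy data: invented emails from one hypothetical provider.
td = ("td", [])
row = ("tr", [td, td])
core_a = ("table", [row, row])        # 7 nodes
core_b = ("table", [row, row, row])   # 10 nodes
banner = ("div", [("img", []), ("p", [])])
footer = ("p", [])
full_a = ("html", [("body", [banner, core_a, footer])])  # 13 nodes

print(f"size reduction: {reduction(core_a, full_a):.0%}")  # 46% in this toy example

# How often does each template occur across the provider's emails?
counts = Counter(signature(t) for t in [core_a, core_a, core_b])
print(counts.most_common(1))
```

Counting serialized skeletons with `Counter` is a simple exact-match notion of repetition; a tree-similarity measure would be needed to also group templates that differ only slightly, such as `core_a` and `core_b` here.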

Author information

Corresponding author

Correspondence to Manoj K. Agarwal.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Agarwal, M.K., Singh, J. (2018). Template Trees: Extracting Actionable Information from Machine Generated Emails. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_1

  • DOI: https://doi.org/10.1007/978-3-319-98812-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98811-5

  • Online ISBN: 978-3-319-98812-2

  • eBook Packages: Computer Science, Computer Science (R0)
