Using Contextual Information to Improve Search in Email Archives

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

In this paper we address the task of finding topically relevant email messages in public discussion lists. We make two important observations. First, email messages are not isolated, but are part of a larger online environment. This context, which exists on different levels, can be incorporated into the retrieval model. We explore the use of the thread, mailing list, and community content levels by expanding the original query with terms from these sources. We find that query models based on contextual information improve retrieval effectiveness. Second, email is a relatively informal genre, and therefore offers scope for incorporating techniques previously shown to be useful in searching user-generated content. Indeed, our experiments show that using query-independent features (email length, thread size, and text quality), implemented as priors, results in further improvements.
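
The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a generic language-modeling setup in which the original query model is interpolated with a model built from context terms (for example, the text of a thread), and a query-independent document prior is added to the log score. The function names, the interpolation weight `lam`, the cutoff `k`, the smoothing weight `mu`, and the way the prior is supplied are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def term_dist(tokens):
    """Maximum-likelihood term distribution over a list of tokens."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def expand_query(query_tokens, context_tokens, lam=0.5, k=5):
    """Interpolate the original query model with the top-k context terms.

    lam is the weight kept on the original query (assumed value).
    """
    original = term_dist(query_tokens)
    context = term_dist(context_tokens)
    if not context:
        return original
    # Keep the k most probable context terms; ties are broken alphabetically.
    top = sorted(context.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    norm = sum(p for _, p in top)
    expansion = {t: p / norm for t, p in top}
    terms = set(original) | set(expansion)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * expansion.get(t, 0.0)
        for t in terms
    }


def score(query_model, doc_tokens, collection, prior=1.0, mu=0.1):
    """log P(d) + sum over query terms of P(t|query) * log P(t|d).

    P(t|d) is smoothed against the collection model (Jelinek-Mercer style).
    prior stands in for a query-independent estimate such as one based on
    email length, thread size, or text quality (hypothetical input here).
    """
    doc = term_dist(doc_tokens)
    total = math.log(prior)
    for term, weight in query_model.items():
        p = (1 - mu) * doc.get(term, 0.0) + mu * collection.get(term, 1e-9)
        total += weight * math.log(p)
    return total
```

As a usage note, `expand_query(["css", "layout"], thread_tokens)` returns a weighted term distribution that can be passed to `score` together with a message's tokens and a collection model from `term_dist`. With the other inputs held fixed, a message with a larger prior receives a higher score, which is the sense in which query-independent features act as priors.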

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weerkamp, W., Balog, K., de Rijke, M. (2009). Using Contextual Information to Improve Search in Email Archives. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_36

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
