Managing email overload with an automatic nonparametric clustering system

Xiang, Yang

doi:10.1007/s11227-008-0216-y

Published: 06 July 2008

Volume 48, pages 227–242 (2009)

The Journal of Supercomputing

Abstract

Email overload is the growing difficulty people face in processing the large number of emails they receive each day. The problem is becoming increasingly serious and has already affected the normal use of email as a knowledge management tool. Categorizing emails into meaningful groups is recognized as greatly reducing the cognitive load of processing them, and is therefore an effective way to manage email overload. However, most current approaches still require significant human input to categorize emails. In this paper, we develop an automatic email clustering system underpinned by a new nonparametric text clustering algorithm. The system requires no predefined input parameters and automatically generates meaningful email clusters. The evaluation shows that the new algorithm outperforms existing text clustering algorithms in both computational time and clustering quality, as measured by several metrics. The experimental results also closely match human-labeled clusterings.

Author information

Authors and Affiliations

School of Management and Information Systems, Central Queensland University, North Rockhampton, Queensland, 4702, Australia
Yang Xiang

Corresponding author

Correspondence to Yang Xiang.

About this article

Cite this article

Xiang, Y. Managing email overload with an automatic nonparametric clustering system. J Supercomput 48, 227–242 (2009). https://doi.org/10.1007/s11227-008-0216-y

Received: 26 November 2007
Accepted: 16 May 2008
Published: 06 July 2008
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11227-008-0216-y
