Abstract
E-mail has become an essential component of business cooperation. Consequently, the large number of messages, whether written manually or generated automatically, can rapidly cause information overload for users. Many research projects have examined this issue, but surprisingly few have tackled the problem of the files attached to e-mails, which in many cases carry a substantial part of the semantics of the message. This paper addresses this specific topic and focuses on the clustering and visualization of attached files. Relying on the multinomial mixture model, we use the Classification EM (CEM) algorithm to cluster the set of files and MDSDCA to visualize the resulting classes of documents. Like the Multidimensional Scaling method, the MDSDCA algorithm, which is based on Difference of Convex functions programming, aims to optimize the stress criterion. As MDSDCA is iterative, we propose an initialization approach that avoids starting from random values. Experiments are conducted on simulated and textual data.
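To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each attached file is represented as a vector of term counts, uses additive smoothing (the `smooth` parameter, an assumption) to avoid zero probabilities, and takes the raw, unweighted form of the stress criterion (also an assumption, since the abstract does not give the exact definition). The function names `cem_multinomial` and `raw_stress` are hypothetical. The sketch only evaluates stress for a given layout; it does not implement the MDSDCA optimization or the proposed initialization.

```python
import math


def cem_multinomial(counts, k, init_labels, n_iter=50, smooth=1.0):
    """Classification EM (hard assignments) for a multinomial mixture.

    counts: list of term-count vectors, one per document.
    init_labels: starting cluster index for each document (illustrative input).
    smooth: additive smoothing constant, must be > 0 (an assumption of this sketch).
    """
    n, d = len(counts), len(counts[0])
    labels = list(init_labels)
    for _ in range(n_iter):
        # M-step: estimate proportions and term probabilities per cluster.
        sizes = [0] * k
        totals = [[smooth] * d for _ in range(k)]
        for x, z in zip(counts, labels):
            sizes[z] += 1
            for j, c in enumerate(x):
                totals[z][j] += c
        log_pi = [math.log((s + smooth) / (n + k * smooth)) for s in sizes]
        log_theta = []
        for row in totals:
            row_sum = sum(row)
            log_theta.append([math.log(v / row_sum) for v in row])
        # E-step and C-step: score every cluster, keep the most probable one.
        new_labels = []
        for x in counts:
            scores = [
                log_pi[g] + sum(c * log_theta[g][j] for j, c in enumerate(x))
                for g in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels
    return labels


def raw_stress(points, dissim):
    """Raw stress: sum over pairs i < j of (dissim[i][j] - distance)^2."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])  # Euclidean distance
            total += (dissim[i][j] - dist) ** 2
    return total
```

Because CEM converges to a local optimum that depends on `init_labels`, and an iterative stress minimizer likewise depends on its starting layout, a sketch like this also illustrates why the choice of initialization matters.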
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Allouti, F., Nadif, M., Hoai An, L.T., Otjacques, B. (2009). Mixture Model and MDSDCA for Textual Data. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2009. Lecture Notes in Computer Science, vol 5738. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04265-2_35
DOI: https://doi.org/10.1007/978-3-642-04265-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04264-5
Online ISBN: 978-3-642-04265-2
eBook Packages: Computer Science, Computer Science (R0)