A local semi-supervised Sammon algorithm for textual data visualization

Martín-Merino, Manuel; Blanco, Ángela

doi:10.1007/s10844-008-0056-5

Published: 26 May 2008

Volume 33, pages 23–40 (2009)

Journal of Intelligent Information Systems


Abstract

Sammon’s mapping is a powerful non-linear technique that allows us to visualize relationships among high-dimensional objects. It has been applied to a broad range of practical problems, in particular to the visualization of the semantic relations among terms in textual databases. However, the word maps generated by the Sammon mapping suffer from low discriminant power, owing to the well-known “curse of dimensionality” and to the unsupervised nature of the algorithm. Fortunately, textual databases frequently provide a manually created classification for a subset of the documents, which may help to overcome this problem. In this paper we first introduce a modification of the Sammon mapping (SSammon) that enhances the local topology, reducing its sensitivity to the “curse of dimensionality”. Next, we propose a semi-supervised version that takes advantage of the a priori categorization of a subset of documents to improve the discriminant power of the generated word maps. The new algorithm has been applied to the challenging problem of word map generation. The experimental results suggest that the new model significantly improves on well-known unsupervised alternatives.
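For reference, the Sammon mapping that SSammon modifies places points y_i in a low-dimensional map by minimizing Sammon's (1969) stress, where D_ij is the input-space dissimilarity between objects i and j (for example, two terms) and d_ij is the Euclidean distance between their map positions:

$$E = \frac{1}{c} \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}}, \qquad c = \sum_{i<j} D_{ij}, \qquad \frac{\partial E}{\partial y_i} = -\frac{2}{c} \sum_{j \neq i} \frac{D_{ij} - d_{ij}}{D_{ij}\, d_{ij}} \,(y_i - y_j)$$

The abstract does not state the local (SSammon) or semi-supervised modifications, so the sketch below implements only the classic unsupervised baseline. It is a minimal illustration under stated assumptions: it uses plain gradient descent with a backtracking step instead of Sammon's original pseudo-Newton update, and the function names (`pairwise_distances`, `sammon_stress`, `sammon`) and step-size schedule are hypothetical choices, not the authors' code.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def sammon_stress(D, Y, eps=1e-12):
    """Sammon's stress E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij."""
    d = pairwise_distances(Y)
    n = len(D)
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (D[i][j] - d[i][j]) ** 2 / max(D[i][j], eps)
    return total / c


def sammon(D, dim=2, iters=300, step=0.5, seed=0, eps=1e-12):
    """Classic (unsupervised) Sammon mapping by gradient descent.

    Assumption: a backtracking step size replaces Sammon's pseudo-Newton
    update. D is a symmetric matrix of input-space dissimilarities.
    Returns the map coordinates and their final stress.
    """
    n = len(D)
    rng = random.Random(seed)
    Y = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    stress = sammon_stress(D, Y, eps)
    for _ in range(iters):
        d = pairwise_distances(Y)
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                Dij = max(D[i][j], eps)
                dij = max(d[i][j], eps)  # guard against coincident map points
                coef = -2.0 / c * (Dij - dij) / (Dij * dij)
                for k in range(dim):
                    grad[i][k] += coef * (Y[i][k] - Y[j][k])
        # Backtracking: halve the step until the stress does not increase.
        while step > 1e-10:
            candidate = [[Y[i][k] - step * grad[i][k] for k in range(dim)]
                         for i in range(n)]
            new_stress = sammon_stress(D, candidate, eps)
            if new_stress <= stress:
                Y, stress = candidate, new_stress
                step *= 1.1  # cautiously grow the step after an accepted move
                break
            step *= 0.5
    return Y, stress


if __name__ == "__main__":
    # Toy data: points on a 3-D helix, projected to a 2-D map.
    pts = [[math.cos(0.3 * i), math.sin(0.3 * i), 0.1 * i] for i in range(15)]
    Y, final_stress = sammon(pairwise_distances(pts))
    print("final stress:", round(final_stress, 4))
```

Dividing each squared error by D_ij weights small input-space distances more heavily than large ones, which is why Sammon's mapping tends to preserve local neighbourhoods better than unweighted metric MDS. In a word-map setting, D would hold term dissimilarities computed from the document collection rather than coordinates.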

Notes

Available from http://svmlight.joachims.org.

References

Aggarwal, C. C. (2001). Re-designing distance functions and distance-based applications for high dimensional applications. In Proc. of SIGMOD-PODS (Vol. 1, pp. 13–18).
Aggarwal, C. C., & Yu, P. S. (2002). Redefining clustering for high-dimensional applications. IEEE Transactions on Knowledge and Data Engineering, 14(2), 210–225 (March/April).
Aggarwal, C. C., Gates, S. C., & Yu, P. S. (2004). On using partial supervision for text categorization. IEEE Transactions on Knowledge and Data Engineering, 16(2), 245–255.
Backer, S., Naud, A., & Scheunders, P. (1998). Non-linear dimensionality reduction techniques for unsupervised feature extraction. Pattern Recognition Letters, 19, 711–720.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Wokingham, UK: Addison Wesley.
Beyer, K., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “Nearest Neighbor” meaningful? In Proc. of the international conference on database theory (ICDT). Lecture notes in computer science (Vol. 1540, pp. 217–235). Jerusalem, Israel: Springer.
Bezdek, J. C., & Pal, N. R. (1995). An index of topological preservation for feature extraction. Pattern Recognition, 28(3), 381–391.
Buja, A., Logan, B., Reeds, F., & Shepp, R. (1994). Inequalities and positive-definite functions arising from a problem in multidimensional scaling. Annals of Statistics, 22, 406–438.
Chapelle, O., Weston, J., & Schölkopf, B. (2003). Cluster kernels for semi-supervised learning. Annual Conference on Neural Information Processing Systems (NIPS), 15.
Chung, Y. M., & Lee, J. Y. (2001). A corpus-based approach to comparative evaluation of statistical term association measures. Journal of the American Society for Information Science and Technology, 52(4), 283–296.
Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling (2nd ed.). USA: Chapman & Hall/CRC.
Demartines, P., & Hérault, J. (1996). Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 20, 1–6.
Joachims, T. (2002). Learning to classify text using support vector machines. Methods, theory and algorithms. Boston: Kluwer.
Kaplan, W. (1999). Maxima and minima with applications. New York: Wiley.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data. An introduction to cluster analysis. New York: Wiley.
Kohonen, T. (1995). Self-organizing maps (2nd ed.). Berlin: Springer Verlag.
Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., et al. (2000). Organization of a massive document collection. IEEE Transactions on Neural Networks, 11(3), 574–585.
Kothari, R., & Jain, V. (2003). Learning from labeled and unlabeled data using a minimal number of queries. IEEE Transactions on Neural Networks, 14(6), 1496–1505 (November).
Kraaijveld, M., Mao, J., & Jain, A. (1995). A nonlinear projection method based on Kohonen’s topology preserving maps. IEEE Transactions on Neural Networks, 6(3), 548–559 (May).
Lee, J. A., Lendasse, A., & Verleysen, M. (2004). Nonlinear projection with curvilinear distances: Isomap versus curvilinear distance analysis. Neurocomputing, 37, 49–76.
Mao, J., & Jain, A. K. (1995). Artificial neural networks for feature extraction and multivariate data projection. IEEE Transactions on Neural Networks, 6(2), 296–317 (March).
Martín-Merino, M., & Muñoz, A. (2001). Self organizing map and Sammon mapping for asymmetric proximities. LNCS (Vol. 2130, pp. 429–435). Springer.
Martín-Merino, M., & Muñoz, A. (2004a). A new MDS algorithm for textual data analysis. Lecture notes in computer science LNCS-3316 (pp. 860–867). Springer.
Martín-Merino, M., & Muñoz, A. (2004b). A new Sammon algorithm for sparse data visualization. In International Conference on Pattern Recognition (Vol. 1, pp. 477–481) Cambridge, August.
Muñoz, A. (1997). Compound key word generation from document databases using a hierarchical clustering ART model. Journal of Intelligent Data Analysis, 1(1), 25–48.
Pedrycz, W., & Vukovich, G. (2004). Fuzzy clustering with supervision. Pattern Recognition, 37, 1339–1349.
Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18, 401–409 (May).
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge: MIT Press.
Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on web-page clustering. In Proceedings of the 17th national conference on artificial intelligence: Workshop of artificial intelligence for Web search (pp. 58–64) Austin, USA (July).
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Proc. of the 14th international conference on machine learning (pp. 412–420). Nashville, Tennessee, USA (July).

Acknowledgements

Financial support from Junta de Castilla y León grant PON05B06 is gratefully acknowledged.

Author information

Authors and Affiliations

Universidad Pontificia de Salamanca, C/Compañía 5, 37002, Salamanca, Spain
Manuel Martín-Merino & Ángela Blanco

Corresponding author

Correspondence to Manuel Martín-Merino.

About this article

Cite this article

Martín-Merino, M., Blanco, Á. A local semi-supervised Sammon algorithm for textual data visualization. J Intell Inf Syst 33, 23–40 (2009). https://doi.org/10.1007/s10844-008-0056-5

Issue Date: August 2009
Similar content being viewed by others

Citation-based clustering of publications using CitNetExplorer and VOSviewer

Clustering graph data: the roadmap to spectral techniques

A comprehensive and analytical review of text clustering techniques