Semi-supervised Metrics for Textual Data Visualization

  • Conference paper
Artificial Neural Networks – ICANN 2007 (ICANN 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4669)

Abstract

Multidimensional Scaling (MDS) algorithms are useful tools for discovering relationships among high-dimensional objects. They have been applied to a wide range of practical problems, particularly to the visualization of semantic relations among documents or terms in textual databases.

The MDS algorithms proposed in the literature often suffer from low discriminant power due to their unsupervised nature and to the ‘curse of dimensionality’. Fortunately, textual databases frequently provide a manually created classification for a subset of documents that may help to overcome this problem.

In this paper we propose a semi-supervised version of the Torgerson MDS algorithm that takes advantage of this document classification to improve the discriminant power of the generated word maps. The algorithm has been applied to the visualization of term relationships. The experimental results show that the proposed model outperforms well-known unsupervised alternatives.
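
For orientation, the unsupervised Torgerson (classical) MDS that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard textbook procedure (double-centre the squared dissimilarities, then take the leading eigenvectors), not the semi-supervised variant proposed in the paper, whose details are not given in the abstract. The function name `torgerson_mds`, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np


def torgerson_mds(dissimilarities, n_components=2):
    """Classical (Torgerson) MDS on a symmetric dissimilarity matrix.

    Hypothetical helper for illustration; returns an (n, n_components)
    array of coordinates.
    """
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Double-centred inner-product matrix B = -1/2 * J D^2 J
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the n_components largest eigenvalues and their eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy data (assumed): the four corners of a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = torgerson_mds(dist, n_components=2)
# Pairwise distances in the embedding, for comparison with the input
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because the toy dissimilarities are exactly Euclidean in two dimensions, the embedding reproduces them up to rotation and reflection. For term or document data, `dist` would instead hold a textual dissimilarity of the user's choosing, and a semi-supervised scheme would presumably alter that matrix using the available class labels before this step.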

Editor information

Joaquim Marques de Sá, Luís A. Alexandre, Włodzisław Duch, Danilo Mandic

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blanco, Á., Martín-Merino, M. (2007). Semi-supervised Metrics for Textual Data Visualization. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds) Artificial Neural Networks – ICANN 2007. ICANN 2007. Lecture Notes in Computer Science, vol 4669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74695-9_45

  • DOI: https://doi.org/10.1007/978-3-540-74695-9_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74693-5

  • Online ISBN: 978-3-540-74695-9

  • eBook Packages: Computer Science, Computer Science (R0)
