
Difference-Similitude Matrix in Text Classification

  • Conference paper
Fuzzy Systems and Knowledge Discovery (FSKD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)


Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a method based on the Difference-Similitude Matrix (DSM) to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities and documents in different categories by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
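The pairwise description in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes binary item presence as the document representation, treats shared absence of an item as a similarity, and uses a hypothetical helper name (`build_dsm`) and toy data invented for illustration. For each pair of documents it records the items on which they agree when the categories match, and the items on which they differ otherwise.

```python
from itertools import combinations


def build_dsm(docs, labels, vocabulary):
    """Return {(i, j): (kind, items)} for every document pair i < j.

    Assumption: each document is a set of items, compared by presence/absence.
    """
    vocab = set(vocabulary)
    # Restrict every document to the items in the vocabulary.
    present = [set(d) & vocab for d in docs]
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        # Items present in exactly one of the two documents.
        differing = present[i] ^ present[j]
        if labels[i] == labels[j]:
            # Same category: keep the items on which the pair agrees.
            dsm[(i, j)] = ("similitude", vocab - differing)
        else:
            # Different categories: keep the items that tell them apart.
            dsm[(i, j)] = ("difference", differing)
    return dsm


# Toy collection (hypothetical data, for illustration only).
docs = [
    {"ball", "team", "score"},
    {"ball", "team", "coach"},
    {"stock", "market", "score"},
]
labels = ["sports", "sports", "finance"]
vocabulary = ["ball", "team", "score", "coach", "stock", "market"]

dsm = build_dsm(docs, labels, vocabulary)
for pair, (kind, items) in sorted(dsm.items()):
    print(pair, kind, sorted(items))
```

In this toy data, the difference entry for documents 0 and 2 omits "score", since both contain it. That is the sense in which such a matrix can point to items that do not separate categories and are candidates for removal during reduction.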




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, X., Wu, M., Xia, D., Yan, P. (2005). Difference-Similitude Matrix in Text Classification. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_3


  • DOI: https://doi.org/10.1007/11540007_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28331-7

  • Online ISBN: 978-3-540-31828-6

  • eBook Packages: Computer Science, Computer Science (R0)
