Difference-Similitude Matrix in Text Classification

Huang, Xiaochun; Wu, Ming; Xia, Delin; Yan, Puliu

doi:10.1007/11540007_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a method based on the Difference-Similitude Matrix (DSM) to address this problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities and documents in different categories by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
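The abstract describes the DSM idea only at a high level, so the following Python sketch illustrates it on a toy collection under stated assumptions. Each document is reduced to a set of terms (binary term presence); a pair of documents from different categories is described by the terms on which they differ, and a pair from the same category by the terms on which they agree. The reduction step is replaced by a simple greedy hitting-set heuristic that keeps choosing terms until every cross-category pair is separated. The names `ds_matrix` and `greedy_reduct` are hypothetical, and the heuristic is an illustrative stand-in, not the DSM reduction algorithm from the paper.

```python
from itertools import combinations


def ds_matrix(docs, labels):
    """Build difference and similitude entries for every pair of documents.

    Assumption: each document is a set of terms (binary term presence).
    Cross-category pairs get a difference entry (terms on which they disagree);
    same-category pairs get a similitude entry (terms on which they agree).
    """
    vocab = set().union(*docs)
    diff, sim = {}, {}
    for i, j in combinations(range(len(docs)), 2):
        disagree = docs[i] ^ docs[j]  # terms present in exactly one document
        if labels[i] != labels[j]:
            diff[(i, j)] = disagree
        else:
            # agreement = both documents contain the term or both lack it
            sim[(i, j)] = vocab - disagree
    return diff, sim


def greedy_reduct(diff, sim):
    """Illustrative greedy heuristic, NOT the paper's DSM reduction algorithm.

    Picks terms until every cross-category pair differs on at least one
    chosen term; ties are broken by how often a term appears in similitude
    entries, then alphabetically.
    """
    # A pair with an empty difference entry (identical term sets, different
    # labels) is inconsistent and can never be separated, so it is skipped.
    uncovered = [terms for terms in diff.values() if terms]
    sim_count = {}
    for terms in sim.values():
        for t in terms:
            sim_count[t] = sim_count.get(t, 0) + 1
    chosen = []
    while uncovered:
        # How many still-unseparated pairs each candidate term would separate.
        hits = {}
        for terms in uncovered:
            for t in terms:
                hits[t] = hits.get(t, 0) + 1
        best = max(sorted(hits), key=lambda t: (hits[t], sim_count.get(t, 0)))
        chosen.append(best)
        uncovered = [terms for terms in uncovered if best not in terms]
    return chosen


# Toy example: four documents in two categories.
docs = [{"ball", "goal", "team"}, {"goal", "match", "team"},
        {"vote", "party", "team"}, {"vote", "election"}]
labels = ["sport", "sport", "politics", "politics"]
diff, sim = ds_matrix(docs, labels)
print(greedy_reduct(diff, sim))  # → ['goal']
```

Keeping only the chosen terms still separates every cross-category pair in the toy collection, which is the sense in which a reduced term set lowers the dimensionality of the document space; the paper's own reduction and rule-generation steps may differ from this heuristic.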

Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, Wuhan, 430079, Hubei, China
Xiaochun Huang, Ming Wu, Delin Xia & Puliu Yan

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin

About this paper

Cite this paper

Huang, X., Wu, M., Xia, D., Yan, P. (2005). Difference-Similitude Matrix in Text Classification. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_3

DOI: https://doi.org/10.1007/11540007_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)
