Abstract
This paper presents a hybrid framework that combines the self-organising map (SOM) with fuzzy theory for textual classification. Clustering with self-organising maps is applied to produce multiple targets. We propose that an amalgamation of SOM and association rule theory may hold the key to a more generic solution, one less reliant on initial supervision and redundant user interaction. The results of clustering stem words from text documents can be used to derive association rules that designate the applicability of documents to the user. A four-stage process is then detailed, giving a generic example of how associations may be derived graphically from a repository of text documents, or even from a set of synopses of many such repositories. This research demonstrates the feasibility of applying such processes to data mining and knowledge discovery.
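To make the idea of clustering stem words and then deriving association rules more concrete, the following is a minimal sketch in Python. It is not the chapter's four-stage process, which the abstract does not spell out, and it leaves the fuzzy component aside. The toy documents, the one-dimensional SOM, the function names (train_som, best_matching_unit, cluster_rules) and the support and confidence thresholds are all assumptions made for illustration.

```python
import math
import random
from itertools import permutations


def best_matching_unit(nodes, v):
    """Index of the SOM node whose weights are closest to v (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], v)))


def train_som(vectors, n_nodes=3, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM (illustrative only) and return its node weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                                # learning rate decays linearly
        radius = max(1.0, n_nodes / 2) * frac + 1e-9   # neighbourhood shrinks
        for v in rng.sample(vectors, len(vectors)):    # shuffled pass over inputs
            bmu = best_matching_unit(nodes, v)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood around the best-matching unit
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (v[d] - w[d])
    return nodes


def cluster_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Pairwise rules a -> b between cluster ids, filtered by support and confidence."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        support = n_ab / n
        if n_a and support >= min_support and n_ab / n_a >= min_confidence:
            rules.append((a, b, support, n_ab / n_a))
    return rules


# Hypothetical stemmed documents; a real corpus would be stemmed beforehand.
docs = [
    ["neural", "map", "cluster"],
    ["neural", "map", "train"],
    ["fuzzy", "rule", "member"],
    ["fuzzy", "rule", "cluster"],
    ["map", "cluster", "rule"],
]
vocab = sorted({w for d in docs for w in d})
# Each stem word is described by the documents it occurs in.
word_vectors = [[1.0 if w in d else 0.0 for d in docs] for w in vocab]

nodes = train_som(word_vectors)
word_cluster = {w: best_matching_unit(nodes, v)
                for w, v in zip(vocab, word_vectors)}
# Each document becomes the set of word clusters it touches.
transactions = [{word_cluster[w] for w in d} for d in docs]
rules = cluster_rules(transactions)

for a, b, s, c in rules:
    print(f"cluster {a} -> cluster {b}: support={s:.2f}, confidence={c:.2f}")
```

In this sketch each document is reduced to the set of SOM clusters its stem words fall into, so the rules relate clusters rather than individual words. Which rules appear depends on the random initialisation and on the thresholds chosen.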
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, YP.P. (2003). A Hybrid Framework Using SOM and Fuzzy Theory for Textual Classification in Data Mining. In: Lawry, J., Shanahan, J., Ralescu, A.L. (eds) Modelling with Words. Lecture Notes in Computer Science, vol 2873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39906-3_8
DOI: https://doi.org/10.1007/978-3-540-39906-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20487-9
Online ISBN: 978-3-540-39906-3
eBook Packages: Springer Book Archive