Abstract
We address a crucial problem in text categorization: local feature selection. Intuitionistic fuzzy sets appear to be an effective and efficient tool for assessing each term (from the feature set of each category) in terms of both its indicative and non-indicative ability. This is especially important in high-dimensional problems, where text filtering can be improved by confidently rejecting non-relevant documents. Moreover, we indicate that intuitionistic fuzzy sets are a good tool for classifying imbalanced and overlapping classes, a case commonly encountered in text categorization.
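As a rough illustration of what assessing a term by both its indicative and non-indicative ability could look like, the sketch below models one term as an element of an intuitionistic fuzzy set in Atanassov's sense: a membership degree, a non-membership degree, and the hesitation margin left over. The class name `TermAssessment` and the example values are assumptions made for illustration only; the abstract does not say how the degrees are estimated, and this is not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermAssessment:
    """Hypothetical container: one term assessed for one category."""

    mu: float  # membership: how strongly the term indicates the category
    nu: float  # non-membership: how strongly the term indicates against it

    def __post_init__(self):
        # Intuitionistic fuzzy set constraints: both degrees lie in [0, 1]
        # and their sum must not exceed 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """Hesitation margin pi = 1 - mu - nu (lack of knowledge)."""
        return max(0.0, 1.0 - self.mu - self.nu)


# Illustrative values only, not taken from the paper.
term = TermAssessment(mu=0.6, nu=0.3)
print(term.hesitation)
```

Keeping the two degrees separate, rather than storing a single score, is what would let a classifier treat strong non-membership as evidence for rejecting a document instead of merely as weak evidence for accepting it.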
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szmidt, E., Kacprzyk, J. (2008). Using Intuitionistic Fuzzy Sets in Text Categorization. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. Lecture Notes in Computer Science, vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_35
DOI: https://doi.org/10.1007/978-3-540-69731-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69572-1
Online ISBN: 978-3-540-69731-2
eBook Packages: Computer Science (R0)