Abstract
We discuss the use of elements of Zadeh’s computing with words and perceptions paradigm (cf. Zadeh and Kacprzyk [37, 38]) for formulating and solving the problem of automatic text document categorization. This problem is steadily gaining importance with the rapid proliferation of textual information available on the Internet. The main issues addressed are document representation and classification. The use of fuzzy logic for both has already been studied in considerable depth, although for the latter, i.e. classification, mainly in a more general context. Our approach is based mainly on usuality qualification in the computing with words and perceptions paradigm, which is handled technically by Zadeh’s classic calculus of linguistically quantified propositions [36]. Moreover, we employ results related to fuzzy (linguistic) queries in information retrieval, in particular various interpretations of the weights of query terms. The methods developed are illustrated on a well-known text corpus.
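As a rough illustration of the calculus of linguistically quantified propositions [36] mentioned above, the following minimal sketch evaluates the truth of a statement of the form "Q y's are F" as the quantifier's membership degree of the mean membership in F, and of "Q B y's are F" with importance degrees B. The function names, the piecewise-linear shape chosen for "most", and the sample numbers are assumptions made for this sketch; they are not taken from the chapter and do not reproduce its categorization method.

```python
from typing import Callable, Sequence


def most(r: float) -> float:
    """Illustrative membership function for the quantifier 'most'.

    The breakpoints 0.3 and 0.8 are an assumption chosen for this sketch.
    """
    if r >= 0.8:
        return 1.0
    if r <= 0.3:
        return 0.0
    return 2.0 * r - 0.6  # linear between (0.3, 0) and (0.8, 1)


def truth_of_quantified(memberships: Sequence[float],
                        quantifier: Callable[[float], float]) -> float:
    """Truth of 'Q y's are F': quantifier applied to the mean membership in F."""
    if not memberships:
        raise ValueError("at least one membership degree is required")
    return quantifier(sum(memberships) / len(memberships))


def truth_with_importance(memberships: Sequence[float],
                          importance: Sequence[float],
                          quantifier: Callable[[float], float]) -> float:
    """Truth of 'Q B y's are F', using min as the conjunction of B and F."""
    if len(memberships) != len(importance):
        raise ValueError("memberships and importance must have equal length")
    total_importance = sum(importance)
    if total_importance == 0:
        raise ValueError("importance degrees must not all be zero")
    matched = sum(min(b, f) for b, f in zip(importance, memberships))
    return quantifier(matched / total_importance)


# Hypothetical degrees to which four terms of a category match a document.
term_match = [1.0, 0.5, 0.5, 0.0]
# Hypothetical importance of each term for the category.
term_importance = [1.0, 1.0, 0.0, 0.0]

plain = truth_of_quantified(term_match, most)
weighted = truth_with_importance(term_match, term_importance, most)
```

Under these assumed numbers, restricting attention to the important terms raises the truth degree, which is the kind of effect that different interpretations of term weights are meant to capture.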
References
R. Baeza-Yates and B. Ribeiro-Neto, editors. Modern Information Retrieval. Addison-Wesley, Reading, Massachusetts, 1999.
R. K. Belew and C. J. van Rijsbergen. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press, New York, NY, USA, 2000.
A. Bookstein. Fuzzy Requests: An Approach to Weighted Boolean Searches. Journal of the American Society for Information Science, 31:240–247, 1980.
G. Bordogna, P. Bosc, and G. Pasi. Extended Boolean Information Retrieval in Terms of Fuzzy Inclusion. In O. Pons, M. A. Vila, and J. Kacprzyk, editors, Knowledge Management in Fuzzy Databases, pages 234–246. Physica, Heidelberg, New York, 2000.
G. Bordogna, P. Carrara, and G. Pasi. Fuzzy Approaches to Extend Boolean Information Retrieval. In P. Bosc and J. Kacprzyk, editors, Fuzziness in Database Management Systems, pages 231–274. Physica, Heidelberg, 1995.
G. Bordogna and G. Pasi. Application of Fuzzy Sets Theory to Extend Boolean Information Retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval, pages 21–47. Physica, Heidelberg, New York, 2000.
C. Carlsson and R. Fuller. A New Look at Linguistic Importance Weighted Aggregation. In Proceedings of the Fourteenth European Meeting on Cybernetics and Systems Research, pages 169–174, Vienna, 1998. Austrian Society for Cybernetic Studies.
M. Delgado, J. L. Verdegay, and M. A. Vila. On Aggregation Operations of Linguistic Labels. International Journal of Intelligent Systems, 8:351–370, 1993.
D. Dubois, H. Fargier, and H. Prade. Beyond Min Aggregation in Multicriteria Decision: (Ordered) Weighted Min, Discri-min, Leximin. In R. R. Yager and J. Kacprzyk, editors, The Ordered Weighted Averaging Operators. Theory and Applications, pages 181–192. Kluwer Academic Publishers, Boston, Dordrecht, London, 1997.
D. Dubois and H. Prade. Using Fuzzy Sets in Flexible Querying: Why and How? In T. Andreasen, H. Christiansen, and H. L. Larsen, editors, Flexible Query Answering Systems, pages 45–60. Kluwer Academic Publishers, Boston, Dordrecht, 1997.
E. Herrera-Viedma. An Information Retrieval System with Ordinal Linguistic Weighted Queries Based on Two Weighting Elements. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9:77–88, 2001.
E. Herrera-Viedma. Modeling the Retrieval Process of an Information Retrieval System Using an Ordinal Fuzzy Linguistic Approach. Journal of the American Society for Information Science and Technology (JASIST), 52(6):460–475, 2001.
T. Joachims. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. In Proceedings of ICML-97, 14th International Conference on Machine Learning, pages 143–151, Nashville, US, 1997. Morgan Kaufmann.
J. Kacprzyk and S. Zadrożny. Computing with Words in Intelligent Database Querying: Standalone and Internet-Based Applications. Information Sciences, 134:71–109, 2001.
J. Kacprzyk, S. Zadrożny, and A. Ziółkowski. FQUERY III+: a “human-consistent” database querying system based on fuzzy logic with linguistic quantifiers. Information Systems, 14:443–453, 1989.
J. Kacprzyk and A. Ziółkowski. Database Queries with Fuzzy Linguistic Quantifiers. IEEE Transactions on Systems, Man and Cybernetics, 16:474–479, 1986.
R. R. Korfhage. Information Storage and Retrieval. John Wiley and Sons, New York, 1997.
D. H. Kraft, G. Bordogna, and G. Pasi. An Extended Fuzzy Linguistic Approach to Generalize Boolean Information Retrieval. Journal of Information Sciences, 2(3):119–134, 1994.
D. H. Kraft, G. Bordogna, and G. Pasi. Fuzzy Set Techniques in Information Retrieval. In J. C. Bezdek, D. Dubois, and H. Prade, editors, Fuzzy Sets in Approximate Reasoning and Information Systems (The Handbook of Fuzzy Sets Vol. 3), pages 469–510. Kluwer Academic Publishers, Norwell, 1999.
D. D. Lewis. Reuters-21578, Dist. 1.0. online. http://www.research.att.com/~lewis.
M. F. Porter. An Algorithm for Suffix Stripping. Program, 14(3):130–137, 1980.
T. Radecki. Fuzzy Set Theoretical Approach to Document Retrieval. Information Processing and Management, 15:247–260, 1979.
J. Rocchio. Relevance Feedback in Information Retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall Inc., 1971.
G. Salton, E. A. Fox, and H. Wu. Extended Boolean Information Retrieval. Communications of the ACM, 26(11):1022–1036, 1983.
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw Hill, New York, 1983.
F. Sebastiani. A Tutorial on Automated Text Categorisation. In Proceedings of ASAI-99, 1st Argentinian Symposium on Artificial Intelligence, pages 7–35, Buenos Aires, 1999.
Stop Words list. http://www.indiana.edu/cgi-bin-local/doIsearch.pl?Stopwords.
C. J. van Rijsbergen. Information Retrieval. Butterworths, London, Boston, 1979.
R. R. Yager. A Note on Weighted Queries in Information Retrieval Systems. Journal of the American Society for Information Science, 38:23–24, 1987.
R. R. Yager. On Ordered Weighted Averaging Aggregation Operators in Multi-Criteria Decision Making. IEEE Transactions on Systems, Man and Cybernetics, 18:183–190, 1988.
R. R. Yager and J. Kacprzyk, editors. The Ordered Weighted Averaging Operators: Theory and Applications. Kluwer Academic Publishers, Boston, 1997.
Y. Yang. An Evaluation of Statistical Approaches to Text Categorization. Journal of Information Retrieval, 1(1/2):67–88, 1999.
Y. Yang. A Study on Thresholding Strategies for Text Categorization. In W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, editors, Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'01), pages 137–145, New Orleans, US, 2001. ACM.
Y. Yang and X. Liu. A Re-examination of Text Categorization Methods. In M. A. Hearst, F. Gey, and R. Tong, editors, Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), pages 42–49, Berkeley, US, 1999. ACM.
L. A. Zadeh. The Concept of Linguistic Variable and its Applications to Approximate Reasoning. Parts I, II, III. Information Sciences, 8, 9:199–251 (8), 301–357 (8), 43–80 (9), 1975.
L. A. Zadeh. A Computational Approach to Fuzzy Quantifiers in Natural Languages. Computers and Mathematics with Applications, 9:149–184, 1983.
L. A. Zadeh and J. Kacprzyk, editors. Computing with Words in Information/Intelligent Systems. Part 1: Foundations. Physica, Heidelberg, New York, 1999.
L. A. Zadeh and J. Kacprzyk, editors. Computing with Words in Information/Intelligent Systems. Part 2: Applications. Physica, Heidelberg, New York, 1999.
S. Zadrożny, K. Ławcewicz, and J. Kacprzyk. Intelligent Linguistic Characterization and Retrieval of Textual Documents: An Internet-Based Application. In B. Bouchon-Meunier, L. Foulloy, and R. R. Yager, editors, Intelligent Systems for Information Processing — From Representation to Applications, pages 153–164. Elsevier, Amsterdam, 2003.
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Kacprzyk, J., Zadrożny, S. (2007). Computing with Words for Text Categorization. In: Aspects of Automatic Text Analysis. Studies in Fuzziness and Soft Computing, vol 209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37522-7_16
DOI: https://doi.org/10.1007/978-3-540-37522-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37520-3
Online ISBN: 978-3-540-37522-7
eBook Packages: Engineering, Engineering (R0)