Abstract
We advance our work on a special text categorization problem, multiaspect text categorization, introduced in our previous works. In the general case, it assumes a hierarchy of categories in which documents are assigned to the leaves of the hierarchy and, within each category, are further structured into sequences of documents referred to as cases. This is much more complex than classic text categorization. Previously, we proposed a number of approaches to the above problem, but we took into account only to a limited extent the hierarchies occurring in its definition. Here, we start with one of our best approaches proposed so far and extend it by assuming that categories are arranged into a hierarchy and that there is a hierarchical relation between a category and its offspring cases.
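The structure described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the class names `Category` and `Case`, the helper `assign`, and the sample category, case, and document identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Case:
    """An ordered sequence of document ids inside one leaf category."""
    case_id: str
    documents: List[str] = field(default_factory=list)


@dataclass
class Category:
    """A node of the category hierarchy; in this sketch only leaves hold cases."""
    name: str
    parent: Optional["Category"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "Category"] = field(default_factory=dict)
    cases: Dict[str, Case] = field(default_factory=dict)

    def add_child(self, name: str) -> "Category":
        child = Category(name=name, parent=self)
        self.children[name] = child
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def path(self) -> List[str]:
        """Category names from the root down to this node."""
        node: Optional[Category] = self
        names: List[str] = []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def assign(leaf: Category, case_id: str, doc_id: str) -> None:
    """Append a document to a case of a leaf category, creating the case if needed."""
    if not leaf.is_leaf():
        raise ValueError("documents may only be assigned to leaf categories")
    leaf.cases.setdefault(case_id, Case(case_id)).documents.append(doc_id)


# Hypothetical example data: a three-level hierarchy with one case of two documents.
root = Category("root")
science = root.add_child("science")
physics = science.add_child("physics")
assign(physics, "case-1", "doc-1")
assign(physics, "case-1", "doc-2")
```

In this sketch a document carries two labels at once, the leaf category and the case within it, and the case sits one level below its category in the same tree, which is the category-to-case relation the abstract refers to.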
Acknowledgements
This work is supported by the National Science Centre under contracts no. UMO-2011/01/B/ST6/06908 and UMO-2012/05/B/ST6/03068.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Zadrożny, S., Kacprzyk, J., Gajewski, M. (2018). A Hierarchy-Aware Approach to the Multiaspect Text Categorization Problem. In: Zadeh, L., Yager, R., Shahbazova, S., Reformat, M., Kreinovich, V. (eds) Recent Developments and the New Direction in Soft-Computing Foundations and Applications. Studies in Fuzziness and Soft Computing, vol 361. Springer, Cham. https://doi.org/10.1007/978-3-319-75408-6_5
DOI: https://doi.org/10.1007/978-3-319-75408-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75407-9
Online ISBN: 978-3-319-75408-6
eBook Packages: Engineering, Engineering (R0)