A Hierarchy-Aware Approach to the Multiaspect Text Categorization Problem

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 361)

Abstract

We advance our work on a special text categorization problem, multiaspect text categorization, introduced in our previous works. In the general case, it assumes a hierarchy of categories in which documents are assigned to leaf categories, and within each category documents are further structured into sequences of documents, referred to as cases. This makes the problem much more complex than classic text categorization. Previously, we proposed a number of approaches to deal with this problem, but we took the hierarchies occurring in its definition into account only to a limited extent. Here, we start with one of our best approaches proposed so far and extend it by assuming that categories are arranged into a hierarchy and that there is a hierarchical relation between a category and its offspring cases.
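
To make the structure described in the abstract concrete, the following is a minimal sketch of the data model only, not of the categorization method itself. It assumes a tree of categories whose leaves hold cases, with each case being an ordered sequence of documents. All names (`Category`, `Case`, `leaf_categories`, `candidate_assignments`) and the toy hierarchy are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Case:
    """An ordered sequence of documents inside a leaf category (hypothetical model)."""
    name: str
    documents: List[str] = field(default_factory=list)


@dataclass
class Category:
    """A node of the category hierarchy; in this sketch only leaves hold cases."""
    name: str
    children: List["Category"] = field(default_factory=list)
    cases: List[Case] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_categories(root: Category) -> Iterator[Category]:
    """Yield the leaf categories of the hierarchy in depth-first order."""
    if root.is_leaf():
        yield root
    for child in root.children:
        yield from leaf_categories(child)


def candidate_assignments(root: Category) -> List[Tuple[str, str]]:
    """List every (category path, case name) pair that a new document could join."""
    result: List[Tuple[str, str]] = []

    def walk(node: Category, path: List[str]) -> None:
        path = path + [node.name]
        if node.is_leaf():
            for case in node.cases:
                result.append(("/".join(path), case.name))
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return result


# Toy hierarchy, invented for illustration only.
root = Category("root", children=[
    Category("A", children=[
        Category("A1", cases=[Case("c1", ["d1", "d2"]), Case("c2", ["d3"])]),
        Category("A2", cases=[Case("c3", ["d4"])]),
    ]),
    Category("B", cases=[Case("c4", ["d5", "d6"])]),
])

for path, case_name in candidate_assignments(root):
    print(path, case_name)
```

In this reading, classifying a new document amounts to choosing both a leaf category and a case within it (or opening a new case), which is why the category-to-case relation can itself be treated as one more level of the hierarchy.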

Acknowledgements

This work is supported by the National Science Centre under contracts no. UMO-2011/01/B/ST6/06908 and UMO-2012/05/B/ST6/03068.

Author information

Corresponding author

Correspondence to Sławomir Zadrożny.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Zadrożny, S., Kacprzyk, J., Gajewski, M. (2018). A Hierarchy-Aware Approach to the Multiaspect Text Categorization Problem. In: Zadeh, L., Yager, R., Shahbazova, S., Reformat, M., Kreinovich, V. (eds) Recent Developments and the New Direction in Soft-Computing Foundations and Applications. Studies in Fuzziness and Soft Computing, vol 361. Springer, Cham. https://doi.org/10.1007/978-3-319-75408-6_5

  • DOI: https://doi.org/10.1007/978-3-319-75408-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75407-9

  • Online ISBN: 978-3-319-75408-6

  • eBook Packages: Engineering, Engineering (R0)
