A Comparative Study of Thresholding Strategies in Progressive Filtering

Addis, Andrea; Armano, Giuliano; Vargiu, Eloisa

doi:10.1007/978-3-642-23954-0_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6934)

Included in the following conference series:

Congress of the Italian Association for Artificial Intelligence

Abstract

Thresholding strategies in automated text categorization are an under-explored area of research. Indeed, thresholding strategies are often considered a post-processing step of minor importance, the underlying assumptions being that they do not make a difference in the performance of a classifier and that finding the optimal thresholding strategy for any given classifier is trivial. Neither of these assumptions is true. In this paper, we concentrate on progressive filtering, a hierarchical text categorization technique that relies on a local-classifier-per-node approach, thus mimicking the underlying taxonomy of categories. The focus of the paper is on assessing TSA, a greedy threshold selection algorithm, against a relaxed brute-force algorithm and the most relevant state-of-the-art algorithms. Experiments, performed on Reuters, confirm the validity of TSA.
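
To make the role of thresholds in a local-classifier-per-node setting concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each category on a root-to-leaf path of the taxonomy has its own scorer and its own threshold, and that a document reaches a category only if every classifier above it accepted the document. The names `keyword_scorer` and `passes_pipeline`, the toy keyword scorers, and all threshold values are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence

# A scorer maps a document to a relevance score in [0, 1].
# Assumption: this stands in for a trained per-node classifier.
Scorer = Callable[[str], float]


def keyword_scorer(keywords: Sequence[str]) -> Scorer:
    """Toy scorer: fraction of the given keywords that occur in the document."""
    def score(doc: str) -> float:
        words = set(doc.lower().split())
        return sum(k in words for k in keywords) / len(keywords)
    return score


def passes_pipeline(doc: str, pipeline: List[str],
                    scorers: Dict[str, Scorer],
                    thresholds: Dict[str, float]) -> bool:
    """Return True if every node on the root-to-category path accepts the document."""
    for node in pipeline:
        if scorers[node](doc) < thresholds[node]:
            return False  # rejected here; deeper classifiers are never consulted
    return True


# Hypothetical two-level path: economy -> commodities.
scorers = {
    "economy": keyword_scorer(["market", "trade", "price"]),
    "commodities": keyword_scorer(["gold", "oil", "wheat"]),
}
pipeline = ["economy", "commodities"]
doc = "oil price rises as trade slows"

# Same classifiers, two different threshold settings (illustrative values).
loose = {"economy": 0.5, "commodities": 0.3}
strict = {"economy": 0.5, "commodities": 0.5}
accepted_loose = passes_pipeline(doc, pipeline, scorers, loose)
accepted_strict = passes_pipeline(doc, pipeline, scorers, strict)
```

With the scorers held fixed, the document is accepted under the looser thresholds and rejected under the stricter ones, so the choice of thresholds alone changes the classification outcome. This is the kind of sensitivity that a threshold selection algorithm has to deal with.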

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, University of Cagliari, Italy
Andrea Addis, Giuliano Armano & Eloisa Vargiu

Editor information

Editors and Affiliations

Department of Chemical, Management, Computer, and Mechanical Engineering (DICGIM), University of Palermo, Viale delle Scienze, Edificio 6, 90128, Palermo, Italy
Roberto Pirrone & Filippo Sorbello

About this paper

Cite this paper

Addis, A., Armano, G., Vargiu, E. (2011). A Comparative Study of Thresholding Strategies in Progressive Filtering. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_4

DOI: https://doi.org/10.1007/978-3-642-23954-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23953-3
Online ISBN: 978-3-642-23954-0
eBook Packages: Computer Science, Computer Science (R0)
