A Framework for Deep Quantification Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12457)

Abstract

A quantification learning task estimates the class ratios, or class distribution, of a test set. Quantification learning is useful in a variety of application domains, such as commerce, public health, and politics. For instance, it is desirable to automatically estimate, from product reviews, the proportion of customers satisfied with different aspects of a product in order to improve customer relationships. We formulate the quantification learning problem as a maximum likelihood problem and propose the first end-to-end Deep Quantification Network (DQN) framework. DQN jointly learns quantification feature representations and directly predicts the class distribution. Classification-based quantification methods require three separate steps: classifying individual instances, calculating the predicted class ratios, and adjusting those ratios to account for classification errors. DQN avoids all three. We evaluated DQN on four public datasets, ranging from movie and product reviews to multi-class news. We compared DQN against six existing quantification methods and conducted a sensitivity analysis of DQN performance. Compared to the best existing method in our study, (1) DQN reduces Mean Absolute Error (MAE) by about 35%, and (2) DQN uses around 40% fewer training samples to achieve a comparable MAE.
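For readers unfamiliar with the classification-based baseline, the three steps listed in the abstract can be sketched as follows, together with an MAE helper. This is a minimal illustration, not DQN and not the authors' code. It assumes a binary task for the adjustment step and uses the standard adjusted-count correction p = (p_cc - fpr) / (tpr - fpr) from the quantification literature. The function names are hypothetical, and the MAE helper assumes the error is averaged over classes, which may differ from the paper's exact definition.

```python
from typing import List, Sequence


def classify_and_count(predictions: Sequence[int], n_classes: int) -> List[float]:
    """Steps 1-2: turn per-instance predicted labels into predicted class ratios."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = [0] * n_classes
    for label in predictions:
        counts[label] += 1  # labels are assumed to be in 0..n_classes-1
    total = len(predictions)
    return [c / total for c in counts]


def adjusted_count(p_cc: float, tpr: float, fpr: float) -> float:
    """Step 3 (binary case, assumed): correct the raw positive ratio for classifier error.

    tpr and fpr are the classifier's true/false positive rates, assumed to be
    estimated on held-out labeled data. The result is clipped to [0, 1] because
    noisy rate estimates can push the corrected value outside that range.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the adjustment to be defined")
    p = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def mean_absolute_error(true_dist: Sequence[float], pred_dist: Sequence[float]) -> float:
    """MAE between two class distributions, averaged over classes (assumed definition)."""
    if len(true_dist) != len(pred_dist):
        raise ValueError("distributions must have the same number of classes")
    return sum(abs(t - p) for t, p in zip(true_dist, pred_dist)) / len(true_dist)


if __name__ == "__main__":
    # Hypothetical predictions for 10 test instances (1 = positive).
    raw = classify_and_count([1, 1, 0, 1, 0, 0, 1, 1, 0, 1], n_classes=2)
    p_pos = adjusted_count(raw[1], tpr=0.8, fpr=0.2)
    print(raw, [1 - p_pos, p_pos])
```

For example, a classifier with tpr = 0.8 and fpr = 0.2 that labels 60% of the test instances positive yields an adjusted positive ratio of about 0.67, since (0.6 - 0.2) / (0.8 - 0.2) = 0.4 / 0.6.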

This work is supported in part by NSF SBE Grant No. 1729775.

Author information

Correspondence to Lei Qi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Qi, L., Khaleel, M., Tavanapong, W., Sukul, A., Peterson, D. (2021). A Framework for Deep Quantification Learning. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_14

  • DOI: https://doi.org/10.1007/978-3-030-67658-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67657-5

  • Online ISBN: 978-3-030-67658-2

  • eBook Packages: Computer Science, Computer Science (R0)