Identifying Exceptional Descriptions of People Using Topic Modeling and Subgroup Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11177)

Abstract

Descriptions of images form the backbone of many intelligent systems. These systems assume that descriptions vary randomly in construction and content, but that description content is homogeneous. This assumption becomes problematic when extended to descriptions of images of people [14], because people are known to show systematic biases in how they process others [19]. This paper therefore presents a novel approach for discovering exceptional subgroups of descriptions, in which the content of the descriptions reliably differs from that of the general set of descriptions. We develop a novel interestingness measure for subgroup discovery that is appropriate for probability distributions across semantic representations. The proposed method is applied to data from a web-based experiment in which 500 raters describe images of 200 people. Our analysis identifies multiple exceptional subgroups and the attributes of the respective raters and images. We further discuss implications for intelligent systems.

Notes

  1. The difference between documents was calculated as the sum across all pairs of descriptions of the cosine similarity of the topic probability distributions. The number of topics per document was calculated as the sum across all descriptions of the conditional entropy of the topic probability distribution.

  2. http://www.vikamine.org.
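
The two summary quantities in Note 1 can be illustrated with a minimal Python sketch. It rests on assumptions not stated on this page: "all pairs" is read as all unordered pairs of descriptions, and plain Shannon entropy (natural log) stands in for the conditional entropy, whose conditioning variable is not given here. The function names and the toy topic distributions are hypothetical and are not taken from the paper or from VIKAMINE.

```python
import math
from itertools import combinations


def cosine_similarity(p, q):
    """Cosine similarity between two topic probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def shannon_entropy(p):
    """Shannon entropy (natural log) of one topic distribution.

    Assumption: used here as a stand-in for the conditional entropy
    mentioned in Note 1. Zero-probability topics contribute nothing.
    """
    return -sum(a * math.log(a) for a in p if a > 0)


def summed_pairwise_cosine(topic_dists):
    """Sum of cosine similarities over all unordered pairs of descriptions."""
    return sum(cosine_similarity(p, q) for p, q in combinations(topic_dists, 2))


def summed_entropy(topic_dists):
    """Sum of per-description entropies across all descriptions."""
    return sum(shannon_entropy(p) for p in topic_dists)


# Hypothetical topic distributions: three descriptions over three topics.
descriptions = [
    [1.0, 0.0, 0.0],  # entirely topic 0
    [0.0, 1.0, 0.0],  # entirely topic 1
    [0.5, 0.5, 0.0],  # evenly split between topics 0 and 1
]

pairwise_total = summed_pairwise_cosine(descriptions)
entropy_total = summed_entropy(descriptions)
print(pairwise_total, entropy_total)
```

In this toy input, the two single-topic descriptions share no topic mass and contribute a similarity of zero, while the mixed description is the only one with non-zero entropy.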

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of VLDB, pp. 487–499. Morgan Kaufmann (1994)

  2. Antol, S., et al.: VQA: visual question answering. In: Proceedings of IEEE ICCV, pp. 2425–2433 (2015)

  3. Atzmueller, M.: Subgroup discovery. WIREs DMKD 5(1), 35–49 (2015)

  4. Atzmueller, M., Lemmerich, F.: VIKAMINE – open-source subgroup discovery, pattern mining, and analytics. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7524, pp. 842–845. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33486-3_60

  5. Atzmueller, M., Lemmerich, F.: Exploratory pattern mining on social media using geo-references and social tagging information. IJWS 2(1/2), 80–112 (2013)

  6. Atzmueller, M., Puppe, F.: SD-Map – a fast algorithm for exhaustive subgroup discovery. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 6–17. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_6

  7. Bayardo, R., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense databases. Data Min. Knowl. Discov. 4, 217–240 (2000)

  8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)

  9. Borji, A., Cheng, M.M., Jiang, H., Li, J.: Salient object detection: a benchmark. IEEE Trans. Image Process. 24(12), 5706–5722 (2015)

  10. Chrupała, G., Gelderloos, L., Alishahi, A.: Representations of language in a model of visually grounded speech signal. In: Proceedings of ACL, pp. 613–622 (2017)

  11. Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional model mining. Data Min. Knowl. Discov. 30(1), 47–98 (2016)

  12. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.S.: Fairness through awareness. CoRR abs/1104.3913 (2011)

  13. Ganter, B., Wille, R.: Formal concept analysis. Wissenschaftliche Zeitschrift-Technischen Universitat Dresden 45, 8–13 (1996)

  14. Gatt, A., et al.: Face2Text: collecting an annotated image description corpus for the generation of rich face descriptions. In: Proceedings of LREC (2018)

  15. Herlitz, A., Lovén, J.: Sex differences and the own-gender bias in face recognition: a meta-analytic review. Visual Cogn. 21(9–10), 1306–1336 (2013)

  16. Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. 123(1), 32–73 (2017)

  17. Lemmerich, F., Atzmueller, M., Puppe, F.: Fast exhaustive subgroup discovery with numerical target concepts. Data Min. Knowl. Discov. 30, 711–762 (2016). https://doi.org/10.1007/s10618-015-0436-8

  18. Lemmerich, F., Becker, M., Atzmueller, M.: Generic pattern trees for exhaustive exceptional model mining. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7524, pp. 277–292. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33486-3_18

  19. Levin, D.T.: Race as a visual feature: using visual search and perceptual discrimination tasks to understand face categories and the cross-race recognition deficit. J. Exp. Psychol. Gen. 129(4), 559–574 (2000)

  20. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  21. Minka, T.: Estimating a Dirichlet distribution. Technical report, MIT (2000)

  22. Sklar, M.: Fast MLE computation for the Dirichlet multinomial. arXiv:1405.0099

  23. Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1521–1528. IEEE (2011)

Acknowledgments

Funding for data collection was provided by a University of Adelaide Interdisciplinary Research Grant to C. Semmler, A. Hendrickson, R. Heyer, A. Dick, and A. van den Hengel. This work was also partially supported by the German Research Council (DFG) under grant AT 88/4-1.

Corresponding author

Correspondence to Martin Atzmueller.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Hendrickson, A.T., Wang, J., Atzmueller, M. (2018). Identifying Exceptional Descriptions of People Using Topic Modeling and Subgroup Discovery. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham. https://doi.org/10.1007/978-3-030-01851-1_44

  • DOI: https://doi.org/10.1007/978-3-030-01851-1_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01850-4

  • Online ISBN: 978-3-030-01851-1

  • eBook Packages: Computer Science (R0)
