Classifying sensitive content in online advertisements with deep learning

  • Applications
International Journal of Data Science and Analytics

Abstract

In online advertising, an important quality-control step is to audit advertising images (“creatives”) before they appear on publishers’ Web pages. This ensures that advertisements appear only on Web pages where they are appropriate. If a creative with sensitive content, such as gambling or pornography, is displayed on the wrong Web page, it can ruin the user’s experience and damage the publisher’s reputation, and it may have legal implications. To protect against this, humans must audit every creative before it is displayed through our ad exchange, a process that is costly and time-consuming. To detect sensitive content, we use a pre-trained deep convolutional neural network, Xception (Chollet, CVPR 2017 [6]), to process the creative image, and we merge its features with the historical distribution of categories associated with the creative’s landing page (the Web page that loads when the ad is clicked, which may also contain sensitive content). This combined representation is then passed through a series of fully connected layers to predict the sensitive category. The trained model achieves slightly better-than-human performance (model accuracy 99.92%; human accuracy 99.88%) on a large fraction of creatives (61%), while making 3.5 times fewer mistakes in very sensitive categories. The main challenges we faced were detecting, with high accuracy, creatives from 10 “very sensitive” categories designated by our Creative Audit team, and handling a highly imbalanced data set in which 95% of creatives have no sensitive category. This paper extends the work we described in Austin et al. (Proceedings of the 2018 IEEE International Conference on Data Science and Advanced Analytics, DSAA’18, 2018). It demonstrates the successful use of deep learning in production for detecting sensitive creatives while respecting the constraints set by the business.
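The classification pipeline described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several pieces are assumptions. The image vector stands in for Xception's pooled features, which are 2048-dimensional when Keras's `Xception(include_top=False, pooling="avg")` is used. The landing-page history is assumed to be a vector of per-category counts normalized into a distribution. A single hidden layer with toy, untrained weights stands in for the "series of fully connected layers". A softmax over mutually exclusive classes is also an assumption; a multi-label variant would use per-category sigmoids instead. The confidence threshold that routes creatives between the model and human auditors is hypothetical: the abstract says the model handles 61% of creatives but does not say how that subset is chosen. All function names are illustrative.

```python
# Minimal sketch; names, shapes, weights, and the routing rule are illustrative assumptions.
import math
from typing import List, Optional, Sequence, Tuple


def landing_page_distribution(counts: Sequence[float]) -> List[float]:
    """Turn historical per-category counts for a landing page into a distribution.

    A page with no audit history yields an all-zero vector (an assumption).
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def fuse(image_features: Sequence[float], page_counts: Sequence[float]) -> List[float]:
    """Concatenate CNN image features with the landing-page category distribution."""
    return list(image_features) + landing_page_distribution(page_counts)


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in x]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_features, page_counts, hidden_w, hidden_b, out_w, out_b) -> List[float]:
    """Fused features -> one hidden ReLU layer -> softmax over categories."""
    x = fuse(image_features, page_counts)
    h = relu(dense(x, hidden_w, hidden_b))
    return softmax(dense(h, out_w, out_b))


def route(probs: Sequence[float], threshold: float = 0.99) -> Tuple[str, Optional[int]]:
    """Hypothetical routing: auto-label confident predictions, send the rest to humans."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return ("model", best)
    return ("human", None)


if __name__ == "__main__":
    # Toy example: 2-d "image features", 2 landing-page categories, 2 output classes.
    probs = predict(
        image_features=[0.5, -1.0],
        page_counts=[3, 1],
        hidden_w=[[1, 0, 0, 0], [0, 0, 1, 1]],
        hidden_b=[0, 0],
        out_w=[[0, 0], [10, 10]],
        out_b=[0, 0],
    )
    print(route(probs))  # ('model', 1)
```

In this sketch the threshold trades coverage for accuracy. Raising it sends more creatives to human auditors and leaves the model only the cases it is most confident about. The abstract does not say whether the 61% figure came from a rule of this kind.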

References

  1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), pp. 265–284 (2016)

  2. adperium: AppNexus standards (2017). https://www.adperium.com/misc/AppNexus_standards.pdf

  3. Andrews, M.: File name hashing: creating a hashed directory structure (2017). https://medium.com/eonian-technologies/file-name-hashing-creating-a-hashed-directory-structure-eabb03aa4091

  4. Caruana, R., Lawrence, S., Giles, C.L.: Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In: Proceedings of the 13th International Conference on Neural Information Processing Systems, pp. 381–387 (2000)

  5. Chen, J., Sun, B., Li, H., Lu, H., Hua, X.S.: Deep CTR prediction in display advertising. In: Proceedings of the 2016 ACM on Multimedia Conference, MM ’16, pp. 811–820. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2964284.2964325

  6. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  7. Chollet, F., et al.: Keras (2015). https://keras.io

  8. Clarifai: NSFW (2016). https://www.clarifai.com/models/nsfw-image-recognition-model-e9576d86d2004ed1a38ba0cf39ecb4b1

  9. Connie, T., Al-Shabi, M., Goh, M.: Smart content recognition from images using a mixture of convolutional neural networks. In: Kim, K.J., Kim, H., Baek, N. (eds.) IT Convergence and Security 2017, pp. 11–18. Springer, Singapore (2018)

  10. iMatix Corp.: ZeroMQ whitepapers: multithreading magic (2017). http://zeromq.org/whitepapers:multithreading-magic

  11. Facebook: advertising policies (2017). https://www.facebook.com/policies/ads/

  12. Ge, T., Zhao, L., Zhou, G., Chen, K., Liu, S., Yi, H., Hu, Z., Liu, B., Sun, P., Liu, H., Yi, P., Huang, S., Zhang, Z., Zhu, X., Zhang, Y., Gai, K.: Image matters: visually modeling user behaviors using advanced model server. arXiv preprint arXiv:1711.06505v2 (2017)

  13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)

  14. Google: AdWords policies (2017). https://support.google.com/adwordspolicy/answer/6008942?hl=en

  15. The HDF Group: The HDF5 library and file format (2018). https://www.hdfgroup.org/solutions/hdf5/

  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

  17. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning—Volume 37, pp. 448–456 (2015)

  18. Ling, X., Deng, W., Gu, C., Zhou, H., Li, C., Sun, F.: Model ensemble for click prediction in Bing search ads. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 689–698. International World Wide Web Conferences Steering Committee (2017)

  19. Mo, K., Liu, B., Xiao, L., Li, Y., Jiang, J.: Image feature learning for cold start problem in display advertising. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 3728–3734. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832747.2832769

  20. Moustafa, M.: Applying deep learning to classify pornographic images and videos. arXiv preprint arXiv:1511.08899 (2015)

  21. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)

  22. Ng, A.: Machine learning yearning (2018). https://www.deeplearning.ai/content/uploads/2018/09/Ng-MLY01-12.pdf

  23. van Rossum, G., Drake, F.L.: The Python Language Reference Manual. Network Theory Ltd., New York (2011)

  24. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  25. Sculley, D., Otey, M.E., Pohl, M., Spitznagel, B., Hainsworth, J., Zhou, Y.: Detecting adversarial advertisements in the wild. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pp. 274–282. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2020408.2020455

  26. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556v6 (2015)

  27. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)

  28. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567v3 (2015)

  29. Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. arXiv preprint arXiv:1708.05123 (2017)

  30. Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. IEEE Trans. Syst. Man Cybern. Syst. (2017). https://doi.org/10.1109/TSMC.2017.2768547

  31. Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., Kennedy, P.J.: Training deep neural networks on imbalanced data sets. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4368–4374 (2016). https://doi.org/10.1109/IJCNN.2016.7727770

  32. Wehrmann, J., Simões, G.S., Barros, R.C., Cavalcante, V.F.: Adult content detection in videos with convolutional and recurrent neural networks. Neurocomputing 272, 432–438 (2018). https://doi.org/10.1016/j.neucom.2017.07.012

  33. Woodard, R.: A Keras multithreaded dataframe generator for millions of image files (2017). https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43

  34. Yahoo: Open NSFW model (2016). https://github.com/yahoo/open_nsfw

  35. Zhou, G., Song, C., Zhu, X., Ma, X., Yan, Y., Dai, X., Zhu, H., Jin, J., Li, H., Gai, K.: Deep interest network for click-through rate prediction. arXiv preprint arXiv:1706.06978 (2017)

  36. Zhou, K., Zhuo, L., Geng, Z., Zhang, J., Li, X.G.: Convolutional neural networks based pornographic image classification. In: 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 206–209. IEEE (2016)

Acknowledgements

The authors would like to thank Victoria Tucci, Tyler Herwick, and the Scaled Services team at AppNexus (now Xandr) for help with data collection and understanding the creative audit process. We would also like to thank Mike Cen, Alex Tandy, and Moussa Taifi for guidance on project scope and engineering support.

Author information

Corresponding author

Correspondence to Ashutosh Sanzgiri.

About this article

Cite this article

Austin, D., Sanzgiri, A., Sankaran, K. et al. Classifying sensitive content in online advertisements with deep learning. Int J Data Sci Anal 10, 265–276 (2020). https://doi.org/10.1007/s41060-020-00212-6

  • DOI: https://doi.org/10.1007/s41060-020-00212-6
