Classifying sensitive content in online advertisements with deep learning

Austin, Daniel; Sanzgiri, Ashutosh; Sankaran, Kannan; Woodard, Ryan; Lissack, Amit; Seljan, Samuel

doi:10.1007/s41060-020-00212-6

Applications
Published: 20 March 2020

Volume 10, pages 265–276, (2020)
International Journal of Data Science and Analytics

Abstract

In online advertising, an important quality control step is to audit advertising images (“creatives”) before they appear on publishers’ Web pages, so that each advertisement is shown only on pages where it is appropriate. If a creative with sensitive content such as gambling or pornography is displayed on the wrong Web page, it can ruin the user’s experience, damage the publisher’s reputation, and carry legal implications. To protect against this, humans must audit every creative before it is displayed through our ad exchange, a process that is costly and time-consuming. To detect sensitive content, we use a pre-trained deep convolutional neural network (Xception; Chollet, CVPR 2017) to process the creative image and merge its features with the historical distribution of categories associated with the creative’s landing page (the Web page that loads when the ad is clicked, which may also contain sensitive content). This representation is then passed through a series of fully connected layers to predict the sensitive category. The trained model achieves slightly better than human performance (model accuracy 99.92%; human accuracy 99.88%) on a large fraction of creatives (61%), while making 3.5 times fewer mistakes in very sensitive categories. The main challenges were detecting, with high accuracy, creatives from 10 “very sensitive” categories as determined by our Creative Audit team, and handling a highly imbalanced data set in which 95% of creatives have no sensitive category. This paper extends the work described in Austin et al. (Proceedings of the 2018 IEEE International Conference on Data Science and Advanced Analytics, DSAA’18, 2018). It demonstrates the successful use of deep learning in production to detect sensitive creatives while respecting the constraints set by the business.
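
The merge step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the production model: the image feature vector is a random stand-in for the output of the pre-trained network, the layer sizes, weights, and landing-page counts are hypothetical, and both the softmax output and the uniform fallback for landing pages without history are assumptions made only for illustration.

```python
import math
import random

def normalize(counts):
    """Turn raw category counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        # Assumption: with no audit history, fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict(image_features, landing_page_counts, layers):
    # Merge step: concatenate the image features with the landing-page
    # category distribution into a single input vector.
    merged = list(image_features) + normalize(landing_page_counts)
    (w1, b1), (w2, b2) = layers
    hidden = relu(dense(merged, w1, b1))
    return softmax(dense(hidden, w2, b2))

rng = random.Random(0)
N_IMAGE, N_CATEGORIES, N_HIDDEN = 8, 4, 6  # toy sizes, not the paper's
layers = [
    random_layer(rng, N_HIDDEN, N_IMAGE + N_CATEGORIES),
    random_layer(rng, N_CATEGORIES, N_HIDDEN),
]
image_features = [rng.random() for _ in range(N_IMAGE)]  # stand-in for CNN output
landing_page_counts = [37, 0, 2, 1]  # hypothetical past audit labels per category

probabilities = predict(image_features, landing_page_counts, layers)
predicted_category = max(range(N_CATEGORIES), key=probabilities.__getitem__)
```

Concatenating the two inputs before the fully connected layers lets the classifier weigh visual evidence against the landing page's audit history in a single decision, which is the role the abstract assigns to the merged representation.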

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), pp. 265–284 (2016)
adperium: Appnexus standards (2017). https://www.adperium.com/misc/AppNexus_standards.pdf
Andrews, M.: File name hashing: creating a hashed directory structure (2017). https://medium.com/eonian-technologies/file-name-hashing-creating-a-hashed-directory-structure-eabb03aa4091
Caruana, R., Lawrence, S., Giles, C.L.: Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In: Proceedings of the 13th International Conference on Neural Information Processing Systems, pp. 381–387 (2000)
Chen, J., Sun, B., Li, H., Lu, H., Hua, X.S.: Deep ctr prediction in display advertising. In: Proceedings of the 2016 ACM on Multimedia Conference, MM ’16, pp. 811–820. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2964284.2964325
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Chollet, F., et al.: Keras. (2015) https://keras.io
Clarifai: Nsfw (2016). https://www.clarifai.com/models/nsfw-image-recognition-model-e9576d86d2004ed1a38ba0cf39ecb4b1
Connie, T., Al-Shabi, M., Goh, M.: Smart content recognition from images using a mixture of convolutional neural networks. In: Kim, K.J., Kim, H., Baek, N. (eds.) IT Convergence and Security 2017, pp. 11–18. Springer, Singapore (2018)
Corp., I.: Zeromq whitepapers- multithreading magic (2017). http://zeromq.org/whitepapers:multithreading-magic
Facebook: advertising policies (2017). https://www.facebook.com/policies/ads/
Ge, T., Zhao, L., Zhou, G., Chen, K., Liu, S., Yi, H., Hu, Z., Liu, B., Sun, P., Liu, H., Yi, P., Huang, S., Zhang, Z., Zhu, X., Zhang, Y., Gai, K.: Image matters: visually modeling user behaviors using advanced model server. arXiv preprint arXiv:1711.06505v2 (2017)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Google: Adwords policies (2017). https://support.google.com/adwordspolicy/answer/6008942?hl=en
Group, T.H.: The hdf5 library and file format (2018). https://www.hdfgroup.org/solutions/hdf5/
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning—Volume 37, pp. 448–456 (2015)
Ling, X., Deng, W., Gu, C., Zhou, H., Li, C., Sun, F.: Model ensemble for click prediction in bing search ads. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 689–698. International World Wide Web Conferences Steering Committee (2017)
Mo, K., Liu, B., Xiao, L., Li, Y., Jiang, J.: Image feature learning for cold start problem in display advertising. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 3728–3734. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832747.2832769
Moustafa, M.: Applying deep learning to classify pornographic images and videos. arXiv preprint arXiv:1511.08899 (2015)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Ng, A.: Machine learning yearning (2018). https://www.deeplearning.ai/content/uploads/2018/09/Ng-MLY01-12.pdf
van Rossum, G., Drake, F.L.: The Python Language Reference Manual. Network Theory Ltd., New York (2011)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Sculley, D., Otey, M.E., Pohl, M., Spitznagel, B., Hainsworth, J., Zhou, Y.: Detecting adversarial advertisements in the wild. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pp. 274–282. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2020408.2020455
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556v6 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567v3 (2015)
Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. arXiv preprint arXiv:1708.05123 (2017)
Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. IEEE Trans. Syst. Man Cybern. Syst. (2017). https://doi.org/10.1109/TSMC.2017.2768547
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., Kennedy, P.J.: Training deep neural networks on imbalanced data sets. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4368–4374 (2016). https://doi.org/10.1109/IJCNN.2016.7727770
Wehrmann, J., Simões, G.S., Barros, R.C., Cavalcante, V.F.: Adult content detection in videos with convolutional and recurrent neural networks. Neurocomputing 272, 432–438 (2018). https://doi.org/10.1016/j.neucom.2017.07.012
Woodard, R.: A keras multithreaded dataframe generator for millions of image files (2017). https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43
Yahoo: Open nsfw model (2016). https://github.com/yahoo/open_nsfw
Zhou, G., Song, C., Zhu, X., Ma, X., Yan, Y., Dai, X., Zhu, H., Jin, J., Li, H., Gai, K.: Deep interest network for click-through rate prediction. arXiv preprint arXiv:1706.06978 (2017)
Zhou, K., Zhuo, L., Geng, Z., Zhang, J., Li, X.G.: Convolutional neural networks based pornographic image classification. In: 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 206–209. IEEE (2016)

Acknowledgements

The authors would like to thank Victoria Tucci, Tyler Herwick, and the Scaled Services team at AppNexus (now Xandr) for help with data collection and understanding the creative audit process. We would also like to thank Mike Cen, Alex Tandy, and Moussa Taifi for guidance on project scope and engineering support.

Author information

Authors and Affiliations

Nike, Beaverton, OR, 97005, USA
Daniel Austin
Xandr, Portland, OR, 97205, USA
Ashutosh Sanzgiri, Ryan Woodard & Samuel Seljan
Xandr, New York, NY, 10010, USA
Kannan Sankaran
Opentrons, New York, NY, 11201, USA
Amit Lissack

Corresponding author

Correspondence to Ashutosh Sanzgiri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Austin, D., Sanzgiri, A., Sankaran, K. et al. Classifying sensitive content in online advertisements with deep learning. Int J Data Sci Anal 10, 265–276 (2020). https://doi.org/10.1007/s41060-020-00212-6

Received: 14 January 2019
Accepted: 10 March 2020
Published: 20 March 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s41060-020-00212-6
