Abstract
In online advertising, an important quality control step is to audit advertising images (“creatives”) before they appear on publishers’ Web pages. This ensures that advertisements only appear on Web pages where the ad is appropriate. If a creative with sensitive content such as gambling or pornography is displayed on the wrong Web page, it can ruin the user’s experience, damage the publisher’s reputation, and carry legal implications. To protect against this, humans must audit every creative before it is displayed through our ad exchange; this process is costly and time-consuming. To detect sensitive content, we use a pre-trained deep convolutional neural network (Xception; Chollet, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017) to process the creative image, and merge its features with the historical distribution of categories associated with the creative’s landing page (the Web page that loads when the ad is clicked, which may also contain sensitive content). This merged representation is then passed through a series of fully connected layers to predict the sensitive category. The trained model achieves slightly better than human performance (model accuracy 99.92%; human accuracy 99.88%) on a large fraction of creatives (61%), while making 3.5 times fewer mistakes in very sensitive categories. The main challenges we faced were detecting, with high accuracy, creatives from 10 “very sensitive” categories as determined by our Creative Audit team, and handling a highly imbalanced data set in which 95% of creatives have no sensitive categories. This paper extends the work we described in Austin et al. (in: Proceedings of the 2018 IEEE International Conference on Data Science and Advanced Analytics (DSAA), DSAA’18, 2018). It demonstrates the successful use of deep learning in production as a method for detecting sensitive creatives, while respecting the constraints set by the business.
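The merge step described in the abstract can be illustrated with a minimal sketch: image features and the landing page's historical category distribution are concatenated, passed through fully connected layers, and turned into class probabilities. This is not the authors' implementation. The layer sizes, the random placeholder weights, the ReLU and softmax choices, and all function names (`dense`, `init_layer`, `predict`) are assumptions made for illustration, and a short random vector stands in for the Xception output.

```python
import math
import random

def dense(x, weights, bias, relu=True):
    """Fully connected layer: y = W x + b, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * v for w, v in zip(row, x)) + b
        out.append(max(0.0, z) if relu else z)
    return out

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_out, n_in):
    """Random placeholder weights; a trained model would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict(image_features, landing_page_dist, layers):
    """Concatenate both inputs, apply the hidden layers, then softmax."""
    x = list(image_features) + list(landing_page_dist)
    for weights, bias in layers[:-1]:
        x = dense(x, weights, bias, relu=True)
    weights, bias = layers[-1]
    return softmax(dense(x, weights, bias, relu=False))

# Hypothetical sizes, chosen only to keep the example small.
N_IMAGE, N_PAGE, N_HIDDEN, N_CLASSES = 8, 4, 6, 3

rng = random.Random(0)
layers = [
    init_layer(rng, N_HIDDEN, N_IMAGE + N_PAGE),  # hidden layer on merged input
    init_layer(rng, N_CLASSES, N_HIDDEN),         # output layer, one unit per category
]
image_features = [rng.random() for _ in range(N_IMAGE)]  # stand-in for CNN features
landing_page_dist = [0.7, 0.2, 0.1, 0.0]                 # historical category shares
probs = predict(image_features, landing_page_dist, layers)
print(probs)
```

In a real system the image features would come from the pre-trained network and the weights from training; the sketch only shows how the two inputs are combined before classification.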
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), pp. 265–284 (2016)
adperium: Appnexus standards (2017). https://www.adperium.com/misc/AppNexus_standards.pdf
Andrews, M.: File name hashing: creating a hashed directory structure (2017). https://medium.com/eonian-technologies/file-name-hashing-creating-a-hashed-directory-structure-eabb03aa4091
Caruana, R., Lawrence, S., Giles, C.L.: Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In: Proceedings of the 13th International Conference on Neural Information Processing Systems, pp. 381–387 (2000)
Chen, J., Sun, B., Li, H., Lu, H., Hua, X.S.: Deep ctr prediction in display advertising. In: Proceedings of the 2016 ACM on Multimedia Conference, MM ’16, pp. 811–820. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2964284.2964325. http://doi.acm.org/10.1145/2964284.2964325
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Chollet, F., et al.: Keras (2015). https://keras.io
Clarifai: Nsfw (2016). https://www.clarifai.com/models/nsfw-image-recognition-model-e9576d86d2004ed1a38ba0cf39ecb4b1
Connie, T., Al-Shabi, M., Goh, M.: Smart content recognition from images using a mixture of convolutional neural networks. In: Kim, K.J., Kim, H., Baek, N. (eds.) IT Convergence and Security 2017, pp. 11–18. Springer, Singapore (2018)
Corp., I.: Zeromq whitepapers- multithreading magic (2017). http://zeromq.org/whitepapers:multithreading-magic
Facebook: advertising policies (2017). https://www.facebook.com/policies/ads/
Ge, T., Zhao, L., Zhou, G., Chen, K., Liu, S., Yi, H., Hu, Z., Liu, B., Sun, P., Liu, H., Yi, P., Huang, S., Zhang, Z., Zhu, X., Zhang, Y., Gai, K.: Image matters: visually modeling user behaviors using advanced model server. arXiv preprint arXiv:1711.06505v2 (2017)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Google: Adwords policies (2017). https://support.google.com/adwordspolicy/answer/6008942?hl=en
Group, T.H.: The hdf5 library and file format (2018). https://www.hdfgroup.org/solutions/hdf5/
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR arXiv: 1512.03385 (2015)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning—Volume 37, pp. 448–456 (2015)
Ling, X., Deng, W., Gu, C., Zhou, H., Li, C., Sun, F.: Model ensemble for click prediction in bing search ads. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 689–698. International World Wide Web Conferences Steering Committee (2017)
Mo, K., Liu, B., Xiao, L., Li, Y., Jiang, J.: Image feature learning for cold start problem in display advertising. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 3728–3734. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832747.2832769
Moustafa, M.: Applying deep learning to classify pornographic images and videos. arXiv preprint arXiv:1511.08899 (2015)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Ng, A.: Machine learning yearning (2018). https://www.deeplearning.ai/content/uploads/2018/09/Ng-MLY01-12.pdf
van Rossum, G., Drake, F.L.: The Python Language Reference Manual. Network Theory Ltd., New York (2011)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Sculley, D., Otey, M.E., Pohl, M., Spitznagel, B., Hainsworth, J., Zhou, Y.: Detecting adversarial advertisements in the wild. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pp. 274–282. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2020408.2020455. http://doi.acm.org/10.1145/2020408.2020455
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556v6 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567v3 (2015)
Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. arXiv preprint arXiv:1708.05123 (2017)
Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. IEEE Trans. Syst. Man Cybern. Syst. (2017). https://doi.org/10.1109/TSMC.2017.2768547
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., Kennedy, P.J.: Training deep neural networks on imbalanced data sets. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4368–4374 (2016). https://doi.org/10.1109/IJCNN.2016.7727770
Wehrmann, J., Simões, G.S., Barros, R.C., Cavalcante, V.F.: Adult content detection in videos with convolutional and recurrent neural networks. Neurocomputing 272, 432–438 (2018). https://doi.org/10.1016/j.neucom.2017.07.012
Woodard, R.: A keras multithreaded dataframe generator for millions of image files (2017). https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43
Yahoo: Open nsfw model (2016). https://github.com/yahoo/open_nsfw
Zhou, G., Song, C., Zhu, X., Ma, X., Yan, Y., Dai, X., Zhu, H., Jin, J., Li, H., Gai, K.: Deep interest network for click-through rate prediction. arXiv preprint arXiv:1706.06978 (2017)
Zhou, K., Zhuo, L., Geng, Z., Zhang, J., Li, X.G.: Convolutional neural networks based pornographic image classification. In: 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 206–209. IEEE (2016)
Acknowledgements
The authors would like to thank Victoria Tucci, Tyler Herwick, and the Scaled Services team at AppNexus (now Xandr) for help with data collection and understanding the creative audit process. We would also like to thank Mike Cen, Alex Tandy, and Moussa Taifi for guidance on project scope and engineering support.
Cite this article
Austin, D., Sanzgiri, A., Sankaran, K. et al. Classifying sensitive content in online advertisements with deep learning. Int J Data Sci Anal 10, 265–276 (2020). https://doi.org/10.1007/s41060-020-00212-6
DOI: https://doi.org/10.1007/s41060-020-00212-6