
Detecting Illicit Food Factories from Chemical Declaration Data via Graph-aware Self-supervised Contrastive Anomaly Ranking

Authors:

Sheng-Fang Yang,

Cheng-Te Li

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 4501 - 4511

https://doi.org/10.1145/3589334.3648138

Published: 13 May 2024

Abstract

In the global food industry, the scale and complexity of the supply chain increasingly blur the line between legitimate and illicit manufacturing, so safeguarding consumer health and trust requires innovative detection methods. To address this need, this paper presents Graph-aware Self-supervised Contrastive Anomaly Ranking (GraphCAR), a novel unsupervised learning model devised to identify illicit food factories by scrutinizing chemical declaration data. GraphCAR tackles the scarcity of labeled data and the intricacies inherent in the vast array of declared chemicals by combining a Graph Autoencoder with a self-supervised contrastive learning mechanism. This combination simplifies the feature space by embedding chemical declarations within a bipartite graph, and it flags subtle, potentially illicit patterns by contrastively inspecting the learned factory representations. In evaluations on real-world factories' chemical declaration data, GraphCAR outperformed conventional methods on unsupervised outlier detection and one-class classification tasks, showing accuracy, robustness, and reliability in flagging potential malpractice. Its successful application in food safety illustrates the potential of AI-driven solutions to address multifaceted challenges for the greater good.
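
As a rough illustration of the bipartite-graph idea in the abstract, the sketch below builds a factory-by-chemical incidence matrix from toy declarations and ranks factories by how poorly a low-rank model reconstructs their rows. This is not GraphCAR: a truncated SVD stands in for the Graph Autoencoder, the contrastive component is omitted entirely, and all factory names, chemical names, and function names are hypothetical.

```python
import numpy as np

# Hypothetical toy declarations: factory -> set of declared chemicals.
declarations = {
    "factory_a": {"citric_acid", "sodium_benzoate", "sucrose"},
    "factory_b": {"citric_acid", "sodium_benzoate", "sucrose"},
    "factory_c": {"citric_acid", "sucrose"},
    "factory_d": {"citric_acid", "sodium_benzoate"},
    "factory_e": {"industrial_dye", "formaldehyde"},
}


def build_bipartite_matrix(decls):
    """Return (factories, chemicals, B) where B[i, j] = 1 if factory i declares chemical j."""
    factories = sorted(decls)
    chemicals = sorted({c for chems in decls.values() for c in chems})
    col = {c: j for j, c in enumerate(chemicals)}
    B = np.zeros((len(factories), len(chemicals)))
    for i, f in enumerate(factories):
        for c in decls[f]:
            B[i, col[c]] = 1.0
    return factories, chemicals, B


def rank_by_reconstruction_error(B, rank=1):
    """Score each row by its residual under a rank-`rank` SVD approximation.

    Assumption: a low-rank linear model is used here as a simple stand-in
    for a learned graph encoder/decoder; higher residual = more anomalous.
    """
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    B_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    errors = np.linalg.norm(B - B_hat, axis=1)
    order = np.argsort(-errors)  # most anomalous first
    return order, errors


factories, chemicals, B = build_bipartite_matrix(declarations)
order, errors = rank_by_reconstruction_error(B, rank=1)
for idx in order:
    print(f"{factories[idx]}: {errors[idx]:.3f}")
```

In this toy data, the factory whose declared chemicals share nothing with the others receives the highest score, because the rank-1 model captures only the dominant declaration pattern.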

Supplemental Material

MP4 File: Video presentation (1532.03 MB)

MP4 File: Supplemental video (119.01 MB)

Index Terms

Detecting Illicit Food Factories from Chemical Declaration Data via Graph-aware Self-supervised Contrastive Anomaly Ranking
1. Information systems
  1. Information systems applications
    1. Data mining


Published In

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN: 9798400701719

DOI: 10.1145/3589334

General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)

Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)

Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science and Technology Council, Taiwan

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
