DOI: 10.1145/3573942.3573951
research-article

Express-related Counterfeit Cigarette Crime Prediction with Imbalanced Data-based Machine Learning Techniques

Published: 16 May 2023

Abstract

In practice, the numbers of cigarette-related packages and normal packages are often extremely imbalanced. Express-related counterfeit cigarette crime prediction must account for this imbalance; otherwise, the prediction model will perform poorly on unseen packages. This paper proposes an approach that uses SMOTE-ENN to handle imbalanced express data and to build an express-related counterfeit cigarette crime prediction model. In the preprocessing stage, the original data sets are structured using a Chinese word segmentation method. In the feature calculation and analysis stage, the spatio-temporal features of the express data are calculated and analyzed, and the optimal features are then obtained with the feature reduction algorithm PCA. Next, based on the optimal features, balanced data are obtained by mixed sampling. Finally, the classifier model is trained and optimized to predict the probability that a package contains counterfeit cigarettes, using express data comprising 2851 normal samples and 285 cigarette-related samples randomly selected from 3077 cigarette-related samples. The performance of all classifiers improves after SMOTE-ENN is used to balance the data sets, with all indicators above 98%. Experimental results demonstrate the feasibility and effectiveness of the proposed approach.
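The mixed-sampling step can be illustrated with a minimal pure-Python sketch of SMOTE oversampling followed by edited nearest neighbours (ENN) cleaning. This is not the authors' implementation: the toy 2-D data, the function names (`knn_indices`, `smote`, `enn`, `smote_enn`), the choice k = 3, and the majority-vote ENN rule are all assumptions made for illustration. The roughly 10:1 class ratio only loosely mirrors the 2851 normal and 285 cigarette-related samples.

```python
import math
import random


def knn_indices(i, points, k):
    """Indices of the k points nearest to points[i] (Euclidean), excluding i itself."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on a segment between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(i, minority, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(points, labels, k=3):
    """Keep only samples whose label agrees with the majority of their
    k nearest neighbours (assumed majority-vote variant of ENN)."""
    keep = []
    for i, label in enumerate(labels):
        votes = [labels[j] for j in knn_indices(i, points, k)]
        if votes.count(label) * 2 > len(votes):
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote_enn(points, labels, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_majority = len(points) - len(minority)
    synthetic = smote(minority, n_majority - len(minority), k, seed)
    all_points = list(points) + synthetic
    all_labels = list(labels) + [minority_label] * len(synthetic)
    return enn(all_points, all_labels, k)


# Hypothetical toy data: label 0 = normal package, label 1 = cigarette-related.
majority_pts = [(0.1 * i, 0.1 * j) for i in range(10) for j in range(10)]
noisy_pt = (5.45, 5.05)  # a "normal" sample sitting inside the minority cluster
minority_pts = [(5 + 0.1 * i, 5 + 0.05 * (i % 3)) for i in range(10)]

points = majority_pts + [noisy_pt] + minority_pts
labels = [0] * 101 + [1] * 10

balanced_points, balanced_labels = smote_enn(points, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

In this toy run, SMOTE raises the minority class to the size of the majority class, and ENN then discards the single normal sample that lies inside the cigarette-related cluster, since all of its nearest neighbours carry the other label. In a real pipeline this step would operate on the PCA-reduced features rather than on hand-made 2-D points.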



    Published In

    AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
    September 2022
    1221 pages
    ISBN:9781450396899
    DOI:10.1145/3573942

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Express-related counterfeit cigarette crime
    2. Feature reduction
    3. Imbalanced express data
    4. SMOTE_ENN

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    AIPR 2022
