Fraud Detection in Online Market Research

Kalinichenko, Vera; Atashian, Gasia; Abgaryan, Davit; Wijaya, Natasha

doi:10.1007/978-3-030-82196-8_33

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 295)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

The distinguishing approach in this paper is to apply random sampling to the training set and solve an optimization problem that maximizes fraud coverage while minimizing the false positive rate. We experimented with a variety of optimal solutions to discover different segments of bad actors. Working in an industry setting, we combined partially labeled data with a self-developed set of SQL-based rules to compensate, in a timely manner, for the limited labelled data available to a supervised learning model, so that fraudulent users could be detected before they negatively impacted our business. At DISQO, a market research firm that provides raw data to its partners and clients and is a reputable panel where consumers share feedback on a variety of brands and products, we faced challenges related to noisy labelled data. A set of rules was therefore developed to grade every user for fraud as A (red, very suspicious), B (yellow), or C (green). We started with this simple grading system. Then, after formulating an optimization problem to maximize fraud detection on randomly sampled training sets, we solved for the optimal solution on each sample and averaged the collected solutions to design our final solution, which detects fraud with better precision and improves recall from \(26\%\) to \(52\%\). Lastly, we developed a methodology to combine these optimal coefficient solutions into a well-generalized fraud detection model by averaging the coefficients alongside the dynamic labels via Logistic Regression. However, we achieved the best results when we solved for the optimal fraud coverage segment and trained a hand-picked set of classifiers to learn the separation in the data between bad and good actors.
We then created a 5-dimensional fraud vector consisting of the probabilities obtained from hand-picked classifiers based on the optimal solutions (three fraud segments were retrieved from the optimal solutions): one dimension contained the CNN probability, two others contained the XGBoost and Logistic Regression probabilities, and another held the auto-encoder reconstruction error. Finally, we compare the magnitude of the fraud vector across users to quickly assess overall fraud risk, using every classifier probability and the auto-encoder reconstruction error as fraud dimensions.
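The fraud-vector comparison described above can be illustrated with a minimal sketch. The abstract does not state which norm is used, how the reconstruction error is scaled, or the exact order of the five dimensions, so the Euclidean norm, the `scale_error` helper, the `MAX_ERROR` constant, and the example values below are all assumptions made for illustration only.

```python
import math


def scale_error(error, max_error):
    """Clip a reconstruction error into [0, 1] (assumed scaling, not from the paper)."""
    if max_error <= 0:
        raise ValueError("max_error must be positive")
    return min(max(error / max_error, 0.0), 1.0)


def fraud_vector_magnitude(vector):
    """Euclidean norm of a fraud vector (the choice of norm is an assumption)."""
    return math.sqrt(sum(x * x for x in vector))


def rank_users(fraud_vectors):
    """Return user ids ordered from highest to lowest fraud-vector magnitude."""
    return sorted(
        fraud_vectors,
        key=lambda user_id: fraud_vector_magnitude(fraud_vectors[user_id]),
        reverse=True,
    )


# Hypothetical upper bound used to scale the auto-encoder reconstruction error.
MAX_ERROR = 2.0

# Made-up example data: four classifier probabilities followed by the scaled
# reconstruction error. The slot order is illustrative, not the paper's.
example = {
    "user_a": (0.9, 0.8, 0.7, 0.6, scale_error(1.8, MAX_ERROR)),
    "user_b": (0.1, 0.2, 0.1, 0.0, scale_error(0.2, MAX_ERROR)),
}

ranking = rank_users(example)
```

Scaling the reconstruction error into the same range as the probabilities keeps any single dimension from dominating the magnitude; whether the authors normalize it this way is not stated in the abstract.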

Supported by DISQO.

Author information

Authors and Affiliations

University of California Los Angeles, Los Angeles, USA
Vera Kalinichenko
DISQO, Glendale, CA, USA
Vera Kalinichenko, Gasia Atashian, Davit Abgaryan & Natasha Wijaya

Corresponding author

Correspondence to Vera Kalinichenko.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Kalinichenko, V., Atashian, G., Abgaryan, D., Wijaya, N. (2022). Fraud Detection in Online Market Research. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_33

DOI: https://doi.org/10.1007/978-3-030-82196-8_33
Published: 03 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82195-1
Online ISBN: 978-3-030-82196-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
