Abstract
The distinguishing approach of this paper is to use random sampling of the training set to find optimal solutions that maximize fraud coverage while minimizing the false positive rate. We experimented with a variety of optimal solutions, each of which uncovers a different segment of bad actors. Working with partially labeled data in an industry setting, we combined it with a self-developed set of SQL-based rules to compensate, in a timely manner, for the barely sufficient labeled data available to a supervised learning model, so that fraudulent users are detected before they negatively impact our business. At DISQO, a market research firm that provides raw data to its partners and clients and operates a reputable panel where consumers share feedback on a variety of brands and products, we faced challenges related to noisy labeled data. We therefore developed a set of rules that assess every user for fraud and assign one of three grades: A (red, very suspicious), B (yellow), or C (green). We started with this simple grading system. Then, once the optimization problem was formulated to maximize fraud detection on a randomly sampled training set, we solved for the optimal solution on each sample and collected these solutions to average them into our final solution, which detects fraud with better precision and improves recall from \(26\%\) to \(52\%\). Lastly, we developed a methodology for combining these optimal coefficient solutions into a well-generalized fraud detection model by averaging the coefficients alongside the dynamic labels via Logistic Regression. However, we achieved the best results when we solved for the optimal fraud coverage segment and trained a hand-picked number of classifiers to learn the separation between bad and good actors in the data.
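The idea of fitting on random samples of the training set and averaging the resulting coefficients can be illustrated with a minimal sketch. It is not the authors' implementation: the toy one-feature data set, the plain gradient-descent fit, and the names `fit_logistic`, `averaged_coefficients`, `sample_frac`, and `n_samples` are all assumptions made for illustration, and the abstract does not specify how the optimization problem itself was posed.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 1000) -> List[float]:
    """Fit logistic regression by batch gradient descent.

    Returns the feature weights followed by the bias as the last element.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def averaged_coefficients(X, y, n_samples: int = 10,
                          sample_frac: float = 0.7, seed: int = 0) -> List[float]:
    """Fit one model per random subsample and average the coefficients."""
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(X)))
    fits = []
    for _ in range(n_samples):
        idx = rng.sample(range(len(X)), k)  # sampling without replacement
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return [sum(col) / len(fits) for col in zip(*fits)]


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Fraud probability for one user under the averaged coefficients."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


# Hypothetical toy data: one rule-based score per user, label 1 = fraud.
X = [[0.05], [0.1], [0.15], [0.2], [0.25], [0.3],
     [0.7], [0.75], [0.8], [0.85], [0.9], [0.95]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

avg = averaged_coefficients(X, y)
```

Averaging over several subsamples is used here only to show how per-sample solutions might be combined into a single model; other combination schemes are equally compatible with the description above.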
We then created a 5-dimensional fraud vector consisting of the probabilities retrieved from hand-picked classifiers based on the optimal solutions (we had 3 fraud segments retrieved from optimal solutions): one dimension contained the CNN probability, two others contained the XGBoost and Logistic-based probabilities, and another held the auto-encoder reconstruction error. Finally, we compare the magnitude of the fraud vector across users to quickly assess overall fraud risk, using every classifier probability and the auto-encoder reconstruction error as fraud dimensions.
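Comparing users by fraud-vector magnitude could look roughly like the sketch below. Several points are assumptions rather than details from the paper: the magnitude is taken to be the Euclidean norm, the reconstruction error is scaled into [0, 1] with a hypothetical cap (`recon_error_cap`) so that it is comparable to the probabilities, and the example scores are invented.

```python
import math
from typing import Dict, List, Sequence


def fraud_vector(classifier_probs: Sequence[float], recon_error: float,
                 recon_error_cap: float) -> List[float]:
    """Combine classifier probabilities and a scaled reconstruction error."""
    if recon_error_cap <= 0:
        raise ValueError("recon_error_cap must be positive")
    for p in classifier_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("classifier probabilities must lie in [0, 1]")
    # Assumption: clip and scale the auto-encoder error into [0, 1].
    scaled_error = min(max(recon_error, 0.0) / recon_error_cap, 1.0)
    return [*classifier_probs, scaled_error]


def fraud_magnitude(vector: Sequence[float]) -> float:
    """Euclidean norm of the fraud vector (assumed meaning of 'magnitude')."""
    return math.sqrt(sum(x * x for x in vector))


# Invented scores: four classifier probabilities plus one reconstruction error.
users: Dict[str, List[float]] = {
    "user_a": fraud_vector([0.9, 0.8, 0.85, 0.7], recon_error=4.0, recon_error_cap=5.0),
    "user_b": fraud_vector([0.1, 0.2, 0.05, 0.1], recon_error=0.5, recon_error_cap=5.0),
}

# Rank users from most to least suspicious by vector magnitude.
ranked = sorted(users, key=lambda u: fraud_magnitude(users[u]), reverse=True)
```

A single scalar per user makes it cheap to sort or threshold a large panel, at the cost of hiding which dimension drove the score; keeping the full vector alongside the magnitude preserves that detail.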
Supported by DISQO.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kalinichenko, V., Atashian, G., Abgaryan, D., Wijaya, N. (2022). Fraud Detection in Online Market Research. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_33
DOI: https://doi.org/10.1007/978-3-030-82196-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82195-1
Online ISBN: 978-3-030-82196-8
eBook Packages: Intelligent Technologies and Robotics (R0)