Abstract
The ability to automatically detect fraudulent escrow websites is important for alleviating online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, which makes systems for detecting such websites an important endeavor. In this study, we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machine ensemble classifier yielded accuracies of over 90% and 96% for page-level and site-level classification, respectively, when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary because of the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods to detect fake escrow websites. The results may also help inform existing online escrow fraud resources and communities of practice about the many fraud cues pervasive in fake websites.
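The abstract distinguishes page-level from site-level classification but does not say how the two are related. As a minimal sketch only, one plausible way to derive a site-level label is a majority vote over the page-level predictions for that site. The function name `site_labels_from_pages`, the label strings, the tie-breaking rule, and the sample data below are all hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def site_labels_from_pages(page_predictions):
    """Aggregate page-level labels into one label per site by majority vote.

    page_predictions: iterable of (site_id, label) pairs, where label is
    "fake" or "legit". Ties are resolved as "fake", which is an assumed
    conservative choice, not a rule from the study.
    """
    votes = defaultdict(Counter)
    for site_id, label in page_predictions:
        votes[site_id][label] += 1

    result = {}
    for site_id, counts in votes.items():
        # Counter returns 0 for labels that never occurred for this site.
        result[site_id] = "fake" if counts["fake"] >= counts["legit"] else "legit"
    return result


# Hypothetical page-level predictions; each site has one misclassified page.
pages = [
    ("site-a", "fake"), ("site-a", "fake"), ("site-a", "legit"),
    ("site-b", "legit"), ("site-b", "legit"), ("site-b", "fake"),
]
print(site_labels_from_pages(pages))
```

Under this assumed scheme, a single misclassified page does not change the site's label, which is one way site-level accuracy could exceed page-level accuracy.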
Cite this article
Abbasi, A., Chen, H. A comparison of fraud cues and classification methods for fake escrow website detection. Inf Technol Manag 10, 83–101 (2009). https://doi.org/10.1007/s10799-009-0059-0