A comparison of fraud cues and classification methods for fake escrow website detection

Information Technology and Management

Abstract

Automatically detecting fraudulent escrow websites is important for alleviating online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, which makes systems for detecting such websites an important endeavor. In this study, we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machines ensemble classifier enabled accuracies over 90% and 96% for page-level and site-level classification, respectively, when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary due to the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods for detecting fake escrow websites. The results may also be useful for informing existing online escrow fraud resources and communities of practice about the plethora of fraud cues pervasive in fake websites.

Author information

Correspondence to Ahmed Abbasi.

About this article

Cite this article

Abbasi, A., Chen, H. A comparison of fraud cues and classification methods for fake escrow website detection. Inf Technol Manag 10, 83–101 (2009). https://doi.org/10.1007/s10799-009-0059-0

  • DOI: https://doi.org/10.1007/s10799-009-0059-0
