A comparison of fraud cues and classification methods for fake escrow website detection

Abbasi, Ahmed; Chen, Hsinchun

doi:10.1007/s10799-009-0059-0

Published: 21 July 2009

Volume 10, pages 83–101 (2009)

Information Technology and Management

Ahmed Abbasi¹ &
Hsinchun Chen²

Abstract

The ability to automatically detect fraudulent escrow websites is important for alleviating online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, making systems for detecting such websites an important endeavor. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machines ensemble classifier enabled accuracies over 90% for page-level classification and over 96% for site-level classification when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary due to the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods for detecting fake escrow websites. The results may also be useful for informing existing online escrow fraud resources and communities of practice about the plethora of fraud cues pervasive in fake websites.
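
The abstract reports accuracy at both the page level and the site level but does not say how page decisions are combined into a site decision. The sketch below is one plausible reading, not the authors' method: it assumes a simple majority vote over a site's page-level labels, with ties resolved as "fake". The function names, the tie rule, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def site_labels_from_pages(page_predictions):
    """Aggregate page-level labels into one label per site by majority vote.

    page_predictions: iterable of (site_id, label) pairs, where label is
    "fake" or "legit". Ties go to "fake" (an assumed, conservative rule).
    """
    votes = defaultdict(Counter)
    for site_id, label in page_predictions:
        votes[site_id][label] += 1
    return {
        site_id: "fake" if counts["fake"] >= counts["legit"] else "legit"
        for site_id, counts in votes.items()
    }


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Made-up page-level predictions for three hypothetical sites.
page_preds = [
    ("site_a", "fake"), ("site_a", "fake"), ("site_a", "legit"),
    ("site_b", "legit"), ("site_b", "legit"),
    ("site_c", "fake"), ("site_c", "legit"), ("site_c", "legit"),
]
# Made-up ground truth: site_a is fake, site_b and site_c are legitimate.
site_truth = {"site_a": "fake", "site_b": "legit", "site_c": "legit"}

# Page-level accuracy: each page inherits the true label of its site.
page_accuracy = accuracy(
    [label for _, label in page_preds],
    [site_truth[site_id] for site_id, _ in page_preds],
)

# Site-level accuracy: compare the voted label with the true site label.
site_preds = site_labels_from_pages(page_preds)
site_ids = sorted(site_truth)
site_accuracy = accuracy(
    [site_preds[s] for s in site_ids],
    [site_truth[s] for s in site_ids],
)
```

In this toy data, two misclassified pages are outvoted by their correctly classified siblings, so the site-level score is higher than the page-level score. That is one way a site-level figure can exceed a page-level one, though the abstract alone does not establish that this mechanism explains the reported results.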

Author information

Authors and Affiliations

Sheldon B. Lubar School of Business, University of Wisconsin-Milwaukee, Milwaukee, WI, 53201, USA
Ahmed Abbasi
Artificial Intelligence Lab, Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, AZ, 85721, USA
Hsinchun Chen

Corresponding author

Correspondence to Ahmed Abbasi.

About this article

Cite this article

Abbasi, A., Chen, H. A comparison of fraud cues and classification methods for fake escrow website detection. Inf Technol Manag 10, 83–101 (2009). https://doi.org/10.1007/s10799-009-0059-0

Received: 25 June 2009
Accepted: 05 July 2009
Published: 21 July 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10799-009-0059-0
