DOI: 10.1145/1299015.1299017
Article

Fishing for phishes: applying capture-recapture methods to estimate phishing populations

Published: 04 October 2007

ABSTRACT

We estimate the extent of phishing activity on the Internet via capture-recapture analysis of two major phishing site reports. Capture-recapture analysis is a population estimation technique originally developed for wildlife conservation, but it is applicable in any environment in which multiple independent parties collect reports of an activity.
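As an illustration of the general idea, the sketch below applies the classical two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant to two overlapping report lists. The abstract does not state which estimator the paper uses, so the choice of estimator, the function names, and the toy site identifiers are all assumptions made for this example.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-sample population estimate.

    n1: items seen by the first source
    n2: items seen by the second source
    m:  items seen by both sources
    """
    if m <= 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's variant, which stays finite when the overlap is zero."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical report contents (not data from the paper).
report_a = {"site-1", "site-2", "site-3", "site-4"}
report_b = {"site-3", "site-4", "site-5"}

overlap = len(report_a & report_b)
estimate = lincoln_petersen(len(report_a), len(report_b), overlap)
print(estimate)
```

The estimate rests on the assumption that the two sources sample independently and that every item is equally likely to be reported; when those assumptions fail, the estimate is biased.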

Generating a meaningful population estimate for phishing activity requires addressing complex relationships between phishers and phishing reports. Phishers clandestinely occupy machines and add evasive measures to phishing URLs to evade firewalls and other fraud-detection measures. Phishing reports, meanwhile, may demonstrate a preference for certain classes of phish.

We address these problems by estimating population in terms of netblocks and by clustering phishing attempts together into scams, which are phishes that demonstrate similar behavior on multiple axes. We generate population estimates using data from two different phishing reports over an 80-day period, and show that these reports capture approximately 40% of scams and 80% of the CIDR /24 netblocks (256 contiguous addresses) involved in phishing.
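To make the netblock unit concrete, the following sketch maps reported IPv4 addresses to their enclosing /24 netblocks and computes a coverage fraction: the blocks actually observed, divided by a two-sample estimate of the total. The addresses are documentation-range placeholders, the helper names are hypothetical, and the simple two-sample estimate is an assumption rather than the paper's stated method.

```python
import ipaddress


def netblock24(addr: str) -> ipaddress.IPv4Network:
    """Return the /24 netblock (256 contiguous addresses) containing addr."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)


# Hypothetical addresses from two reports (not data from the paper).
report_a_ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
report_b_ips = ["198.51.100.99", "203.0.113.77", "203.0.114.1"]

# Several addresses may collapse into the same netblock.
blocks_a = {netblock24(ip) for ip in report_a_ips}
blocks_b = {netblock24(ip) for ip in report_b_ips}

shared = blocks_a & blocks_b
observed = blocks_a | blocks_b

# Two-sample estimate of the total number of netblocks involved.
estimated_total = len(blocks_a) * len(blocks_b) / len(shared)

# Fraction of the estimated population that the reports actually captured.
coverage = len(observed) / estimated_total
print(f"{len(observed)} observed of ~{estimated_total:.1f} estimated "
      f"({coverage:.0%} coverage)")
```

Counting netblocks instead of individual addresses means that two phishing sites hosted on neighboring machines count as a single unit.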


  • Published in

    ACM Other conferences
    eCrime '07: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit
    October 2007
    90 pages
    ISBN:9781595939395
    DOI:10.1145/1299015

    Copyright © 2007 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 4 October 2007

    Qualifiers

    • Article
