
Clairvoyance: Inferring Blocklist Use on the Internet

  • Conference paper
Passive and Active Measurement (PAM 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12671)


Abstract

One of the staples of network defense is blocking traffic to and from a list of “known bad” sites on the Internet. However, few organizations are in a position to produce such a list themselves, so pragmatically this approach depends on the existence of third-party “threat intelligence” providers who specialize in distributing feeds of unwelcome IP addresses. However, the choice to use such a strategy, let alone which data feeds are trusted for this purpose, is rarely made public and thus little is understood about the deployment of these techniques in the wild. To explore this issue, we have designed and implemented a technique to infer proactive traffic blocking on a remote host and, through a series of measurements, to associate that blocking with the use of particular IP blocklists. In a pilot study of 220K US hosts, we find as many as one fourth of the hosts appear to blocklist based on some source of threat intelligence data, and about 2% use one of the 9 particular third-party blocklists that we evaluated.


Notes

1. One exception is the recent work of Bouwman et al. [7], which explored aspects of this question through interviews with over a dozen security professionals.

References

1. Afroz, S., Tschantz, M.C., Sajid, S., Qazi, S.A., Javed, M., Paxson, V.: Exploring server-side blocking of regions. Technical report, ICSI (2018)

2. Anderson, D.: Splinternet behind the great firewall of China. Queue 10(11), 40–49 (2012)

3. antirez: new TCP scan method. https://seclists.org/bugtraq/1998/Dec/79

4. Aryan, S., Aryan, H., Halderman, J.A.: Internet censorship in Iran: a first look. In: Proceedings of the 3rd USENIX Workshop on Free and Open Communications on the Internet (FOCI) (2013)

5. Bellovin, S.M.: A technique for counting NATted hosts. In: Proceedings of the 2nd Internet Measurement Conference (IMC), pp. 267–272 (2002)

6. Bhutani, A., Wadhwani, P.: Threat Intelligence Market Size By Component, By Format Type, By Deployment Type, By Application, Industry Analysis Report, Regional Outlook, Growth Potential, Competitive Market Share and Forecast, 2019–2025 (2019)

7. Bouwman, X., Griffioen, H., Egbers, J., Doerr, C., Klievink, B., van Eeten, M.: A different cup of TI? The added value of commercial threat intelligence. In: Proceedings of the 29th USENIX Security Symposium (USENIX Security), pp. 433–450, August 2020

8. CAIDA: Inferred AS to Organization Mapping Dataset. https://www.caida.org/data/as_organizations.xml

9. Censys - Public Internet Search Engine. https://censys.io/

10. Clayton, R., Murdoch, S.J., Watson, R.N.M.: Ignoring the great firewall of China. In: Danezis, G., Golle, P. (eds.) PET 2006. LNCS, vol. 4258, pp. 20–35. Springer, Heidelberg (2006). https://doi.org/10.1007/11957454_2

11. Ensafi, R., Knockel, J., Alexander, G., Crandall, J.R.: Detecting intentional packet drops on the Internet via TCP/IP side channels. In: Faloutsos, M., Kuzmanovic, A. (eds.) PAM 2014. LNCS, vol. 8362, pp. 109–118. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04918-2_11

12. FireHOL IP Lists - All Cybercrime IP Feeds. http://iplists.firehol.org/

13. Hao, S., Kantchelian, A., Miller, B., Paxson, V., Feamster, N.: PREDATOR: proactive recognition and elimination of domain abuse at time-of-registration. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1568–1579. ACM (2016)

14. Hao, S., Thomas, M., Paxson, V., Feamster, N., Kreibich, C., Grier, C., Hollenbeck, S.: Understanding the domain registration behavior of spammers. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 63–76. ACM (2013)

15. IP2Location: IP Address to Identify Geolocation. https://www.ip2location.com/

16. IPdeny IP country blocks. https://www.ipdeny.com/

17. IPIP.net: The Best IP Geolocation Database. https://en.ipip.net/

18. Khattak, S., et al.: Do you see what I see? Differential treatment of anonymous users. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2016)

19. Klein, A., Pinkas, B.: From IP ID to device ID and KASLR bypass. In: Proceedings of the 28th USENIX Security Symposium (USENIX Security), pp. 1063–1080 (2019)

20. Kührer, M., Rossow, C., Holz, T.: Paint it black: evaluating the effectiveness of malware blacklists. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 1–21. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_1

21. Lemon, J.: Resisting SYN flood DoS attacks with a SYN cache. In: Proceedings of the BSD Conference (BSDCon), pp. 89–97. USENIX Association, USA (2002)

22. Li, G.: An Empirical Analysis on Threat Intelligence: Data Characteristics and Real-World Uses. Ph.D. thesis, UC San Diego (2020)

23. Li, V.G., Dunn, M., Pearce, P., McCoy, D., Voelker, G.M., Savage, S.: Reading the tea leaves: a comparative analysis of threat intelligence. In: Proceedings of the 28th USENIX Security Symposium (USENIX Security), pp. 851–867, August 2019

24. MaxMind: IP Geolocation and Online Fraud Prevention. https://www.maxmind.com/

25. McDonald, A., et al.: 403 Forbidden: a global view of CDN geoblocking. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 218–230 (2018)

26. NetAcuity.

27. OpenNet Initiative: Survey of Government Internet Filtering Practices Indicates Increasing Internet Censorship, May 2007

28. Park, J.C., Crandall, J.R.: Empirical study of a national-scale distributed intrusion detection system: backbone-level filtering of HTML responses in China. In: IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pp. 315–326. IEEE (2010)

29. Pearce, P., Ensafi, R., Li, F., Feamster, N., Paxson, V.: Augur: Internet-wide detection of connectivity disruptions. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 427–443. IEEE (2017)

30. Pitsillidis, A., Kanich, C., Voelker, G.M., Levchenko, K., Savage, S.: Taster's choice: a comparative analysis of spam feeds. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 427–440, Boston, MA, November 2012

31. Ponemon Institute LLC: Third Annual Study on Changing Cyber Threat Intelligence: There Has to Be a Better Way, January 2018

32. Postel, J.: RFC 791: Internet Protocol (1981)

33. Ramachandran, A., Feamster, N., Dagon, D.: Revealing botnet membership using DNSBL counter-intelligence. SRUTI 6 (2006)

34. Shackleford, D.: Cyber Threat Intelligence Uses, Successes and Failures: The SANS 2017 CTI Survey. Technical report, SANS (2017)

35. Sheng, S., Wardman, B., Warner, G., Cranor, L.F., Hong, J., Zhang, C.: An empirical analysis of phishing blacklists. In: Proceedings of the Conference on Email and Anti-Spam (CEAS) (2009)

36. Singh, R., et al.: Characterizing the nature and dynamics of Tor exit blocking. In: Proceedings of the 26th USENIX Security Symposium (USENIX Security), pp. 325–341 (2017)

37. Sinha, S., Bailey, M., Jahanian, F.: Shades of grey: on the effectiveness of reputation-based "blacklists". In: Proceedings of the 3rd International Conference on Malicious and Unwanted Software (MALWARE), pp. 57–64. IEEE (2008)

38. Spring, N., Mahajan, R., Wetherall, D.: Measuring ISP topologies with Rocketfuel. ACM SIGCOMM Comput. Commun. Rev. (CCR) 32(4), 133–145 (2002)

39. Thomas, K., Amira, R., Ben-Yoash, A., Folger, O., Hardon, A., Berger, A., Bursztein, E., Bailey, M.: The abuse sharing economy: understanding the limits of threat exchanges. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 143–164. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_7

40. Tounsi, W., Rais, H.: A survey on technical threat intelligence in the age of sophisticated cyber attacks. Comput. Secur. 72, 212–233 (2018)

41. University of Oregon Route Views Project. http://www.routeviews.org/routeviews/

42. Best National University Rankings. https://www.usnews.com/best-colleges/rankings/national-universities, January 2020

43. Zittrain, J., Edelman, B.: Internet filtering in China. IEEE Internet Computing 7(2), 70–77 (2003)


Author information


Corresponding author

Correspondence to Gautam Akiwate.


A Inference Technique Details


Our technique, while simple in theory, needs to handle real-world conditions, including packet loss, packet reordering in transit, and other traffic arriving at reflectors. The inference method needs to be efficient, accurate, and low overhead. Blocklists can change frequently, leaving only a short window in which to infer stable behavior, so completing the measurement in a reasonable amount of time requires an efficient inference method. Additionally, the method should have low false positive and false negative rates so that we can be confident in the results. Finally, it should require as few packets as possible to reduce the potential impact on reflectors.

The first step is to find reflectors suitable for our measurement technique. Recall that a suitable reflector should have minimal background traffic and should not sit behind a network that performs ingress filtering of spoofed packets. To find quiescent hosts (reflectors with low background traffic), we send 24 probes to each candidate host, 1 per second, and repeat the experiment 5 times at different times of the day. We then select only hosts where at least 30% of their IP ID increases are exactly 1 per second, meaning the host did not receive any extra traffic in that one second. We use the 30% threshold to select hosts that are largely "quiet", and thus more likely to yield a perfect signal in the experiment. Next, to identify hosts behind ingress filtering, we acquired 7 vantage points around the world to exercise different paths to the reflector. We sent packets from our measurement machine to each host with spoofed source addresses corresponding to the 7 vantage points, and then collected responses at each vantage point. We select only the hosts that send responses to all 7 vantage points, meaning spoofed packets were not dropped on any of the exercised network paths.
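
To make the quiescence check concrete, a minimal sketch is shown below. The paper does not specify its tooling, so Python with Scapy, the probe port, and the helper names (ipid_deltas, is_quiescent) are assumptions for illustration only; raw-socket privileges are required.

```python
# Illustrative sketch (not the authors' code): probe a candidate host once per
# second and measure how often its global IP ID counter advances by exactly 1.
import time
from scapy.all import IP, TCP, sr1  # assumes Scapy and root privileges

def ipid_deltas(host, probes=24, dport=80):
    """Send SYN-ACK probes one second apart; return consecutive IP ID deltas."""
    ids = []
    start = time.time()
    for i in range(probes):
        resp = sr1(IP(dst=host) / TCP(dport=dport, flags="SA"),
                   timeout=0.9, verbose=0)
        ids.append(resp[IP].id if resp is not None and resp.haslayer(IP) else None)
        time.sleep(max(0.0, start + i + 1 - time.time()))  # pace probes 1 s apart
    return [None if a is None or b is None else (b - a) % 65536  # 16-bit counter
            for a, b in zip(ids, ids[1:])]

def is_quiescent(host, threshold=0.30):
    """Keep hosts where at least 30% of per-second IP ID increments are exactly 1."""
    deltas = ipid_deltas(host)
    return sum(1 for d in deltas if d == 1) / len(deltas) >= threshold
```

In practice this check is repeated five times at different times of day, as described above, before a host is accepted as a reflector.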

Next, we describe how we infer whether a given reflector blocks an IP using multiple trials. We define a trial as a single experiment that tests whether a reflector blocks one blocklist IP. Figure 6 shows the process of one trial. For each trial, the measurement machine sends five consecutive probe packets to the reflector, one second apart. In our experiment, the probe packets are TCP SYN-ACK packets and we read IP IDs from the RST packets sent in response. Between the third and fourth probe packets, the measurement machine sends five spoofed packets, also TCP SYN-ACKs, with source IPs set to the blocklist IP. Between the fourth and fifth probe packets, it sends another five spoofed packets. Each burst of five spoofed packets is sent 0.15 s apart, spreading it across the one-second window between two probes.
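
The sketch below shows how one such trial could be driven. Again, this is only an illustration under assumptions not stated in the paper: Scapy as the packet library, port 80 as the probe port, and run_trial as a hypothetical helper name; the actual measurement system may differ.

```python
# Illustrative sketch of a single trial (cf. Fig. 6): five SYN-ACK probes one
# second apart, with five spoofed SYN-ACKs (source forged to the blocklist IP)
# injected after the third probe and again after the fourth probe.
import time
from scapy.all import IP, TCP, sr1, send  # assumes Scapy and root privileges

def run_trial(reflector, blocklist_ip, dport=80, n_spoof=5):
    """Return (arrival_time, ip_id) for each of the five probe responses."""
    responses = []
    start = time.time()
    for i in range(5):
        resp = sr1(IP(dst=reflector) / TCP(dport=dport, flags="SA"),
                   timeout=0.8, verbose=0)
        if resp is not None and resp.haslayer(TCP):
            responses.append((time.time(), resp[IP].id))
        else:
            responses.append((time.time(), None))  # lost probe or no RST
        if i in (2, 3):  # between probes 3-4 and 4-5: inject the spoofed burst
            for _ in range(n_spoof):
                send(IP(src=blocklist_ip, dst=reflector) /
                     TCP(dport=dport, flags="SA"), verbose=0)
                time.sleep(0.15)  # spread the burst across the 1 s window
        time.sleep(max(0.0, start + i + 1 - time.time()))  # keep probes 1 s apart
    return responses
```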

Fig. 6. Blocking inference methodology. Solid blue lines are probe packets, dashed red lines are spoofed packets.

Fig. 7. Experiment design and false positive and false negative analysis.

We then inspect the increases between the IP IDs in the packets received by the measurement machine. Ideally, assuming no additional traffic and no packet loss, the IP ID should increase by exactly one between consecutive probes. For the last two deltas, since we send the spoofed packets in between our probe packets, the final IP ID increases will be different based on the host’s blocking behavior.

If the reflector does not block the blocklist IP, then we will observe an IP ID increase sequence in the received RST responses of [+1, +1, +6, +6]. Here the last two deltas are +6 because the reflector does not block the blocklist IP and thus responds to the spoofed packets, causing the IP ID to increase by 5; our probe packet causes it to increase by another 1, together making +6.

On the other hand, if the reflector blocks the blocklist IP, then we will see an IP ID increase sequence that is: [+1, +1, +1, +1]. Here the last two deltas are +1 since the reflector blocks the blocklist IP, leading to no extra change in IP ID.

The first three probes, corresponding to the first two IP ID deltas, act as a control. The last two "probe and spoof" patterns perform the actual experiment. Seeing the initial two "+1" deltas indicates this host is in a quiet period (no extra network traffic). Therefore, we can be more confident that any following IP ID jump ("+6" in our case) is caused by our experiment. While the choice of numbers in the experiment may seem arbitrary, there is a rationale behind it, which we discuss in the following sections.
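
A minimal classifier for these two delta patterns might look as follows; the function name and the spoof-count parameter are illustrative, and any other sequence is treated as inconclusive, in line with the conservative criteria described in the next section.

```python
# Interpret one trial's IP ID delta sequence. Only the two "perfect" patterns
# are accepted; anything else is inconclusive and the trial is retried.
def classify_deltas(deltas, n_spoof=5):
    if deltas == [1, 1, 1, 1]:
        return "blocking"          # spoofed packets drew no responses
    if deltas == [1, 1, n_spoof + 1, n_spoof + 1]:
        return "not_blocking"      # reflector answered all spoofed SYN-ACKs
    return "inconclusive"          # loss, reordering, or background traffic
```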

A.1 Inference Criteria

We now look at the criteria used to infer whether a reflector blocks a blocklist IP. Because we observe only from the measurement machine, our information is limited to the IP IDs seen from the reflector. Moreover, we want to be conservative when inferring blocking. Thus, our approach is to repeat the same trial, between a reflector and a blocklist IP, until we get a "perfect signal": a response that matches all of the criteria below.

1. The measurement machine received exactly five RST responses from the reflector.

2. The five responses were received one second apart from one another.

3. The IP ID increase sequence is either [+1, +1, +6, +6], which we conclude indicates no blocking, or [+1, +1, +1, +1], which we conclude indicates blocking.

4. If any of the above three criteria are not met, we repeat the same experiment. We repeat up to 15 trials before giving up.

The first requirement ensures no packet loss. The second requirement ensures that the responses we received reflect the real IP ID changes at the reflector. The Internet does not guarantee the order of packet arrival: although we send one probe packet per second, these packets might not arrive at the reflector in the same order, and thus the IP ID sequence from the response packets might not represent the real order of IP ID changes at the host. By requiring that the response packets be no less than 0.85 s and no more than 1.15 s apart, we minimize the probability of reordered packets.

The third requirement is the core of our inference logic. Since we ignore everything other than an IP ID increase sequence of [+1, +1, +1, +1] or [+1, +1, +6, +6], we can ensure that our inference of blocking is conservative. If we saw a sequence of [+1, +1, +1, +1] but the reflector did not block the blocklist IP, that would mean all 10 spoofed packets were lost. Conversely, if we saw [+1, +1, +6, +6] but the reflector actually blocked the blocklist IP, that would mean exactly five extra packets were generated by the reflector during each of the last two seconds. Both cases are very unlikely, as we demonstrate next with an analysis of false positives and false negatives.
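
Putting the criteria together, a sketch of the full decision procedure might look like the following. It builds on the hypothetical run_trial and classify_deltas helpers sketched earlier and is illustrative only; the thresholds (0.85 to 1.15 s spacing, 15 trials) are those stated above.

```python
# Apply criteria 1-3 to one trial; return "blocking"/"not_blocking" or None.
def perfect_signal(responses, n_spoof=5):
    if len(responses) != 5 or any(ipid is None for _, ipid in responses):
        return None                                   # criterion 1: exactly five RSTs
    times = [t for t, _ in responses]
    gaps = [b - a for a, b in zip(times, times[1:])]
    if any(g < 0.85 or g > 1.15 for g in gaps):
        return None                                   # criterion 2: ~1 s spacing
    ids = [ipid for _, ipid in responses]
    deltas = [(b - a) % 65536 for a, b in zip(ids, ids[1:])]
    verdict = classify_deltas(deltas, n_spoof)
    return None if verdict == "inconclusive" else verdict  # criterion 3

def infer_blocking(reflector, blocklist_ip, max_trials=15):
    """Repeat the trial until a perfect signal is seen, or give up (criterion 4)."""
    for _ in range(max_trials):
        verdict = perfect_signal(run_trial(reflector, blocklist_ip))
        if verdict is not None:
            return verdict          # "blocking" or "not_blocking"
    return "unknown"
```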

A.2 False Positive and False Negative Analysis

For our experiment, a false positive occurs when a reflector is not blocking a blocklist IP but we mistakenly conclude that it is. A false negative occurs when a reflector is blocking a blocklist IP but we mistakenly conclude that it is not. To evaluate both, we run controlled experiments against all the reflectors under consideration and measure the resulting false positive and false negative rates.

For the false positive evaluation, we first acquire a list of IPs that are verifiably not blocked by reflectors. Since we own these IPs, we can easily verify this by probing the reflectors directly from them. We acquired and tested 1,265 IPs from five different /24s. We then probe the reflectors and send spoofed packets with source addresses set to these pre-selected IPs. Since these IPs are not blocked, if we observe an IP ID increase sequence of [+1, +1, +1, +1], we know it is a false positive.

For false negatives, we run the experiment with only probe packets, and no spoofed packets. This scenario is equivalent to the one where the reflector blocks the spoofed IP. If we observe an IP ID increase sequence of [+1, +1, +6, +6], then we know it was due to the background traffic at the reflector and hence is a false negative.

Although we present the experiment design with five spoofed packets in each of the last two seconds, we also experimented with a range of values and calculated their false positive and negative rates. We ran 15 trials each with 3, 4, 5, 6, and 7 spoofed packets against every reflector, and repeated the experiment on a different day. The final results are shown in Fig. 7.

We need to trade off keeping the false positive and negative rates low against generating as little traffic as possible. We choose 5 spoofed packets as a balance: with 5 spoofed packets, we get a false positive rate of 2.5e-5 and a false negative rate of 8.5e-5. We also experimented with sending 4 probe packets, which yield 3 IP ID deltas, and 6 probe packets, which yield 5 IP ID deltas. With only 3 deltas we suffer a higher false negative rate, as it is easier for background traffic at the reflector to produce the same IP ID increase sequence. With 6 probes, on the other hand, we prolong the experiment, making it harder to get a "perfect signal". Thus, our choice of 5 probe packets, with 5 spoofed packets in between, is a good balance between these competing factors.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, V.G., Akiwate, G., Levchenko, K., Voelker, G.M., Savage, S. (2021). Clairvoyance: Inferring Blocklist Use on the Internet. In: Hohlfeld, O., Lutu, A., Levin, D. (eds) Passive and Active Measurement. PAM 2021. Lecture Notes in Computer Science, vol 12671. Springer, Cham. https://doi.org/10.1007/978-3-030-72582-2_4


  • DOI: https://doi.org/10.1007/978-3-030-72582-2_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72581-5

  • Online ISBN: 978-3-030-72582-2

