A Paged Domain Name System for Query Privacy

Asoni, Daniele E.; Hitz, Samuel; Perrig, Adrian

doi:10.1007/978-3-030-02641-7_12

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11261)

Included in the following conference series:

International Conference on Cryptology and Network Security

Abstract

The lack of privacy in DNS and DNSSEC is a problem that has only recently begun to see widespread attention by the Internet and research communities, and the solutions proposed so far only look at a narrow slice of the design space. In this paper we investigate a new approach for a privacy-preserving DNS mechanism that hides query information from root name servers and TLD registries. Our architecture lets TLD registries group the DNS records in their zones together into pages. Resolvers cache all pages locally, and retrieve only small incremental updates to optimize performance. We show that this strategy is particularly effective given the relatively static nature of TLD zone records. We analyze the privacy guarantees to assess the potential and limitations of our approach; we also evaluate the memory overhead for a resolver, and obtain feasibility guarantees through a prototype implementation of the new functionalities for resolvers and registries.

Notes

1. We consider SHA-256 a reasonable choice for the hash function. Although the digest size could be reduced to 16 bytes while retaining a negligible collision probability in a non-adversarial setting, a larger size is necessary to keep that probability negligible when an adversary actively searches for a domain name that results in a collision.
2. In practice, popularity will vary on a regional basis. We envision that replication may be made region-specific (the non-replicated part of each page would remain the same). We leave a more detailed analysis of these aspects to future work.
3. PRPs of small domain size can be implemented using format-preserving encryption (FPE) schemes: there exist suitable encryption modes that use the standard AES block cipher as a primitive and achieve FPE over domains of arbitrary size.
4. The pages in T are chosen uniformly at random; the only dependency of T on k is its size. For instance, |T| may be chosen such that the total number of page requests is higher than or equal to a given minimum.
5. We are slightly approximating the exact value in Eq. 6, ignoring the fact that the pages in T are chosen from the set of all m pages excluding those that are already part of the fingerprint.
6. To formally show this step, one needs to average out the probability over all possible replica sets that could be assumed by all domains, i.e., over all possible hash functions (or all possible sets of domain names of size N).
7. The number of accessed domains is almost half of the 20,000 we consider: many of them did not have a www host, and we also imposed some restrictions on the loading time.
8. Almost 25% of domains were not resolved because we kept timeouts low for our monitoring and excluded the domains that frequently timed out.
9. With a growth rate of ${\sim }5\%$, the number of domains, and thus the number of pages, doubles approximately every 14 years. We expect that the available bandwidth and computing power can easily keep up with the growth of PageDNS.
10. It appears that Verisign, Inc. was able to obtain a patent [28] on this technology, and it is unclear what this will mean for its adoption.

References

1. Google Analytics Solutions. https://www.google.com/analytics. Accessed 22 Sept 2017
2. NSA Spying on Americans. https://www.eff.org/nsa-spying. Accessed 22 Sept 2017
3. OpenDNS. https://www.opendns.com/. Accessed 22 Sept 2017
4. Aguilar-Melchor, C., Barrier, J., Fousse, L., Killijian, M.-O.: XPIR: private information retrieval for everyone. In: PETS (2016)
5. Arends, R., Austein, R., Larson, M., Massey, D., Rose, S.: DNS security introduction and requirements. RFC 4033 (2005)
6. Barnes, R., et al.: Confidentiality in the face of pervasive surveillance: a threat model and problem statement. RFC 7624 (2015)
7. Bernstein, D.J.: DNSCurve: usable security for DNS. https://dnscurve.org/. Accessed 22 Sept 2017
8. Bortzmeyer, S.: DNS privacy considerations. RFC 7626 (2015)
9. Bortzmeyer, S.: DNS query name minimisation to improve privacy. RFC 7816 (2016)
10. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. In: IEEE FOCS (1995)
11. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. J. ACM 45(6) (1998)
12. Cohen, E., Kaplan, H.: Proactive caching of DNS records: addressing a performance bottleneck. In: IEEE/IPSJ International Symposium on Applications and the Internet (SAINT) (2001)
13. Denis, F., Fu, Y.: DNSCrypt (2011). https://dnscrypt.org/. Accessed 22 Sept 2017
14. Devet, C., Goldberg, I., Heninger, N.: Optimally robust private information retrieval. In: USENIX Security (2012)
15. Dickinson, J., Dickinson, S., Bellis, R., Mankin, A., Wessels, D.: DNS transport over TCP - implementation requirements. RFC 7766 (2016)
16. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: USENIX Security (2004)
17. Farrell, S., Tschofenig, H.: Pervasive monitoring is an attack. RFC 7258 (2014)
18. Federrath, H., Fuchs, K.-P., Herrmann, D., Piosecny, C.: Privacy-preserving DNS: analysis of broadcast, range queries and mix-based protection methods. In: ESORICS (2011)
19. Grothoff, C., Wachs, M., Emert, M., Appelbaum, J.: NSA’s MORECOWBELL: knell for DNS. Technical report, GNUnet e.V. (2015)
20. Handley, M., Greenhalgh, A.: The case for pushing DNS. In: HotNets (2005)
21. Hu, S., Zhu, L., Heidemann, J., Mankin, A., Wessels, D., Hoffman, P.: Specification for DNS over Transport Layer Security (TLS). RFC 7858 (2016)
22. ICANN: .com Monthly Registry Reports. https://www.icann.org/resources/pages/com-2014-03-04-en. Accessed 22 Sept 2017
23. Jung, J., Sit, E., Balakrishnan, H., Morris, R.: DNS performance and the effectiveness of caching. IEEE/ACM TON 10(5), 589–603 (2002)
24. Kangasharju, J., Ross, K.W.: A replicated architecture for the domain name system. In: IEEE INFOCOM (2000)
25. Kushilevitz, E., Ostrovsky, R.: Replication is not needed: single database, computationally-private information retrieval. In: IEEE FOCS (1997)
26. Laurie, B., Sisson, G., Arends, R., Blacka, D.: DNS security (DNSSEC) hashed authenticated denial of existence. RFC 5155 (2008)
27. Lu, Y., Tsudik, G.: Towards plugging privacy leaks in the domain name system. In: IEEE P2P (2010)
28. McPherson, D., Osterweil, E.: Providing privacy enhanced resolution system in the domain name system. US Patent 8,880,686 B2 (2014)
29. Mockapetris, P.: Domain names - concepts and facilities. RFC 1034 (1987)
30. Mockapetris, P.: Domain names - implementation and specification. RFC 1035 (1987)
31. Ostrovsky, R., Skeith III, W.E.: A survey of single-database PIR: techniques and applications. In: PKC (2007)
32. Pappas, V., Massey, D., Terzis, A., Zhang, L.: A comparative study of the DNS design with DHT-based alternatives. In: IEEE INFOCOM (2006)
33. Ramasubramanian, V., Sirer, E.G.: The design and implementation of a next generation name service for the Internet. In: ACM SIGCOMM (2004)
34. Rossow, C.: Amplification hell: revisiting network protocols for DDoS abuse. In: NDSS (2014)
35. Alexa the Web Information Company. Alexa Top 500 Global Sites (2016). http://www.alexa.com/topsites
36. Toledo, R.R., Danezis, G., Goldberg, I.: Lower-cost $\epsilon$-private information retrieval. PoPETS 2016(4), 184–201 (2016)
37. Verisign, Inc.: The domain name industry brief, vol. 14, no. 2 (2017). https://www.verisign.com/assets/domain-name-report-Q12017.pdf. Accessed 22 Sept 2017
38. Wachs, M., Schanzenbach, M., Grothoff, C.: A censorship-resistant, privacy-enhancing and fully decentralized name system. In: International Conference on Cryptology and Network Security (CANS) (2014)
39. Wijngaards, W., Wiley, G.: Confidential DNS. Internet Draft draft-wijngaards-dnsop-confidentialdns-03 (2015)
40. Zhao, F., Hori, Y., Sakurai, K.: Analysis of privacy disclosure in DNS query. In: International Conference on Multimedia and Ubiquitous Engineering (MUE) (2007)
41. Zhu, L., Hu, Z., Heidemann, J., Wessels, D., Mankin, A., Somaiya, N.: Connection-oriented DNS to improve privacy and security. In: IEEE Symposium on Security and Privacy (2015)

Acknowledgments

We thank Jinank Jain for his help with the prototype implementation.

Author information

Authors and Affiliations

Network Security Group, Department of Computer Science, ETH Zürich, Zurich, Switzerland
Daniele E. Asoni, Samuel Hitz & Adrian Perrig

Corresponding author

Correspondence to Daniele E. Asoni.

Editor information

Editors and Affiliations

ETH Zürich, Zürich, Switzerland
Srdjan Capkun
Chinese University of Hong Kong, Shatin, Hong Kong
Sherman S. M. Chow

Appendices

A Replication Function

Let $\mathcal {P}$ be a page of PageDNS containing n records, and let $k'$ be the rank of a record in $\mathcal {P}$. For ease of notation, we write $k' \in \mathcal {P}$, and in general we will often use a domain’s rank to refer to the domain. We assume a total of N domains, thus $k' \in \{1,\dots ,N\}$, where $k'=1$ is the highest rank. We consider a random variable K indicating the rank of a domain chosen at random (by a generic client) according to a Zipf distribution with parameter $s=0.91$, i.e., such that:

$$\begin{aligned} \Pr (K = k) = f(k; s, N) = \frac{1}{k^{s} H_{N,s}} \quad \;\text {with}\ H_{N,s} = \sum _{k=1}^{N} \frac{1}{k^s} \end{aligned}$$

(10)

To simplify the notation, we will write f(k) to mean f(k; s, N) and H for $H_{N,s}$.
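
The distribution in Eq. 10 can be evaluated numerically with a few lines of Python. The following is only a minimal sketch: the function names are illustrative, and the value of N is a toy assumption chosen to keep the run time short, far smaller than a real TLD zone.

```python
def harmonic(n: int, s: float) -> float:
    """Generalized harmonic number H_{n,s} = sum_{k=1}^{n} k^(-s)."""
    return sum(k ** -s for k in range(1, n + 1))


def zipf_pmf(k: int, s: float, h: float) -> float:
    """f(k; s, N) = 1 / (k^s * H_{N,s}), with h = H_{N,s} precomputed."""
    return 1.0 / (k ** s * h)


N = 100_000  # toy value (assumption), not the zone size considered in the paper
S = 0.91     # Zipf parameter from the text
H = harmonic(N, S)

# probs[k - 1] is Pr(K = k) for rank k
probs = [zipf_pmf(k, S, H) for k in range(1, N + 1)]
```

Because H is a common factor, the ratio between the probabilities of the most and least popular domains is simply N^s, which is the quantity that replication later tries to reduce.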

We now express analytically the probability that an adversary would assign to $k'$ being the target domain after observing a request to $\mathcal {P}$, which is equal to the probability that $K = k'$ given that K is restricted to $\mathcal {P}$. By applying Bayes' theorem, we obtain the following equation:

$$\begin{aligned} \Pr (K=k' \mid K\in \mathcal {P}) = \frac{\Pr (K\in \mathcal {P}\mid K=k') \Pr (K=k')}{\sum _{k\in \mathcal {P}} \Pr (K\in \mathcal {P}\mid K=k) \Pr (K=k)} \end{aligned}$$

(11)

Probability $\Pr (K\in \mathcal {P}\mid K=k)$ is equal to 1 if k is not replicated, since we are assuming that $k\in \mathcal {P}$. More generally, if k has a replication degree of r(k) (i.e., the record for domain k exists on r(k) pages), then the probability of choosing the replica in $\mathcal {P}$ is 1 / r(k). We can therefore rewrite Eq. 11 as follows:

$$\begin{aligned} \Pr (K=k' \mid K\in \mathcal {P}) = \frac{f(k')/r(k')}{\sum _{k\in \mathcal {P}} f(k)/r(k)} \end{aligned}$$

(12)
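
To make Eq. 12 concrete, the sketch below computes the identification probabilities for a single page. It is an illustration under stated assumptions: the three ranks on the page and the replication degree of 10 for rank 1 are made-up values, and the popularity function omits the factor 1/H because it cancels in the ratio.

```python
def identification_probs(page, f, r):
    """Eq. 12: Pr(K = k' | K in P) for every rank k' stored on the page.

    page: iterable of ranks on the page
    f:    popularity function (any constant factor cancels out)
    r:    replication degree function
    """
    weights = {k: f(k) / r(k) for k in page}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


S = 0.91


def f(k):
    # Unnormalized Zipf weight; H_{N,s} cancels in Eq. 12.
    return k ** -S


page = [1, 50, 1000]  # hypothetical page contents (ranks)

no_repl = identification_probs(page, f, lambda k: 1)
with_repl = identification_probs(page, f, lambda k: 10 if k == 1 else 1)
```

Comparing `no_repl[1]` with `with_repl[1]` shows how spreading the most popular record over several pages lowers the probability that a request to this page is attributed to it.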

Now let $k''$ be another domain on the same page, i.e., $k''\in \mathcal {P}$. Ideally, we would like the replication function to be such that $\Pr (K=k' \mid K\in \mathcal {P}) = \Pr (K=k'' \mid K\in \mathcal {P})$ for all possible choices of $k'$ and $k''$. Unfortunately, one can show that the only scenario where this could theoretically be achieved is one where the number of pages is equal to the number of domains, and the cost of replication would be excessive (the total size of all PageDNS pages would increase almost a hundredfold). Instead, we try to get the ratio of those probabilities as close to 1 as possible. Since we also want to minimize the cost of replication, we do not replicate the least popular domain (i.e., $r(N)=1$): replication should only reduce the probability in Eq. 11 for high-rank domains, bringing it closer to the probability of the less popular domains. It is therefore reasonable for the ratio to be at its maximum when $k'=1$ and $k''=N$:

$$\begin{aligned} \rho _{ MAX }= \frac{\Pr (K=1 \mid K\in \mathcal {P})}{\Pr (K=N \mid K\in \mathcal {P})} = \frac{f(1)/r(1)}{f(N)/r(N)} \end{aligned}$$

(13)

Denoting with R the replication degree of the most popular domain ($r(1) = R$), and since $r(N)=1$, Eq. 13 becomes the following:

$$\begin{aligned} \rho _{ MAX }= \frac{f(1)/R}{f(N)} = \frac{H^{-1}/R}{N^{-s} H^{-1}} = \frac{N^s}{R} \end{aligned}$$

(14)
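
As a rough numerical illustration of Eq. 14, assume $N = 10^9$ and $s = 0.91$ (the values used elsewhere in these appendices) and take $R = 10^5$ purely as an example:

$$\begin{aligned} N^{s} = 10^{9 \cdot 0.91} = 10^{8.19} \approx 1.55\cdot 10^{8}, \qquad \rho _{ MAX } = \frac{N^{s}}{R} \approx \frac{1.55\cdot 10^{8}}{10^{5}} \approx 1.5\cdot 10^{3} \end{aligned}$$

Without replication ($R = 1$) the same ratio would be about $1.55\cdot 10^{8}$, so under these assumptions replication narrows the gap between the most and the least popular domain by five orders of magnitude.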

All other domains should be replicated in order not to increase this ratio further. From this requirement, we obtain the following bound $\forall k$.

$$\begin{aligned}&\frac{\Pr (K=k \mid K\in \mathcal {P})}{\Pr (K=N \mid K\in \mathcal {P})}&\le \rho _{ MAX }\end{aligned}$$

(15)

$$\begin{aligned}&\implies \quad&\frac{f(k)/r(k)}{f(N)/r(N)} = \frac{k^{-s}H^{-1}/r(k)}{N^{-s} H^{-1}} = \frac{N^s}{k^s r(k)}&\le \frac{N^s}{R}\end{aligned}$$

(16)

$$\begin{aligned}&\implies \quad&r(k)&\ge \frac{R}{k^s} \end{aligned}$$

(17)

We derived the bound in Eq. 17 for the worst case of a page containing both the most and the least popular domains, so by applying it to the replication degree of every domain k we ensure that no page contains two domains for which the ratio of their identification probabilities (Eq. 11) exceeds $\rho _{ MAX }$. Furthermore, with the approximation that the denominator in Eq. 12, $\sum _{k\in \mathcal {P}} f(k)/r(k)$, has the same value for all pages, the bound in Eq. 17 actually guarantees the following for any two pages $\mathcal {P}$, $\mathcal {P}'$:

$$\begin{aligned} \forall k\in \mathcal {P}, \forall k'\in \mathcal {P}'\quad \frac{\Pr (K=k \mid K\in \mathcal {P})}{\Pr (K=k' \mid K\in \mathcal {P}')} \le \rho _{ MAX }\end{aligned}$$

(18)

Since we desire to minimize the cost of replication, we try to match the bound of Eq. 17 as closely as possible (rounding it to the nearest integer), with the additional constraint that replication of any domain be at least 1. Thus the replication function we use is the following:

$$\begin{aligned} r(k) = \max \{1, \operatorname {round}(R k^{-s})\} \end{aligned}$$

(19)

In Fig. 5 we plot the function. Note that only the most popular domains, with ranks from 1 to $k^{*}$, are replicated: these all have approximately (because of rounding) the same identification probability, while for less popular domains the probability is lower. We also point out that in our scenario $R\le m$, where m is the number of pages, and that the best (lowest) probabilities are obtained when $R = m$: in this case, the most popular domain is replicated on all pages. For realistic values ($m = 100,000$ and $s=0.91$), we obtain $k^{*} \simeq 200,000$.
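
The replication function of Eq. 19 and the threshold rank $k^{*}$ can be checked with a short script. This is a sketch under assumptions: it takes $R = 10^5$ (the page count used in Appendix B, with $R = m$), the helper names are illustrative, and Python's `round` resolves exact ties to the nearest even integer, which may differ from the rounding rule used by the authors.

```python
def replication_degree(k: int, big_r: int, s: float) -> int:
    """Eq. 19: r(k) = max{1, round(R * k^(-s))}."""
    return max(1, round(big_r * k ** -s))


def last_replicated_rank(big_r: int, s: float) -> int:
    """Largest rank k* with r(k*) > 1 (returns 0 if no domain is replicated).

    Starts from the analytic estimate R * k^(-s) = 1.5, the rounding
    threshold between 1 and 2 replicas, then corrects by scanning.
    """
    k = int((big_r / 1.5) ** (1.0 / s))
    while replication_degree(k + 1, big_r, s) > 1:
        k += 1
    while k >= 1 and replication_degree(k, big_r, s) == 1:
        k -= 1
    return k


R = 100_000  # assumption: R = m = 10^5
S = 0.91

k_star = last_replicated_rank(R, S)
```

With these parameters `k_star` comes out close to 200,000, and every rank beyond it is stored on exactly one page.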

B Page-Size Variance

We can think of the size of a page P as the sum of N random variables $X_{k}$, each assuming value 1 if the k-th domain is assigned by the hash function to page P, and value 0 otherwise. The size of page P is thus $X = \sum _{k=1}^{N} X_{k}$. Assuming that the hash function behaves as a random function, and considering a set of m pages, we can easily compute the expected value of the size of the generic page P as follows:

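$$\begin{aligned} \mu = E[X] = \sum _{k=1}^{N} E[X_{k}] = \sum _{k=1}^{N} \Pr (X_{k} = 1) = \frac{N}{m} \end{aligned}$$

(20)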
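
The expected page size of N/m can be observed empirically by hashing synthetic names into pages. The sketch below is an assumption-laden illustration: it uses SHA-256 (the hash suggested in the notes) reduced modulo m, which is one plausible way to map a digest to a page index but is not confirmed by the text, and the domain names and the values of N and m are toy choices.

```python
import hashlib
from collections import Counter


def page_index(domain: str, m: int) -> int:
    """Map a domain name to one of m pages via SHA-256 reduced modulo m."""
    digest = hashlib.sha256(domain.encode("ascii")).digest()
    return int.from_bytes(digest, "big") % m


N, M = 200_000, 100  # toy sizes (assumption), not the paper's parameters

# sizes[p] is the number of synthetic domains assigned to page p
sizes = Counter(page_index(f"domain{k}.example", M) for k in range(N))

mean_size = sum(sizes.values()) / M
smallest, largest = min(sizes.values()), max(sizes.values())
```

The mean is N/m by construction; the interesting quantities are `smallest` and `largest`, which stay close to the mean when the hash behaves like a random function.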

Since X is the sum of independent random variables with values in the set $\{0,1\}$, we can apply the multiplicative Chernoff bound to estimate the probability that the size of a specific page falls below the expected value $\mu $ by a relative deviation of at least $\delta $, i.e., that $X \le (1-\delta )\mu $. The bound has the following form:

$$\begin{aligned} \Pr (X \le (1-\delta )\mu ) \le e^{-\frac{\delta ^2\mu }{2}} \end{aligned}$$

(21)

Considering for the parameters the values $N = 10^9$ and $m = 10^5$, as we have done throughout the paper, we obtain from Eq. 20 that $\mu = 10^4$. Setting $\delta = 0.1$ for a deviation of at least 10% from the mean, Eq. 21 yields the following bound:

$$\begin{aligned} \Pr (X \le 0.9 \mu ) \le e^{-\frac{10^{-2}10^4}{2}} = e^{-50} \simeq 2\cdot 10^{-22} \end{aligned}$$

(22)

These numbers show that the probability of a page being significantly smaller than the average is negligible. An analogous Chernoff bound on the upper tail yields a similar limit on the probability that a page is 10% larger than the average.
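
Both tails can be evaluated directly. The following sketch uses the lower-tail bound of Eq. 21 and, for the upper tail, the standard multiplicative form $e^{-\delta ^2\mu /(2+\delta )}$; the choice of that particular upper-tail variant and the union bound over all pages are additions made here for illustration, not steps taken from the paper.

```python
import math


def chernoff_lower(mu: float, delta: float) -> float:
    """Bound on Pr(X <= (1 - delta) * mu): exp(-delta^2 * mu / 2), 0 < delta < 1."""
    return math.exp(-delta ** 2 * mu / 2)


def chernoff_upper(mu: float, delta: float) -> float:
    """Bound on Pr(X >= (1 + delta) * mu): exp(-delta^2 * mu / (2 + delta)), delta > 0."""
    return math.exp(-delta ** 2 * mu / (2 + delta))


N, M = 10**9, 10**5  # parameters from the text
mu = N / M           # expected page size, 10^4

lower = chernoff_lower(mu, 0.1)  # page at least 10% smaller than average
upper = chernoff_upper(mu, 0.1)  # page at least 10% larger than average

# Union bound: probability that any of the M pages is at least 10% too small.
any_page_too_small = M * lower
```

Even after multiplying by the number of pages, the resulting probability remains far below any practical threshold.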

Replicas Distribution. To determine the distribution of the number of replicas per page, we find that Chernoff bounds are not effective, as they do not allow us to rule out extreme cases such as having only 2 or 3 replicas on some page. Instead, we use a simulation over $10^6$ pages, and find that the median is 25 records per page, though it can be as low as 7 in exceptional cases.
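
A simulation of this kind can be sketched at toy scale as follows. The placement rule is an assumption: each replicated domain of rank k is put on r(k) distinct pages chosen uniformly at random, with $R = m$, and only records of replicated domains are counted. The page count is far smaller than in the experiment described above, so the script does not reproduce the reported figures.

```python
import random
import statistics
from collections import Counter


def simulate_replicas(m: int, s: float, seed: int = 0) -> Counter:
    """Count records of replicated domains per page, assuming R = m.

    Rank k gets r(k) = max{1, round(m * k^(-s))} copies, placed on
    r(k) distinct pages drawn uniformly at random.
    """
    rng = random.Random(seed)
    per_page = Counter()
    k = 1
    while True:
        r = max(1, round(m * k ** -s))
        if r == 1:
            break  # r(k) is non-increasing, so all later ranks are unreplicated
        per_page.update(rng.sample(range(m), r))
        k += 1
    return per_page


counts = simulate_replicas(1000, 0.91)  # toy page count (assumption)
median_replicas = statistics.median(counts.values())
fewest_replicas = min(counts.values())
```

Since the most popular domain is placed on every page when $R = m$, every page holds at least one replicated record in this model; `fewest_replicas` shows how far below the median an unlucky page can fall.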

Cite this paper

Asoni, D.E., Hitz, S., Perrig, A. (2018). A Paged Domain Name System for Query Privacy. In: Capkun, S., Chow, S. (eds) Cryptology and Network Security. CANS 2017. Lecture Notes in Computer Science, vol 11261. Springer, Cham. https://doi.org/10.1007/978-3-030-02641-7_12

DOI: https://doi.org/10.1007/978-3-030-02641-7_12
Published: 10 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02640-0
Online ISBN: 978-3-030-02641-7
eBook Packages: Computer Science, Computer Science (R0)
