
Improving resource location with locally precomputed partial random walks

Computing

Abstract

Random walks can be used to search complex networks for a desired resource. To reduce search lengths, we propose a mechanism that builds random walks by connecting together partial walks (PWs) previously computed at each network node. The resources found in each PW are registered, so that searches can jump over PWs where the resource is not located. However, since perfect recording of resources may be costly, we assume that probabilistic structures such as Bloom filters are used instead; false positives in these filters may then cause unnecessary hops. Two variants of this mechanism are considered, depending on whether we first choose a PW at the current node and then check it for the resource, or first check all PWs and then choose one. In addition, PWs can be either simple random walks or self-avoiding random walks. Analytical models are provided to predict the expected search lengths and other magnitudes of the resulting four mechanisms. Simulation experiments validate these predictions and allow us to compare these techniques with simple random walk searches, showing very large reductions in expected search lengths.
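The variant that first chooses a PW at the current node and then checks it can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The graph is an adjacency-list dictionary, and every node precomputes \(w\) simple-random-walk PWs of \(s\) steps. The filter of a PW registers its first \(s\) nodes (all but the last). A Bloom filter is modelled only through its false positive probability \(p\), as an exact membership test plus a biased coin. The helper names `precompute_pws` and `pw_search` are hypothetical. The sketch reports counts (PWs jumped over, PWs walked in vain after a false positive, and trailing steps in the last PW) rather than a single search length, because the hop cost of a jump is not specified here.

```python
import random

def simple_random_walk(adj, start, length, rng):
    """Nodes visited by a simple random walk of `length` steps from `start`."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def precompute_pws(adj, s, w, rng):
    """Hypothetical helper: w partial walks of s steps precomputed at every node."""
    return {v: [simple_random_walk(adj, v, s, rng) for _ in range(w)] for v in adj}

def pw_search(pws, holders, start, p, rng, max_pws=10**6):
    """Choose-then-check search over precomputed PWs (sketch).

    Returns (jumps, false_positive_walks, trailing_steps), or None if the
    resource is not found after max_pws partial walks.
    """
    current, jumps, fp_walks = start, 0, 0
    for _ in range(max_pws):
        pw = rng.choice(pws[current])   # first choose a PW at the current node...
        registered = pw[:-1]            # ...whose filter covers its first s nodes
        if any(v in holders for v in registered):
            # True positive: walk along the PW until the resource is reached.
            trailing = next(i for i, v in enumerate(registered) if v in holders)
            return jumps, fp_walks, trailing
        if rng.random() < p:
            fp_walks += 1               # false positive: the PW is walked in vain
        else:
            jumps += 1                  # negative: jump over the PW
        current = pw[-1]                # continue from the PW's last node
    return None

if __name__ == "__main__":
    rng = random.Random(1)
    n = 20
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # toy topology
    pws = precompute_pws(ring, s=4, w=2, rng=rng)
    print(pw_search(pws, holders={13}, start=0, p=0.05, rng=rng, max_pws=10_000))
```

The false positive coin is tossed only when the resource is not in the PW, matching the conditional definition of \(p\) in Note 2. In the other variant, a node would first check the filters of all its PWs and then choose one (for example, among those reporting a positive); only the selection step inside the loop would change.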



Notes

  1. A round is a unit of discrete time in which every node is allowed to send a message to one of its neighbors. According to this definition, computing a simple random walk of length \(\ell \) takes \(\ell \) rounds.

  2. More concretely, \(p\) is the probability of obtaining a positive result conditioned on the desired resource not being in the filter.

  3. This is, in fact, a pessimistic assumption. The distribution of trailing steps is approximately uniform, but smaller numbers of trailing steps are slightly more likely than larger ones. This can be shown analytically and has been confirmed in our experiments (see “Appendix A”). Therefore, the expected value used in our analysis, derived from a perfectly uniform distribution, is slightly higher than the real average value.

  4. In the following, we make implicit use of the linearity properties of expectations of random variables.

  5. If \(Y\) is a random variable with a binomial distribution with success probability \(p\), in which the number of experiments is in turn the random variable \(X\), it can be easily shown that \(\overline{Y} = \overline{X}\cdot p\) (see “Appendix B”).

  6. For each network, the expected length of a random walk search (\(\overline{L}\)) is needed. We estimate these expected values by simulating \(10^6\) simple random walk searches in each network and averaging their lengths. These sample averages are denoted by a lowercase \(\overline{l}\) to distinguish them from the actual expected value \(\overline{L}\) in the model. The values obtained from the experiments are \(\overline{l}_{reg} = 11246\), \(\overline{l}_{ER} = 12338\), and \(\overline{l}_{sf} = 15166\). These results agree with the approximate analytical method in [12] (a modification of the one provided in [5]), which yields \(\overline{l}_{reg} = 11095\), \(\overline{l}_{ER} = 12191\), and \(\overline{l}_{sf} = 14920\).

  7. The distribution of simple random walk searches has also been obtained experimentally, showing that Eq. 10 is a good approximation.
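Note 6 estimates \(\overline{l}\) by averaging the lengths of \(10^6\) simulated simple random walk searches. A minimal Python sketch of such an estimator follows. It assumes an adjacency-list graph, a single node holding the resource, and start and holder nodes drawn uniformly at random; the notes do not say how these were chosen, so this choice and the function names are hypothetical.

```python
import random

def random_walk_search_length(adj, start, holders, rng):
    """Hops taken by a simple random walk from `start` until it reaches a holder.

    Assumes a connected graph, so the walk terminates with probability 1.
    """
    node, steps = start, 0
    while node not in holders:
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbour
        steps += 1
    return steps

def estimate_mean_search_length(adj, searches, rng):
    """Sample mean l-bar over `searches` searches (start and holder drawn uniformly)."""
    nodes = list(adj)
    total = 0
    for _ in range(searches):
        start, holder = rng.choice(nodes), rng.choice(nodes)
        total += random_walk_search_length(adj, start, {holder}, rng)
    return total / searches

if __name__ == "__main__":
    rng = random.Random(42)
    k5 = {i: [j for j in range(5) if j != i] for i in range(5)}  # complete graph K_5
    # The exact mean for K_5 under these assumptions is (4/5) * 4 = 3.2.
    print(estimate_mean_search_length(k5, 20_000, rng))
```

On the complete graph \(K_5\), a walk that does not start at the holder reaches it with probability 1/4 at every step, so the exact mean is \((4/5)\cdot 4 = 3.2\), which gives a simple sanity check for the estimator.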

References

  1. Adamic LA, Lukose RM, Puniyani AR, Huberman BA (2001) Search in power-law networks. Phys Rev E 64(046135)

  2. Lv Q, Cao P, Cohen E, Li K, Shenker S (2002) Search and replication in unstructured peer-to-peer networks. In: ICS ’02: Proceedings of the 16th international conference on supercomputing. ACM, New York, pp 84–95

  3. Yang S-J (2005) Exploring complex networks by walking on them. Phys Rev E 71(016107)

  4. Gkantsidis C, Mihail M, Saberi A (2006) Random-walks in peer-to-peer networks: algorithms and evaluation. Perform Evaluation 63:241–263

  5. Rodero-Merino L, Fernández Anta A, López L, Cholvi V (2010) Performance of random walks in one-hop replication networks. Comput Netw 54(5):781–796

  6. Jacobson V, Smetters DK, Thornton JD, Plass MF, Briggs N, Braynard R (2012) Networking named content. Commun ACM 55(1):117–124

  7. Broder A, Mitzenmacher M (2004) Network applications of Bloom filters: a survey. Internet Math 1(4):485–509

  8. Das Sarma A, Nanongkai D, Pandurangan G, Tetali P (2013) Distributed random walks. J ACM 60(1):2:1–2:31

  9. Phouvieng H, Shigeo S (2010) Characteristics of random walk search on embedded tree structure for unstructured p2ps. In: International conference on parallel and distributed systems, pp 782–787

  10. López Millán VM, Cholvi V, López L, Fernández Anta A (2012) Resource location based on partial random walks in networks with resource dynamics. In: Proceedings of the 4th international workshop on theoretical aspects of dynamic distributed systems, TADDS ’12. ACM, New York, pp 26–31

  11. Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64(026118)

  12. López Millán VM, Cholvi V, López L, Fernández Anta A (2012) A model of self-avoiding random walks for searching complex networks. Networks 60(2):71–85

  13. Lovász L (1993) Random walks on graphs: a survey. Combinatorics, Paul Erdős is eighty, vol 2. Keszthely, Hungary, pp 1–46

Author information

Correspondence to Víctor M. López Millán.

Additional information

This research was supported in part by Comunidad de Madrid grant S2009TIC-1692, Spanish MINECO grant TEC2011-29688-C02-01, Spanish MINECO grant TIN2011-28347-C02-01, Bancaixa grant P11B2010-28, and National Natural Science Foundation of China grant 61020106002.

Appendices

Appendix A: Distributions of the number of trailing steps

The proof of Theorem 1 assumes that the number of trailing steps in the last PW is uniformly distributed between 0 and \(s-1\); the two extremes correspond to the cases where the desired resource is held by the first node or by the last node registered in the PW, respectively. Recall that the Bloom filter stores the resources held by the first \(s\) nodes of the PW, from the node that precomputed the partial walk to the one before its last node (the last node is covered by the partial walks departing from it). We obtained this distribution empirically from the \(10^6\) searches performed in our experiment for each of the three networks. Figure 6 shows the results for the regular network when \(s=10\), \(s=s_{opt}=150\), and \(s=1000\). The distributions for the ER and scale-free networks have a similar shape.

Fig. 6 Distributions of the number of trailing steps in the regular network

A slight decrease in frequency is observed as the number of trailing steps grows. This is because the number of trailing steps is essentially the length of the total walk modulo the length of the PWs (\(s\)). The total walk is a random walk, and its length distribution can be approximated by Eq. 10 (see Note 7). Since this distribution is non-increasing (and strictly decreasing beyond its first two values), as shown below, within any interval of width \(s\) the frequency at the left end is higher than the frequency at the right end, which accounts for the observed decrease.

This means that the result provided by Theorem 1 is pessimistic, since the estimated average number of trailing steps is slightly higher than the real one. The results in Sect. 3.2 have shown that the expected search lengths predicted by Eq. 1 are very similar to the values averaged from simulation data, with a larger error for higher values of \(s\).
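The effect described above can be checked numerically from the recurrence of Eq. 10. The Python sketch below is a minimal illustration. It assumes that the number of trailing steps is exactly \(L \bmod s\) for a total walk length \(L\) distributed as in Eq. 10. The distribution is truncated at a hypothetical maximum length `max_len` and renormalised. The sketch folds the length distribution into the residues \(0,\ldots ,s-1\) and compares the resulting mean with the uniform value \((s-1)/2\) used in Theorem 1. The function names are hypothetical.

```python
def eq10_distribution(n, max_len):
    """P_0, ..., P_max_len from the recurrence of Eq. 10 (n > 2 nodes)."""
    probs = [1.0 / n]
    cumulative = probs[0]
    for _ in range(max_len):
        p_i = (1.0 - cumulative) / (n - 1)
        probs.append(p_i)
        cumulative += p_i
    return probs

def trailing_step_distribution(n, s, max_len):
    """Distribution of t = L mod s, renormalised after truncating L at max_len."""
    folded = [0.0] * s
    for length, prob in enumerate(eq10_distribution(n, max_len)):
        folded[length % s] += prob
    total = sum(folded)
    return [q / total for q in folded]

if __name__ == "__main__":
    n, s = 1000, 150
    q = trailing_step_distribution(n, s, max_len=20_000)
    mean = sum(t * q_t for t, q_t in enumerate(q))
    print(f"mean trailing steps {mean:.2f} vs uniform (s - 1) / 2 = {(s - 1) / 2}")
```

For these toy values the folded distribution decreases with \(t\), and its mean falls slightly below \((s-1)/2\), which is the direction of the bias described in this appendix.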

The probability distribution of the length of a simple random walk search can be estimated using Eq. 10. We show below that, for \(N>2\), this distribution is non-increasing: \(P_1 = P_0\), and \(P_{i} - P_{i-1} < 0\) for \(1 < i < \infty \):

$$\begin{aligned} P_i = \left( 1 - \sum _{j=0}^{i-1}{P_j}\right) \cdot \frac{1}{N-1},\ \mathrm {for}\ i>0; \ \ P_0 = \frac{1}{N}. \end{aligned}\tag{10}$$

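First, we show by induction that \(0 < \sum _{i=0}^k P_i < 1\) for \(k\ge 0\) and \(N>2\). For \(k=0\), the sum is \(P_0 = 1/N\), so the claim holds trivially. Now assume that it holds for \(k-1\), with \(k>0\). Then it also holds for \(k\): the sum in the following expression is positive, since both terms after the second equality are positive, and it is smaller than 1, since \(\sum _{i=0}^{k-1} P_i < 1\) and \(N-2>0\):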

$$\begin{aligned} \sum _{i=0}^k P_i&= \sum _{i=0}^{k-1} P_i + \left( 1 - \sum _{i=0}^{k-1} P_i \right) \cdot \frac{1}{N-1} = \frac{N-2}{N-1} \cdot \sum _{i=0}^{k-1} P_i \nonumber \\&\quad + \frac{1}{N-1} < \frac{N-2}{N-1} + \frac{1}{N-1} = 1. \end{aligned}$$

Next, it is shown that \(0 < P_i < 1\) for \(i\ge 0\) as a corollary of the previous result. It is checked for \(i=0\) by inspection. For \(i>0\), we have that \(P_i = \left( 1 - \sum _{j=0}^{i-1} P_j \right) \cdot \frac{1}{N-1} \). Then:

$$\begin{aligned} 0 < 1 - \sum _{j=0}^{i-1} P_j < 1, \ \ \mathrm {and\ then\!:} \ \ 0 < P_i = \left( 1 - \sum _{j=0}^{i-1} P_j\right) \cdot \frac{1}{N-1} < 1. \end{aligned}$$

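Finally, we examine \(P_i - P_{i-1}\) for \(i>0\). For \(i=1\), Eq. 10 gives \(P_1 = \left( 1 - \frac{1}{N}\right) \cdot \frac{1}{N-1} = \frac{1}{N} = P_0\), so the first two values coincide. For \(i>1\):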

$$\begin{aligned} P_i - P_{i-1} = \left( 1 - \sum _{j=0}^{i-1} P_j \right) \frac{1}{N-1} - \left( 1 - \sum _{j=0}^{i-2} P_j \right) \frac{1}{N-1} = -\frac{P_{i-1}}{N-1}. \end{aligned}$$

Since we have shown that \(0 < P_{i-1} < 1\), it follows that \(P_i - P_{i-1} = -P_{i-1}/(N-1) < 0\) for every \(i>1\).
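The properties proved in this appendix can also be checked exactly with rational arithmetic. The Python sketch below (function names hypothetical) evaluates the recurrence of Eq. 10 with `fractions.Fraction`. For a finite range of indices, it verifies that the partial sums lie strictly between 0 and 1, that \(P_1 = P_0\), and that \(P_i - P_{i-1} = -P_{i-1}/(N-1) < 0\) for \(i>1\). It also checks that the values match the closed form \(P_i = \frac{1}{N}\left( \frac{N-2}{N-1}\right) ^{i-1}\) for \(i \ge 1\), which follows from that difference equation and \(P_1 = 1/N\). A finite check like this illustrates the result but is not a proof.

```python
from fractions import Fraction

def eq10_exact(n, max_len):
    """Exact P_0, ..., P_max_len from the recurrence of Eq. 10."""
    probs = [Fraction(1, n)]
    cumulative = probs[0]
    for _ in range(max_len):
        p_i = (1 - cumulative) / (n - 1)
        probs.append(p_i)
        cumulative += p_i
    return probs

def eq10_properties_hold(n, max_len):
    """Check the appendix's claims for all indices up to max_len (max_len >= 1)."""
    probs = eq10_exact(n, max_len)
    partial, running = [], Fraction(0)
    for p_i in probs:
        running += p_i
        partial.append(running)
    r = Fraction(n - 2, n - 1)
    sums_ok = all(0 < total_k < 1 for total_k in partial)
    first_two_equal = probs[1] == probs[0]
    strictly_decreasing = all(
        probs[i] - probs[i - 1] == -probs[i - 1] / (n - 1) < 0
        for i in range(2, max_len + 1)
    )
    closed_form = all(
        probs[i] == Fraction(1, n) * r ** (i - 1) for i in range(1, max_len + 1)
    )
    return sums_ok and first_two_equal and strictly_decreasing and closed_form

print(eq10_properties_hold(7, 40), eq10_properties_hold(2, 5))  # True False
```

For \(N=2\) the check fails because the partial sum already reaches 1 at \(k=1\), so the strict upper bound on the partial sums requires \(N>2\).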

Appendix B: Expectation of a random variable with a binomial distribution in which the number of experiments is another random variable

Let \(X\) be a random variable taking values in \(\mathbb {N}_0 = \{0,1,2,\ldots \}\). Let \(Y\) be a random variable representing the number of successes when \(X\) independent experiments are performed, each with success probability \(p\). Conditioned on \(X=x\), \(Y\) has a binomial distribution \(\mathrm {B}(x,p)\); that is, \(Y\) is binomial with a number of experiments that is itself a random variable. Then, from the definition of expectation and the Total Probability Theorem, the expectation of \(Y\) is \(\mathrm {E}[Y] = \mathrm {E}[X]\cdot p\):

$$\begin{aligned} \mathrm{E}[Y]&= \sum _{y=0}^\infty y \cdot \Pr[Y=y] = \sum _{y=0}^\infty y \cdot \left\{ \sum _{x=0}^\infty \Pr[Y=y|X=x] \cdot \Pr[X=x] \right\} \\&= \sum _{x=0}^\infty \mathrm{E}[Y|X=x] \cdot \Pr[X=x] = \sum _{x=0}^\infty x \cdot p \cdot \Pr[X=x] = \mathrm{E}[X] \cdot p. \end{aligned}$$
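The identity can also be verified exactly for a concrete distribution of \(X\). The Python sketch below follows the first line of the derivation: it builds \(\Pr[Y=y]\) from the Total Probability Theorem and then sums \(y \cdot \Pr[Y=y]\). The example distribution of \(X\) and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(y, x, p):
    """Pr[Y = y | X = x] when Y ~ B(x, p)."""
    return comb(x, y) * p**y * (1 - p)**(x - y)

def expectation_y(pmf_x, p):
    """E[Y] = sum_y y * Pr[Y = y], with Pr[Y = y] from the Total Probability Theorem."""
    total = Fraction(0)
    for y in range(max(pmf_x) + 1):
        pr_y = sum(binomial_pmf(y, x, p) * px for x, px in pmf_x.items() if y <= x)
        total += y * pr_y
    return total

pmf_x = {0: Fraction(1, 4), 3: Fraction(1, 2), 10: Fraction(1, 4)}  # hypothetical X
p = Fraction(1, 3)
e_x = sum(x * px for x, px in pmf_x.items())
assert expectation_y(pmf_x, p) == e_x * p == Fraction(4, 3)
```

Here \(X\) takes the values 0, 3 and 10 with probabilities 1/4, 1/2 and 1/4, so \(\mathrm{E}[X] = 4\), and with \(p = 1/3\) the exact result is \(4/3 = \mathrm{E}[X]\cdot p\).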

Appendix C: Searches based on reused partial walks

We explore here the search length distributions when the total walks are built by reusing a limited number \(w\) of PWs per node. How many PWs per node are necessary for the distributions to be similar to those obtained with non-reused PWs? For the networks considered in our experiment and for the optimal PW size (\(s_{opt}\)), we have found that as few as two PWs per node are enough. The extreme case of a single PW per node yields a significant fraction of unfinished searches, since the total walk can easily enter a loop that does not visit all the nodes. Indeed, if the last node of a PW is a node whose only PW has already been used in that total walk, the search will be taken back to the same place again and again. However, if the PW is chosen at random among several, the chance of entering a loop is very small.
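The looping effect can be reproduced with a small Python sketch. It ignores filters altogether and measures only the length of the total walk obtained by concatenating reused PWs until a node holding the resource is reached; the helper names and the `max_pws` cut-off are hypothetical. When every node stores exactly one PW, the choice at each node is deterministic, so returning to a node whose PW was already used means the walk will repeat forever, and the search is reported as unfinished (`None`).

```python
import random

def precompute_reused_pws(adj, s, w, rng):
    """Each node stores w simple-random-walk PWs of s steps (s + 1 nodes)."""
    def walk(v):
        nodes = [v]
        for _ in range(s):
            nodes.append(rng.choice(adj[nodes[-1]]))
        return nodes
    return {v: [walk(v) for _ in range(w)] for v in adj}

def total_walk_length(pws, holders, start, rng, max_pws=100_000):
    """Steps until a holder is reached by concatenating reused PWs.

    Returns None for an unfinished search: a detected loop (one PW per node)
    or more than max_pws PWs used.
    """
    deterministic = all(len(walks) == 1 for walks in pws.values())
    current, length, used = start, 0, set()
    for _ in range(max_pws):
        if deterministic:
            if current in used:   # its only PW was already used: endless loop
                return None
            used.add(current)
        pw = rng.choice(pws[current])
        for v in pw[:-1]:         # the last node starts the next PW
            if v in holders:
                return length
            length += 1
        current = pw[-1]
    return None

if __name__ == "__main__":
    rng = random.Random(7)
    n = 30
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # toy topology
    for w in (1, 2):
        pws = precompute_reused_pws(ring, s=4, w=w, rng=rng)
        results = [total_walk_length(pws, {rng.randrange(n)}, rng.randrange(n), rng,
                                     max_pws=1_000) for _ in range(200)]
        print(w, sum(x is None for x in results), "unfinished of 200")
```

The toy ring is chosen only because it is easy to build; it is not one of the networks used in the experiments.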

Figure 7 shows the search length distributions in the regular network. The top plots correspond to non-reused PWs, and the middle and bottom plots correspond to reusing one and two PWs per node, respectively. The shape of the distributions is the same for all \(w\). However, the distributions for \(w=1\) are lower and their average search length (marked with a vertical bar) is also smaller. This is due to a significant percentage of unfinished searches (about 26 %), which are left out of the histograms and are caused by loops, as explained above. For \(w=2\), both the distribution and the average search length are very similar to those of non-reused PWs, and additional experiments with higher values of \(w\) confirm this observation. As a global measure of the difference between the distributions for \(w=2\) and for non-reused PWs, we compute the mean relative difference as \( \frac{1}{L_{90\,\%}+1}\sum _{l=0}^{L_{90\,\%}} \frac{|h_2(l) - h_{\mathrm {nr}}(l)|}{h_{\mathrm {nr}}(l)}, \) where \(h_w(l)\) and \(h_{\mathrm {nr}}(l)\) are the frequencies of searches with length \(l\) when using \(w\) partial walks per node and non-reused PWs, respectively. The tail of the distribution is excluded: only searches with lengths up to the 90th percentile \((L_{90\,\%})\) are considered. The mean relative differences for \(p=0\), \(p=0.01\), and \(p=0.1\) are 0.023, 0.035, and 0.076, respectively. This suggests that two PWs per node are enough to obtain a behavior close to the theoretical case of non-reused PWs. The same conclusion holds for the ER and scale-free networks.
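The mean relative difference defined above can be computed directly from two histograms. In the Python sketch below, histograms are lists indexed by search length, and the function names are hypothetical. Taking \(L_{90\,\%}\) from the non-reused histogram is an assumption, since the text does not state which distribution the percentile refers to. Every \(h_{\mathrm {nr}}(l)\) up to \(L_{90\,\%}\) must be non-zero for the ratio to be defined.

```python
def percentile_length(hist, fraction=0.9):
    """Smallest length l such that lengths 0..l cover `fraction` of all searches."""
    total, running = sum(hist), 0
    for length, count in enumerate(hist):
        running += count
        if running >= fraction * total:
            return length
    return len(hist) - 1

def mean_relative_difference(h_w, h_nr, l90):
    """(1 / (L90 + 1)) * sum over l = 0..L90 of |h_w(l) - h_nr(l)| / h_nr(l)."""
    terms = (abs(h_w[l] - h_nr[l]) / h_nr[l] for l in range(l90 + 1))
    return sum(terms) / (l90 + 1)

if __name__ == "__main__":
    h_nr = [10, 20, 30, 40]  # hypothetical counts for non-reused PWs
    h_2 = [11, 18, 30, 41]   # hypothetical counts for w = 2
    l90 = percentile_length(h_nr)  # 3 for these counts
    print(mean_relative_difference(h_2, h_nr, l90))
```

The toy counts only illustrate the arithmetic; they are not data from the experiments.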

Fig. 7 Distributions for non-reused PWs and for \(w=1,2\) (regular network)

About this article

Cite this article

López Millán, V.M., Cholvi, V., López, L. et al. Improving resource location with locally precomputed partial random walks. Computing 97, 871–891 (2015). https://doi.org/10.1007/s00607-013-0353-x

  • DOI: https://doi.org/10.1007/s00607-013-0353-x
