Revisiting the coupon collector’s problem to unveil users’ online sessions in networked systems

Cordeiro, Weverton; Gaspary, Luciano; Beltran, Rafael; Paim, Kayuã; Mansilha, Rodrigo

doi:10.1007/s12083-020-01012-2

Published: 13 November 2020

Volume 14, pages 687–707 (2021)

Peer-to-Peer Networking and Applications

Weverton Cordeiro ORCID: orcid.org/0000-0001-7536-0586¹,
Luciano Gaspary¹,
Rafael Beltran²,
Kayuã Paim³ &
Rodrigo Mansilha²

273 Accesses
2 Citations

Abstract

Accurate comprehension of users’ behavior is paramount for understanding the dynamics of several systems, such as e-commerce platforms, social networks, and mobile computing. To this end, several strategies have been proposed to obtain data sets based on the capture of usage information, which can then serve for user analytics. A popular strategy consists of taking periodic snapshots of online users, a practical instance of the coupon collector’s problem tailored to user monitoring in networked systems. Due to system-specific limitations, however, users may fail to appear in some snapshots even though they are online. To bridge this gap, we present a methodology to correct ill-collected snapshots and build more accurate data sets. In summary, we formally model user snapshotting as an instance of the coupon collector’s problem, estimate the probability that some users are missing from a given snapshot following a Bernoulli process, and correct those snapshots when that probability exceeds a given threshold.
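The correction step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each online user is missed by a snapshot independently with a fixed probability `p_miss` (one Bernoulli trial per snapshot), so that a gap of k consecutive absences between two sightings is explained by failed collection with probability `p_miss ** k`. The names `correct_snapshots`, `p_miss`, and `threshold` are hypothetical.

```python
from typing import List, Set


def correct_snapshots(snapshots: List[Set[str]], p_miss: float,
                      threshold: float) -> List[Set[str]]:
    """Fill gaps between two sightings of a user when a run of misses is plausible.

    Assumption (not taken from the paper): an online user is missed by each
    snapshot independently with probability p_miss, so a gap of k snapshots
    is attributed to failed collection with probability p_miss ** k.
    """
    corrected = [set(s) for s in snapshots]  # copy, so the input is left untouched
    users = set().union(*snapshots) if snapshots else set()
    for user in users:
        seen_at = [t for t, snap in enumerate(snapshots) if user in snap]
        # Inspect every pair of consecutive sightings of this user.
        for prev, nxt in zip(seen_at, seen_at[1:]):
            gap = nxt - prev - 1
            if gap > 0 and p_miss ** gap >= threshold:
                for t in range(prev + 1, nxt):
                    corrected[t].add(user)
    return corrected


snaps = [{"a", "b"}, {"a"}, {"a", "b"}, {"a"}, {"a"}, {"a"}, {"a", "b"}]
fixed = correct_snapshots(snaps, p_miss=0.3, threshold=0.05)
print([sorted(s) for s in fixed])
```

In this toy run, user `b` is restored in the single-snapshot gap (probability 0.3) but not in the three-snapshot gap (probability 0.027, below the threshold), which is then treated as a genuine offline period.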

Notes

Link to trace files: https://www.inf.ufrgs.br/~wlccordeiro/swarm_traces/
Link to the GitHub repo with scripts and source code of the software used in this work: https://github.com/ComputerNetworks-UFRGS/TraceCollection/
Parameter min interval, in https://wiki.theory.org/BitTorrentSpecification

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Authors and Affiliations

INF-UFRGS, Porto Alegre, Brazil
Weverton Cordeiro & Luciano Gaspary
PPGES-UNIPAMPA, Alegrete, Brazil
Rafael Beltran & Rodrigo Mansilha
UNIPAMPA, Alegrete, Brazil
Kayuã Paim

Corresponding author

Correspondence to Weverton Cordeiro.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 A Formal Proof of Theorems 1 and 2

Theorem 1

Let c be the budget required for capturing snapshots from G, and let L(v_i, b_k) be the acquaintance list returned by b_k ∈ B upon request from v_i ∈ V^′. Considering that crawler v_i makes only one acquaintance list request per snapshot, we then have \(s_{t} = \bigcup _{i=1}^{c} L(v_{i}, b_{i}) = V_{t} \iff L_{max} \to \infty \), c ≥ |B|, and each bootstrap entity is queried at least once.

Proof

A ⇒ B. Suppose that each user v_i ∈ V is regarded as currently online by a single bootstrap entity b_k ∈ B only. We thus have from Definition 1 that \(\bigcup _{k=1}^{|B|} e(b_{k}) = V_{t}\) and \(\bigcap _{k=1}^{|B|} e(b_{k}) = \emptyset \). From Definition 3, we have \(L(v_{i}, b_{k}) \subseteq e({b_{k}})\). Now consider the restriction that a crawler v_i ∈ V^′ makes only one acquaintance list request per snapshot. In this case, obtaining a full snapshot of the system (s_t = V_t) requires contacting each of the bootstrap entities in the system, and requires that L(v_i, b_k) = e(b_k). The condition for L(v_i, b_k) = e(b_k) to hold regardless of the size of a bootstrap acquaintance list is that \(L_{max} \to \infty \), whereas the condition for \(\bigcup _{i=1}^{c} L(v_{i}, b_{i}) = V_{t}\) is that there is at least one crawler for each bootstrap entity (i.e., c ≥ |B|), and that each bootstrap entity be queried by a crawler at least once.

B ⇒ A. Suppose that \(L_{max} \to \infty \). In this case, L(v_i,b_k) = e(b_k). Suppose also that c = |B| and that each bootstrap entity is queried at least once when building a snapshot set. In this case, we have \(\bigcup _{i=1}^{c} L(v_{i}, b_{i}) = V_{t}\), which also respects the restriction that each crawler makes only one acquaintance list request per snapshot. □
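The two conditions of Theorem 1 can be checked on a toy instance. The sketch below is illustrative only: the helper `snapshot_from_bootstraps`, the entity and user names, and the truncation rule (keep the first `l_max` entries in sorted order) are assumptions, and `l_max=None` stands in for \(L_{max} \to \infty \).

```python
from typing import Dict, Optional, Set


def snapshot_from_bootstraps(e: Dict[str, Set[str]], queried: Set[str],
                             l_max: Optional[int]) -> Set[str]:
    """Union of the acquaintance lists returned by the queried bootstrap entities.

    e maps each bootstrap entity b_k to e(b_k), the users it regards as online.
    l_max=None models an unbounded list; otherwise each reply is truncated to
    l_max entries (first entries in sorted order, a simplifying assumption).
    """
    snapshot: Set[str] = set()
    for b in queried:
        reply = sorted(e[b])
        if l_max is not None:
            reply = reply[:l_max]
        snapshot |= set(reply)
    return snapshot


# Each user is known to exactly one bootstrap entity, as in the proof.
e = {"b1": {"u1", "u2", "u3"}, "b2": {"u4", "u5"}}
v_t = set().union(*e.values())

full = snapshot_from_bootstraps(e, {"b1", "b2"}, l_max=None)   # both conditions hold
capped = snapshot_from_bootstraps(e, {"b1", "b2"}, l_max=2)    # bounded list size
partial = snapshot_from_bootstraps(e, {"b1"}, l_max=None)      # b2 never queried
print(full == v_t, capped == v_t, partial == v_t)  # prints: True False False
```

Only the first configuration recovers V_t: the capped run loses `u3`, and the run that skips `b2` loses `u4` and `u5`.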

Theorem 2

Let c be the budget required for capturing snapshots from G, and let L(v′_i, v_j) be the acquaintance list returned by v_j ∈ V_t upon request from an instrumented client v′_i ∈ V^′. Considering that client v′_i makes one acquaintance list request per snapshot, we then have \(s_{t} = \bigcup _{i=1}^{c} L(v'_{i}, v_{i}) = V_{t} \iff \bigcup _{j=1}^{|V_{t}|} e({v_{j}}) = V_{t}\), \(L_{max} \to \infty \), c ≥ |V_t ∖ V^′|, and each user v_i is queried at least once.

Proof

A ⇒ B. Suppose that each ordinary user v_i ∈ V is regarded as online by at least one user v_j ∈ V (i.e., E_t = V_t × V_t). This situation may lead to several possible configurations in the acquaintance graph; it ranges from one in which the graph forms a single cycle, to another one in which just a single user v_i is aware of all other users v_k in the system, and some other user v_j is aware of v_i as being online. We thus have from Definition 2 that \(\bigcup _{j=1}^{|V_{t}|} e(v_{j}) = V_{t}\) and \(\bigcap _{j=1}^{|V_{t}|} e(v_{j}) = \emptyset \). From Definition 4, we have \(L(v_{i}, v_{j}) \subseteq e({v_{j}})\). Now consider the restriction that a crawler v_i ∈ V^′ makes only one acquaintance list request per snapshot. In this case, obtaining a full snapshot of the system (s_t = V_t) requires contacting each of the users in the system, and requires that L(v′_i, v_j) = e(v_j). The condition for L(v′_i, v_j) = e(v_j) to hold regardless of the size of a user’s acquaintance list is that \(L_{max} \to \infty \). The condition for \(\bigcup _{i=1}^{c} L(v'_{i}, v_{i}) = V_{t}\) is that (i) there is at least one crawler for each ordinary user (c ≥ |V_t ∖ V^′|), (ii) each ordinary user is queried by a crawler at least once, and (iii) every user is regarded as currently online by at least one other user in the system (which results in \(\bigcup _{j=1}^{|V_{t}|} e({v_{j}}) = V_{t}\)).

B ⇒ A. Suppose that \(L_{max} \to \infty \). In this case, L(v′_i, v_i) = e(v_i). Suppose also that c ≥ |V_t ∖ V^′|, that each ordinary user is queried at least once when building a snapshot set, and that every user is regarded as currently online by at least one other user in the system (i.e., \(\bigcup _{j=1}^{|V_{t}|} e({v_{j}}) = V_{t}\)). In this case, by querying all ordinary users within the available budget and taking the union of the peer lists obtained, we obtain a full snapshot of the system (\(\bigcup _{i=1}^{c} L(v'_{i}, v_{i}) = V_{t}\)). Observe that since we have a crawler for each ordinary user, the restriction that each crawler makes only one acquaintance list request per snapshot can be respected. □
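The additional coverage condition of Theorem 2, \(\bigcup _{j=1}^{|V_{t}|} e({v_{j}}) = V_{t}\), can be seen on two small acquaintance graphs. This is a hedged sketch rather than the authors' code: the helper names `snapshot_from_peers` and `is_covered` and the user labels are hypothetical, and it assumes unbounded lists with every ordinary user queried exactly once.

```python
from typing import Dict, Set


def snapshot_from_peers(e: Dict[str, Set[str]]) -> Set[str]:
    """Union of the peer lists e(v_j) when every user is queried once, uncapped."""
    return set().union(*e.values()) if e else set()


def is_covered(e: Dict[str, Set[str]]) -> bool:
    """True if every user appears in at least one peer list."""
    return snapshot_from_peers(e) == set(e)


# Single cycle: each user is known by exactly one other user.
cycle = {"u1": {"u2"}, "u2": {"u3"}, "u3": {"u1"}}
# u3 knows u1, but nobody lists u3, so the union of peer lists misses it.
orphan = {"u1": {"u2"}, "u2": {"u1"}, "u3": {"u1"}}

print(is_covered(cycle), is_covered(orphan))  # prints: True False
```

The second graph shows why querying every user with unbounded lists is not sufficient on its own: a user that no one regards as online never appears in any returned list.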

About this article

Cite this article

Cordeiro, W., Gaspary, L., Beltran, R. et al. Revisiting the coupon collector’s problem to unveil users’ online sessions in networked systems. Peer-to-Peer Netw. Appl. 14, 687–707 (2021). https://doi.org/10.1007/s12083-020-01012-2

Received: 23 April 2020
Accepted: 07 October 2020
Published: 13 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s12083-020-01012-2
