The network-untangling problem: from interactions to activity timelines

Rozenshtein, Polina; Tatti, Nikolaj; Gionis, Aristides

doi:10.1007/s10618-020-00717-5

The network-untangling problem: from interactions to activity timelines

Published: 03 October 2020

Volume 35, pages 213–247, (2021)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

436 Accesses
5 Citations
Explore all metrics

Abstract

In this paper we study a problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge $(u,v,t)\in E$ denotes an interaction between entities u and v at time t. We assume an activity model where each entity is active during at most k time intervals. An interaction (u, v, t) can be explained if at least one of u or v are active at time t. Our goal is to reconstruct the activity intervals for all entities in the network, so as to explain the observed interactions. This problem, the network-untangling problem, can be applied to discover event timelines from complex entity interactions. We provide two formulations of the network-untangling problem: (i) minimizing the total interval length over all entities (sum version), and (ii) minimizing the maximum interval length (max version). We study separately the two problems for $k=1$ and $k>1$ activity intervals per entity. For the case $k=1$, we show that the sum problem is NP-hard, while the max problem can be solved optimally in linear time. For the sum problem we provide efficient algorithms motivated by realistic assumptions. For the case of $k>1$, we show that both formulations are inapproximable. However, we propose efficient algorithms based on alternative optimization. We complement our study with an evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Crossings States and Sets of States in Random Walks

Article 15 February 2023

Vyacheslav M. Abramov

Graph based anomaly detection and description: a survey

Article 05 July 2014

Leman Akoglu, Hanghang Tong & Danai Koutra

Information diffusion modeling and analysis for socially interacting networks

Article 09 January 2021

Pawan Kumar & Adwitiya Sinha

Availability of data and materials

The scripts used to generate the Synthetic and $\textit{Synthetic} _k$ datasets are available at https://github.com/polinapolina/the-network-untangling-problem. The Twitter dataset used for the case study is not publicly available.

Notes

Due to the property of implication graph, either all or none variables will be set in the component.
The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.

References

Aspvall B, Plass MF, Tarjan RE (1979) A linear-time algorithm for testing the truth of certain quantified Boolean formulas. Inf Process Lett 8(3):121–123
Article MathSciNet Google Scholar
Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on twitter, vol 11. Association for the Advancement of Artificial Intelligence, Menlo Park
Google Scholar
Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4(6):284
Article Google Scholar
Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 115–130
Bogdanov P, Mongiovì M, Singh AK (2011) Mining heavy subgraphs in time-evolving networks. In: 2011 IEEE 11th international conference on data mining. IEEE, pp 81–90
Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: Proceedings of the tenth international workshop on multimedia data mining, pp 1–10
Chvatal V (1979) A greedy heuristic for the set-covering problem. Math Oper Res 4(3):233–235
Article MathSciNet Google Scholar
Dakiche N, Tayeb FBS, Slimani Y, Benatchba K (2019) Tracking community evolution in social networks: a survey. Inf Process Manag 56(3):1084–1102
Article Google Scholar
Fung GPC, Yu JX, Yu PS, Lu H (2005) Parameter free bursty events detection in text streams. In: Proceedings of the 31st international conference on very large data bases, pp 181–192
Gauvin L, Panisson A, Cattuto C (2014) Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach. PLoS ONE 9(1):e86028
Article Google Scholar
Guha S, Koudas N, Shim K (2006) Approximation and streaming algorithms for histogram construction problems. ACM Trans Database Syst (TODS) 31(1):396–438
Article Google Scholar
Hartmanis J (1982) Computers and intractability: a guide to the theory of np-completeness (Michael R. Garey and David S. Johnson). SIAM Rev 24(1):90
Article Google Scholar
Hartmann T, Kappes A, Wagner D (2016) Clustering evolving networks. In: Kliemann L (ed) Algorithm engineering. Springer, Berlin, pp 280–329
Chapter Google Scholar
He J, Chen D (2015) A fast algorithm for community detection in temporal network. Physica A Stat Mech Appl 429:87–94
Article Google Scholar
He D, Parker DS (2010) Topic dynamics: an alternative model of bursts in streams of topics. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452
Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519(3):97–125
Article Google Scholar
Hopcroft JE, Ullman JD (1983) Data structures and algorithms, vol 175. Addison-Wesley, Boston
MATH Google Scholar
Hulovatyy Y, Chen H, Milenković T (2015) Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31(12):i171–i180
Article Google Scholar
Ihler A, Hutchins J, Smyth P (2006) Adaptive event detection with time-varying Poisson processes. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 207–216
Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD (eds) Complexity of computer computations. Springer, Berlin, pp 85–103
Chapter Google Scholar
Khot S, Regev O (2008) Vertex cover might be hard to approximate to within 2-$\varepsilon $. J Comput Syst Sci 74(3):335–349
Article MathSciNet Google Scholar
Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4):373–397
Article MathSciNet Google Scholar
Kovanen L, Karsai M, Kaski K, Kertész J, Saramäki J (2013) Temporal motifs. In: Holme P, Saramaki J (eds) Temporal networks. Springer, Berlin, pp 119–133
Chapter Google Scholar
Lahiri M, Berger-Wolf TY (2008) Mining periodic behavior in dynamic social networks. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 373–382
Lappas T, Arai B, Platakis M, Kotsakos D, Gunopulos D (2009) On burstiness-aware search for document sequences. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 477–486
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506
Liu Y, Safavi T, Dighe A, Koutra D (2018) Graph summarization methods and applications: a survey. ACM Comput Surv (CSUR) 51(3):1–34
Article Google Scholar
Masuda N, Holme P (2019) Detecting sequences of system states in temporal networks. Sci Rep 9(1):1–11
Article Google Scholar
Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 1155–1158
McGregor A (2014) Graph stream algorithms: a survey. ACM SIGMOD Rec 43(1):9–20
Article Google Scholar
Meladianos P, Nikolentzos G, Rousseau F, Stavrakas Y, Vazirgiannis M (2015) Degeneracy-based real-time sub-event detection in twitter stream. In: The international AAAI conference on web and social media (ICWSM), vol 15, pp 248–257
Michail O (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Math 12(4):239–280
Article MathSciNet Google Scholar
Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Article MathSciNet Google Scholar
Paranjape A, Benson AR, Leskovec J (2017a) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610
Paranjape A, Benson AR, Leskovec J (2017b) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610
Pietilänen AK, Diot C (2012) Dissemination in opportunistic social networks: the role of temporal communities. In: Proceedings of the thirteenth ACM international symposium on mobile ad hoc networking and computing, pp 165–174
Rossetti G, Cazabet R (2018) Community discovery in dynamic networks: a survey. ACM Comput Surv (CSUR) 51(2):1–37
Article Google Scholar
Rozenshtein P, Gionis A (2019) Mining temporal networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3225–3226
Rozenshtein P, Tatti N, Gionis A (2017) The network-untangling problem: from interactions to activity timelines. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 701–716
Shahaf D, Guestrin C, Horvitz E (2012a) Metro maps of science. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1122–1130
Shahaf D, Guestrin C, Horvitz E (2012b) Trains of thought: generating information maps. In: Proceedings of the 21st international conference on world wide web, pp 899–908
Shahaf D, Yang J, Suen C, Jacobs J, Wang H, Leskovec J (2013) Information cartography: creating zoomable, large-scale maps of information. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1097–1105
Shah N, Koutra D, Zou T, Gallagher B, Faloutsos C (2015) Timecrunch: interpretable dynamic graph summarization. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1055–1064
Tatti N (2019) Strongly polynomial efficient approximation scheme for segmentation. Inf Process Lett 142:1–8
Article MathSciNet Google Scholar
Vlachos M, Meek C, Vagena Z, Gunopulos D (2004) Identifying similarities, periodicities and bursts for online search queries. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data, pp 131–142
Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM (2010) Frequent subgraph discovery in dynamic networks. In: Proceedings of the eighth workshop on mining and learning with graphs, pp 155–162
Weng J, Lee BS (2011) Event detection in twitter. In: The international AAAI conference on web and social media (ICWSM)
Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: Proceedings of the fourth ACM international conference on web search and data mining, pp 177–186
Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 336–345

Download references

Funding

This work was supported by three Academy of Finland Projects (286211, 313927, 317085), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, AutonomousSystems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Institute of Data Science, National University of Singapore, Singapore, Singapore
Polina Rozenshtein
Helsinki Institute for Information Technology, Aalto University, Espoo, Finland
Polina Rozenshtein & Aristides Gionis
Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
Nikolaj Tatti
F-Secure Corp, Helsinki, Finland
Nikolaj Tatti
KTH Royal Institute of Technology, Stockholm, Sweden
Aristides Gionis

Authors

Polina Rozenshtein
View author publications
You can also search for this author in PubMed Google Scholar
Nikolaj Tatti
View author publications
You can also search for this author in PubMed Google Scholar
Aristides Gionis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Polina Rozenshtein.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicting/competing interests.

Code availability

The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.

Additional information

Responsible editor: Tina Eliassi-Rad, Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proofs regarding computational complexity

Proof of Proposition 2

To prove the result we provide a reduction from VertexCover. Assume that we are given a graph H with n vertices and an integer k. We construct a temporal network G as follows: We place H at time stamp 0. We then add k fully-connected graphs $C_1, \ldots , C_k$ with n vertices at time stamps $1, \ldots , k$. Figure 14 shows an example.

We claim that G has k-interval cover with zero cost if and only if H can be covered with k vertices. This shows that solving whether there is a zero-cost solution for either $k$-MinTimeline $_{+}$ or $k$-MinTimeline $_\infty $ is NP-complete, which also automatically implies that these problems are inapproximable.

To prove the claim, first assume that H can be covered with k vertices, say $w_1, \ldots , w_k$. Construct a k-activity timeline by first covering $w_1, \ldots , w_k$ at time stamp 0 with zero-length intervals. Similarly, cover every vertex v at every time stamp $t = 1, \ldots , k$ with a zero-length intervals, except if $v = w_t$. Figure 14 shows an example. By definition H is covered, and $n - 1$ vertices in each $C_i$ are also covered. Consequently, the timeline covers G. Since each vertex uses exactly k intervals, we have proven the first direction.

To prove the other direction, assume that there is a zero-solution for G. Since each $C_i$ is fully-connected, we must have at least $n - 1$ vertices covered in each $C_i$. This means that we have at most k spare intervals to cover H. The vertices that these intervals cover form a k-vertex cover, proving the claim. $\square $

Proof of Proposition 3

We will prove the hardness by reducing VertexCover to MinTimeline $_{+}$. Assume that we are given a (static) network $H = (W, A)$ with n vertices $W = \{ w_1, \ldots , w_n \}$ and a budget $\ell $. In the VertexCover problem we are asked to decide whether there exists a subset $U \subseteq W$ of at most $\ell $ vertices ($|U|\le \ell $) covering all edges in A.

We map an instance of VertexCover to an instance of MinTimeline $_{+}$ by creating a temporal network $G = (V, E)$, as follows. The vertices V consist of 2n vertices: for each $w_i \in W$, we add vertices $v_i$ and $u_i$. The edges are as follows: For each edge $(w_i, w_j) \in A$, we add a temporal edge $(v_i, v_j, 0)$ to E. For each vertex $w_i \in W$, we add two temporal edges $(v_i, u_i, 1)$ and $(v_i, u_i, n + 2)$ to E. Figure 15 shows an example.

Let $\mathcal {T}^*$ be an optimal timeline covering G. We claim that $ S \mathopen {}\left( \mathcal {T}^*\right) \le \ell $ if and only if there is a vertex cover of H with $\ell $ vertices.

To prove the if direction, consider a vertex cover of H, say U, with $\ell $ vertices. Consider the following coverage: cover each $u_i$ at $n + 2$, and each $v_i$ at 1. For each $w_i \in U$, cover $v_i$ at 0. Figure 15 shows an example. Now every vertex $u_i$ has a 0-length activity interval $[n+2, n+2]$. Vertices $v_i$, which correspond to $w_i \not \in U$, also have 0-length activity intervals [1, 1]. Only vertices $v_i$, which correspond to $w_i \in U$, have 1-length activity intervals [0, 1]. All vertices in V have activity intervals of total length $\ell $. Every interaction in G belongs to some activity interval: by construction, interactions at $t=n+2$ are spanned by activity intervals of $\{u_i\}$, interactions at $t=1$ are spanned by activity intervals of $\{v_i\}$, and interactions at $t=0$ are spanned by activity intervals $\ell $ of vertices from $\{v_i\}$. The resulting intervals are indeed forming a timeline with a total span of $\ell $.

To prove the other direction, first note that if we cover each $v_i$ by an interval [0, 1] and each $u_i$ by an interval $[n + 2, n + 2]$, then this yields a timeline $\mathcal {T}$ covering G. The total span of intervals in $\mathcal {T}$ is n. Thus, $ S \mathopen {}\left( \mathcal {T}^*\right) \le S \mathopen {}\left( \mathcal {T}\right) = n$. This guarantees that if $0 \in I_{v_i}$, then $n + 2 \notin I_{v_i}$, so $n + 2 \in I_{u_i}$. Otherwise $\mathcal {T}^*$ contains an $(n+2)$-length interval, this contradicts to $ S \mathopen {}\left( \mathcal {T}^*\right) \le n$. By the same argument, $1 \notin I_{u_i}$ and so $1 \in I_{v_i}$. In summary, if $0 \in I_{v_i}$, then $ \sigma \mathopen {}\left( I_{v_i}\right) = 1$. This implies that if $ S \mathopen {}\left( \mathcal {T}^*\right) \le \ell $, then we have at most $\ell $ active vertices at 0. Let U be the set of those vertices. Since $\mathcal {T}^*$ is a timeline covering G, then U is a vertex cover for H. $\square $

Proofs regarding Maximal

Proof of Proposition 4

We first show that a maximal dual solution is a feasible timeline. Let $e_i = (u, v, t)$ be a temporal edge. If $ p \mathopen {}\left( i; v\right) \cap X_v = \emptyset $ and $ p \mathopen {}\left( i; u\right) \cap X_u = \emptyset $, then we can increase the value of $\alpha _i$ without violating the dual constraints, so the solution is not maximal. Thus $t \in I_{v} \cup I_{u}$, making $\mathcal {T}$ a feasible timeline.

Next we show that the resulting solution $\mathcal {T}$ is a 2-approximation to Coalesce. Write $x_v = \min (X_v)$ and $y_v = \max (X_v)$. Let $\mathcal {T}^*$ be the optimal solution. Then

$$\begin{aligned} \begin{aligned} S \mathopen {}\left( \mathcal {T}\right)&= \sum _{v \in V} | t \mathopen {}\left( x_v\right) - m_v| + | t \mathopen {}\left( y_v\right) - m_v| \\&= \sum _{v \in V} h \mathopen {}\left( v; x_v\right) + h \mathopen {}\left( v; y_v\right) \\&\le \sum _{v \in V} \sum _{e \in E \mathopen {}\left( v\right) } \alpha _e = 2\sum _{e \in E} \alpha _e \le 2 S \mathopen {}\left( \mathcal {T}^*\right) , \\ \end{aligned} \end{aligned}$$

where the second equality follows from the definition of $X_v$, the first inequality follows from the fact that $\alpha _e \ge 0$, and the last inequality follows from primal-dual theory.$\square $

Proof of Lemma 1

We will prove this result by showing that $\alpha _j \le z(v)$ if and only if all dual constraints related to v are valid. Since the same holds also for u the lemma follows. We consider two cases.

First case: $ t \mathopen {}\left( j\right) < m_v$. In this case we have

$$\begin{aligned} z(v) = \min \left\{ m_w - t, b[w]\right\} = \min _{i \le j} \left\{ t \mathopen {}\left( i\right) - m_v - h \mathopen {}\left( v; i\right) \right\} , \end{aligned}$$

before increasing $\alpha _j$. This guarantees that if $\alpha _j \le z(v)$, then $ h \mathopen {}\left( v; i\right) \le | t \mathopen {}\left( i\right) - m_v|$, for every $i \le j$. Moreover, when $\alpha _j = z(v)$ one of these constraints becomes tight. Since these are the only constraints containing $\alpha _j$, we have proven the first case.

Second case: $ t \mathopen {}\left( j\right) \ge m_v$. If $i < j$, the sum $ h \mathopen {}\left( v; i\right) $ does not contain $\alpha _j$, so the corresponding constraint remains valid. If $i \ge j$, then the corresponding constraint is valid if and only if $ h \mathopen {}\left( v; j\right) \le | t \mathopen {}\left( j\right) - m_v|$. This is because $\alpha _\ell = 0$ for all $\ell > j$. But $z(v) = t - m_v - a[v]$ corresponds exactly to the amount we can increase $\alpha _j$ so that $ h \mathopen {}\left( v; j\right) = |t - m_v|$.$\square $

Proofs regarding k-Maximal

Proof of Proposition 5

Let us first prove the feasibility of $\mathcal {T}$, that is, show that every edge is covered. Let $e_i = (u, v, t)$ be an edge, and let $\ell $ be an index such that $i \in S_{v\ell }$.

At least, one of the four cases of left-maximality must hold. Assume Case (i), that is, $ rh \mathopen {}\left( v; j\right) = \eta _{vj}$ for some $j \in lp \mathopen {}\left( i; v\right) $. Then $ t \mathopen {}\left( j\right) \in X_{v\ell }$ and $x_{v\ell } \le t \mathopen {}\left( i\right) \le m_{v\ell }$, so $e_i$ is covered. Case (ii) is similar.

Assume that Cases (i) and (ii) do not hold. Then Case (iii) or Case (iv) holds. Assume Case (iii). If $x_{v\ell } \le t \mathopen {}\left( i\right) $, then $e_i$ is covered. Assume that $ t \mathopen {}\left( i\right) < x_{v\ell }$. Then, by definition, $ t \mathopen {}\left( i\right) \in Y_{v\ell }$. Thus $m_{v(\ell - 1)} < t \mathopen {}\left( i\right) \le y_{v\ell }$, so $e_i$ is covered. Case (iv) is similar.

We have shown that $\mathcal {T}$ is feasible. Next we show that the resulting solution $\mathcal {T}$ is a 2-approximation to k-Coalesce. Let $\mathcal {T}^*$ be the optimal solution. Then

$$\begin{aligned} \begin{aligned} S \mathopen {}\left( \mathcal {T}\right)&= \sum _{\ell = 1}^k \sum _{v \in V} | t \mathopen {}\left( x_{v\ell }\right) - m_{v\ell }| + | t \mathopen {}\left( y_{v(\ell + 1)}\right) - m_{v\ell }| \\&= \sum _{\ell = 1}^k \sum _{v \in V} rh \mathopen {}\left( v; x_{v\ell }\right) + lh \mathopen {}\left( v; y_{v\ell }\right) \\&\le \sum _{v \in V} \sum _{e \in E \mathopen {}\left( v\right) } \alpha _e = 2\sum _{e \in E} \alpha _e \le 2 S \mathopen {}\left( \mathcal {T}^*\right) , \\ \end{aligned} \end{aligned}$$

where the second equality follows from the definition of $X_{vi}$ and $Y_{vi}$, the first inequality follows from the fact that $\alpha _e \ge 0$ and the intervals in $\mathcal {T}$ do not intersect, and the last inequality follows from primal-dual theory.

$\square $

We will prove Proposition 6 in a sequence of lemmas.

Let us write $a_i[v, \ell ]$ and $b_i[v, \ell ]$ to be the values of these counters at the beginning of the ith iteration. We maintain the following invariants. The first counter, $a_{i + 1}[v, \ell ]$ matches $ lh \mathopen {}\left( v; i\right) $. The second counter, $b_i[v, \ell ]$ is how much we can afford to increase $\alpha _i$ without violating $ rh \mathopen {}\left( v; s\right) \le \eta _{vs}$, where $s \le i$. This is formalized in the next lemma.

Lemma 2

Let v be a vertex and $\ell = 1, \ldots , k + 1$ be an integer. Shorten $S = S_{v\ell }$ and write $f(j, i) = \sum _{o \in S, j \le o \le i} \alpha _o$.

Then for any $i \in S$,

$$\begin{aligned} a_{i + 1}[v, \ell ] = lh \mathopen {}\left( v; i\right) \end{aligned}$$

(2)

and

$$\begin{aligned} b_{i + 1}[v, \ell ] = \min _{j \in S, j \le i} \left( \eta _{vj} - f(j, i) \right) . \end{aligned}$$

(3)

Proof

We prove this claim by induction. Let i be the first index in S. Then $a_i[v, \ell ] = 0$ and $a_{i + 1}[v, \ell ] = \alpha _i$, and $b_i[v, \ell ] = \infty $ and $b_{i + 1}[v, \ell ] = \eta _{vi} - \alpha _{vi}$.

Assume that i is not the first index in S, and let $j \in S$ be the previous index. Since $a_i[v, \ell ] = a_{j + 1}[v, \ell ]$, we have

$$\begin{aligned} a_{i + 1}[v, \ell ] = a_{i}[v, \ell ] + \alpha _i = lh \mathopen {}\left( v; j\right) + \alpha _i = lh \mathopen {}\left( v; i\right) . \end{aligned}$$

Also, $b_i[v, \ell ] = b_{j + 1}[v, \ell ]$, and

$$\begin{aligned} \begin{aligned} b_{i + 1}[v, \ell ]&= \min ( b_i[v, \ell ] - \alpha _i, \eta _{vi} - \alpha _i ) \\&= \min \left( \min _{s \in S, s \le j} \left( \eta _{vs} - f(s, i)\right) , \eta _{vi} - \alpha _i \right) \\&= \min _{s \in S, s \le i} \left( \eta _{vs} - f(s, i)\right) , \end{aligned} \end{aligned}$$

proving the lemma.$\square $

Our next step is to prove the feasibility of the output of k-Maximal. In order to do that, we first bound the counters.

Lemma 3

For each vertex v, index $\ell = 1, \ldots , k$, and $i \in S_{v\ell }$,

$$\begin{aligned} a_{i + 1}[v, \ell ] \le \theta _{vi} \end{aligned}$$

(4)

and

$$\begin{aligned} b_{i + 1}[v, \ell ] \ge 0. \end{aligned}$$

(5)

Proof

Since, $\alpha _i \le \theta _{vi} - a_i[v, \ell ]$ we have

$$\begin{aligned} a_{i + 1}[v, \ell ] = a_i[v, \ell ] + \alpha _i \le \theta _{vi}. \end{aligned}$$

Also since, $\alpha _i \le \min (b_i[v, \ell ], \eta _{vi})$ we have

$$\begin{aligned} b_{i + 1}[v, \ell ] = \min (b_{i}[v, \ell ], \eta _{vi}) - \alpha _i \ge 0. \end{aligned}$$

This proves the claim.$\square $

To prove the feasibility, we first show that $\alpha _i \ge 0$.

Lemma 4

$\alpha _i \ge 0$, for all i.

Proof

Let $(u, v, t) = e_i$, and let $\ell $ such that $i \in S_{v\ell }$. Let $z_1$, and $z_2$ be as defined by the algorithm in the ith round.

If i is the first index in $S_{v\ell }$, then $a_i[v, \ell ] = 0$ and $b_i[v, \ell ] = \infty $. Then $z_1 \ge 0$, and similarly $z_2 \ge 0$. Consequently, $\alpha _i \ge 0$.

Assume that i is not the first index in $S_{v\ell }$, and let j be the previous edge index. Then Eq. 4 implies

$$\begin{aligned} a_{i}[v, \ell ] = a_{j + 1}[v, \ell ] \le \theta _{vj} \le \theta _{vi}. \end{aligned}$$

In addition, Eq. 5 implies $b_i[v, \ell ] \ge 0$, so $z_1 \ge 0$. Similarly, $z_2 \ge 0$. Consequently, $\alpha _i \ge 0$.$\square $

Next lemma shows that $\left\{ \alpha _i\right\} $ satisfies the constraints, making the dual solution feasible.

Lemma 5

Let $v \in V$, $\ell = 1, \ldots , k + 1$, and $i \in S_{v\ell }$. Then $ rh \mathopen {}\left( v; i\right) \le \eta _{vi}$ and $ lh \mathopen {}\left( v; i\right) \le \theta _{vi}$.

Proof

Equations 2 and 4 give us

$$\begin{aligned} lh \mathopen {}\left( v; i\right) = a_{i + 1}[v, \ell ] \le \theta _{vi}. \end{aligned}$$

Moreover, $ lh \mathopen {}\left( v; i\right) $ remains constant in the later rounds.

Let j be the last index in $S_{v\ell }$. Then Eqs. 3 and 5 state that

$$\begin{aligned} 0 \le b_{j + 1}[v, \ell ] \le \eta _{vi} - rh \mathopen {}\left( v; i\right) . \end{aligned}$$

Moreover, the sum $ rh \mathopen {}\left( v; i\right) $ remains constant in the later rounds. Thus, $\left\{ \alpha _i\right\} $ is a feasible dual solution.$\square $

Proof

Let $e_i = (u, v, t)$ be an edge, and let $\ell $ and r be the indices such that $i \in S_{v\ell }$ and $i \in S_{ur}$. We need to show that $e_i$ is left-maximal.

If $b_{i + 1}[v, \ell ] = 0$, then Eq. 3 states that there is $j \in X_{v\ell }$ with $j \le i$ such that $\eta _{vj} = rh \mathopen {}\left( v; j\right) $. This is the definition of Case (i) of left-maximality. Similarly, $b_{i + 1}[u, r] = 0$ leads to Case (ii).

Assume $b_{i + 1}[v, \ell ] > 0$ and $b_{i + 1}[u, r] > 0$. This can only happen if

$$\begin{aligned} \alpha _e< \min (b_i[v, \ell ], \eta _{vi}) \quad \text {and}\quad \alpha _e < \min (b_i[u, r], \eta _{ui}). \end{aligned}$$

Consequently, $\alpha _i = \theta _{vi} - a_i[v, \ell ]$ or $\alpha _i = \theta _{ui} - a_i[u, r]$. If the former, then Eq. 2 states $ lh \mathopen {}\left( v; i\right) = a_{i + 1}[v, \ell ] = \theta _{vi}$, leading to Case (iii). Similarly, the latter case leads to Case (iv). Thus, $e_i$ is left-maximal.$\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rozenshtein, P., Tatti, N. & Gionis, A. The network-untangling problem: from interactions to activity timelines. Data Min Knowl Disc 35, 213–247 (2021). https://doi.org/10.1007/s10618-020-00717-5

Download citation

Received: 26 October 2018
Accepted: 16 September 2020
Published: 03 October 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s10618-020-00717-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

The network-untangling problem: from interactions to activity timelines

Abstract

Access this article

Similar content being viewed by others

Crossings States and Sets of States in Random Walks

Graph based anomaly detection and description: a survey

Information diffusion modeling and analysis for socially interacting networks

Availability of data and materials

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Code availability

Additional information

Publisher's Note

Appendices

Proofs regarding computational complexity

Proof of Proposition 2

Proof of Proposition 3

Proofs regarding Maximal

Proof of Proposition 4

Proof of Lemma 1

Proofs regarding k-Maximal

Proof of Proposition 5

Lemma 2

Proof

Lemma 3

Proof

Lemma 4

Proof

Lemma 5

Proof

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation