Skip to main content
Log in

The network-untangling problem: from interactions to activity timelines

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

In this paper we study a problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge \((u,v,t)\in E\) denotes an interaction between entities u and v at time t. We assume an activity model where each entity is active during at most k time intervals. An interaction (uvt) can be explained if at least one of u or v are active at time t. Our goal is to reconstruct the activity intervals for all entities in the network, so as to explain the observed interactions. This problem, the network-untangling problem, can be applied to discover event timelines from complex entity interactions. We provide two formulations of the network-untangling problem: (i) minimizing the total interval length over all entities (sum version), and (ii) minimizing the maximum interval length (max version). We study separately the two problems for \(k=1\) and \(k>1\) activity intervals per entity. For the case \(k=1\), we show that the sum problem is NP-hard, while the max problem can be solved optimally in linear time. For the sum problem we provide efficient algorithms motivated by realistic assumptions. For the case of \(k>1\), we show that both formulations are inapproximable. However, we propose efficient algorithms based on alternative optimization. We complement our study with an evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Availability of data and materials

The scripts used to generate the Synthetic and \(\textit{Synthetic} _k\) datasets are available at https://github.com/polinapolina/the-network-untangling-problem. The Twitter dataset used for the case study is not publicly available.

Notes

  1. Due to the property of implication graph, either all or none variables will be set in the component.

  2. The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.

References

  • Aspvall B, Plass MF, Tarjan RE (1979) A linear-time algorithm for testing the truth of certain quantified Boolean formulas. Inf Process Lett 8(3):121–123

    Article  MathSciNet  Google Scholar 

  • Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on twitter, vol 11. Association for the Advancement of Artificial Intelligence, Menlo Park

    Google Scholar 

  • Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4(6):284

    Article  Google Scholar 

  • Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 115–130

  • Bogdanov P, Mongiovì M, Singh AK (2011) Mining heavy subgraphs in time-evolving networks. In: 2011 IEEE 11th international conference on data mining. IEEE, pp 81–90

  • Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: Proceedings of the tenth international workshop on multimedia data mining, pp 1–10

  • Chvatal V (1979) A greedy heuristic for the set-covering problem. Math Oper Res 4(3):233–235

    Article  MathSciNet  Google Scholar 

  • Dakiche N, Tayeb FBS, Slimani Y, Benatchba K (2019) Tracking community evolution in social networks: a survey. Inf Process Manag 56(3):1084–1102

    Article  Google Scholar 

  • Fung GPC, Yu JX, Yu PS, Lu H (2005) Parameter free bursty events detection in text streams. In: Proceedings of the 31st international conference on very large data bases, pp 181–192

  • Gauvin L, Panisson A, Cattuto C (2014) Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach. PLoS ONE 9(1):e86028

    Article  Google Scholar 

  • Guha S, Koudas N, Shim K (2006) Approximation and streaming algorithms for histogram construction problems. ACM Trans Database Syst (TODS) 31(1):396–438

    Article  Google Scholar 

  • Hartmanis J (1982) Computers and intractability: a guide to the theory of np-completeness (Michael R. Garey and David S. Johnson). SIAM Rev 24(1):90

    Article  Google Scholar 

  • Hartmann T, Kappes A, Wagner D (2016) Clustering evolving networks. In: Kliemann L (ed) Algorithm engineering. Springer, Berlin, pp 280–329

    Chapter  Google Scholar 

  • He J, Chen D (2015) A fast algorithm for community detection in temporal network. Physica A Stat Mech Appl 429:87–94

    Article  Google Scholar 

  • He D, Parker DS (2010) Topic dynamics: an alternative model of bursts in streams of topics. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452

  • Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519(3):97–125

    Article  Google Scholar 

  • Hopcroft JE, Ullman JD (1983) Data structures and algorithms, vol 175. Addison-Wesley, Boston

    MATH  Google Scholar 

  • Hulovatyy Y, Chen H, Milenković T (2015) Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31(12):i171–i180

    Article  Google Scholar 

  • Ihler A, Hutchins J, Smyth P (2006) Adaptive event detection with time-varying Poisson processes. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 207–216

  • Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD (eds) Complexity of computer computations. Springer, Berlin, pp 85–103

    Chapter  Google Scholar 

  • Khot S, Regev O (2008) Vertex cover might be hard to approximate to within 2-\(\varepsilon \). J Comput Syst Sci 74(3):335–349

    Article  MathSciNet  Google Scholar 

  • Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4):373–397

    Article  MathSciNet  Google Scholar 

  • Kovanen L, Karsai M, Kaski K, Kertész J, Saramäki J (2013) Temporal motifs. In: Holme P, Saramaki J (eds) Temporal networks. Springer, Berlin, pp 119–133

    Chapter  Google Scholar 

  • Lahiri M, Berger-Wolf TY (2008) Mining periodic behavior in dynamic social networks. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 373–382

  • Lappas T, Arai B, Platakis M, Kotsakos D, Gunopulos D (2009) On burstiness-aware search for document sequences. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 477–486

  • Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506

  • Liu Y, Safavi T, Dighe A, Koutra D (2018) Graph summarization methods and applications: a survey. ACM Comput Surv (CSUR) 51(3):1–34

    Article  Google Scholar 

  • Masuda N, Holme P (2019) Detecting sequences of system states in temporal networks. Sci Rep 9(1):1–11

    Article  Google Scholar 

  • Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 1155–1158

  • McGregor A (2014) Graph stream algorithms: a survey. ACM SIGMOD Rec 43(1):9–20

    Article  Google Scholar 

  • Meladianos P, Nikolentzos G, Rousseau F, Stavrakas Y, Vazirgiannis M (2015) Degeneracy-based real-time sub-event detection in twitter stream. In: The international AAAI conference on web and social media (ICWSM), vol 15, pp 248–257

  • Michail O (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Math 12(4):239–280

    Article  MathSciNet  Google Scholar 

  • Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256

    Article  MathSciNet  Google Scholar 

  • Paranjape A, Benson AR, Leskovec J (2017a) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610

  • Paranjape A, Benson AR, Leskovec J (2017b) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610

  • Pietilänen AK, Diot C (2012) Dissemination in opportunistic social networks: the role of temporal communities. In: Proceedings of the thirteenth ACM international symposium on mobile ad hoc networking and computing, pp 165–174

  • Rossetti G, Cazabet R (2018) Community discovery in dynamic networks: a survey. ACM Comput Surv (CSUR) 51(2):1–37

    Article  Google Scholar 

  • Rozenshtein P, Gionis A (2019) Mining temporal networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3225–3226

  • Rozenshtein P, Tatti N, Gionis A (2017) The network-untangling problem: from interactions to activity timelines. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 701–716

  • Shahaf D, Guestrin C, Horvitz E (2012a) Metro maps of science. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1122–1130

  • Shahaf D, Guestrin C, Horvitz E (2012b) Trains of thought: generating information maps. In: Proceedings of the 21st international conference on world wide web, pp 899–908

  • Shahaf D, Yang J, Suen C, Jacobs J, Wang H, Leskovec J (2013) Information cartography: creating zoomable, large-scale maps of information. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1097–1105

  • Shah N, Koutra D, Zou T, Gallagher B, Faloutsos C (2015) Timecrunch: interpretable dynamic graph summarization. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1055–1064

  • Tatti N (2019) Strongly polynomial efficient approximation scheme for segmentation. Inf Process Lett 142:1–8

    Article  MathSciNet  Google Scholar 

  • Vlachos M, Meek C, Vagena Z, Gunopulos D (2004) Identifying similarities, periodicities and bursts for online search queries. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data, pp 131–142

  • Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM (2010) Frequent subgraph discovery in dynamic networks. In: Proceedings of the eighth workshop on mining and learning with graphs, pp 155–162

  • Weng J, Lee BS (2011) Event detection in twitter. In: The international AAAI conference on web and social media (ICWSM)

  • Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: Proceedings of the fourth ACM international conference on web search and data mining, pp 177–186

  • Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 336–345

Download references

Funding

This work was supported by three Academy of Finland Projects (286211, 313927, 317085), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, AutonomousSystems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Polina Rozenshtein.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicting/competing interests.

Code availability

The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.

Additional information

Responsible editor: Tina Eliassi-Rad, Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proofs regarding computational complexity

Proof of Proposition 2

To prove the result we provide a reduction from VertexCover. Assume that we are given a graph H with n vertices and an integer k. We construct a temporal network G as follows: We place H at time stamp 0. We then add k fully-connected graphs \(C_1, \ldots , C_k\) with n vertices at time stamps \(1, \ldots , k\). Figure 14 shows an example.

We claim that G has k-interval cover with zero cost if and only if H can be covered with k vertices. This shows that solving whether there is a zero-cost solution for either \(k\)-MinTimeline \(_{+}\) or \(k\)-MinTimeline \(_\infty \) is NP-complete, which also automatically implies that these problems are inapproximable.

To prove the claim, first assume that H can be covered with k vertices, say \(w_1, \ldots , w_k\). Construct a k-activity timeline by first covering \(w_1, \ldots , w_k\) at time stamp 0 with zero-length intervals. Similarly, cover every vertex v at every time stamp \(t = 1, \ldots , k\) with a zero-length intervals, except if \(v = w_t\). Figure 14 shows an example. By definition H is covered, and \(n - 1\) vertices in each \(C_i\) are also covered. Consequently, the timeline covers G. Since each vertex uses exactly k intervals, we have proven the first direction.

To prove the other direction, assume that there is a zero-solution for G. Since each \(C_i\) is fully-connected, we must have at least \(n - 1\) vertices covered in each \(C_i\). This means that we have at most k spare intervals to cover H. The vertices that these intervals cover form a k-vertex cover, proving the claim. \(\square \)

Fig. 14
figure 14

Example of a temporal network used in proof of Proposition 2. Circled nodes are active at the corresponding time points

Proof of Proposition 3

We will prove the hardness by reducing VertexCover to MinTimeline \(_{+}\). Assume that we are given a (static) network \(H = (W, A)\) with n vertices \(W = \{ w_1, \ldots , w_n \}\) and a budget \(\ell \). In the VertexCover problem we are asked to decide whether there exists a subset \(U \subseteq W\) of at most \(\ell \) vertices (\(|U|\le \ell \)) covering all edges in A.

We map an instance of VertexCover to an instance of MinTimeline \(_{+}\) by creating a temporal network \(G = (V, E)\), as follows. The vertices V consist of 2n vertices: for each \(w_i \in W\), we add vertices \(v_i\) and \(u_i\). The edges are as follows: For each edge \((w_i, w_j) \in A\), we add a temporal edge \((v_i, v_j, 0)\) to E. For each vertex \(w_i \in W\), we add two temporal edges \((v_i, u_i, 1)\) and \((v_i, u_i, n + 2)\) to E. Figure 15 shows an example.

Let \(\mathcal {T}^*\) be an optimal timeline covering G. We claim that \( S \mathopen {}\left( \mathcal {T}^*\right) \le \ell \) if and only if there is a vertex cover of H with \(\ell \) vertices.

To prove the if direction, consider a vertex cover of H, say U, with \(\ell \) vertices. Consider the following coverage: cover each \(u_i\) at \(n + 2\), and each \(v_i\) at 1. For each \(w_i \in U\), cover \(v_i\) at 0. Figure 15 shows an example. Now every vertex \(u_i\) has a 0-length activity interval \([n+2, n+2]\). Vertices \(v_i\), which correspond to \(w_i \not \in U\), also have 0-length activity intervals [1, 1]. Only vertices \(v_i\), which correspond to \(w_i \in U\), have 1-length activity intervals [0, 1]. All vertices in V have activity intervals of total length \(\ell \). Every interaction in G belongs to some activity interval: by construction, interactions at \(t=n+2\) are spanned by activity intervals of \(\{u_i\}\), interactions at \(t=1\) are spanned by activity intervals of \(\{v_i\}\), and interactions at \(t=0\) are spanned by activity intervals \(\ell \) of vertices from \(\{v_i\}\). The resulting intervals are indeed forming a timeline with a total span of \(\ell \).

To prove the other direction, first note that if we cover each \(v_i\) by an interval [0, 1] and each \(u_i\) by an interval \([n + 2, n + 2]\), then this yields a timeline \(\mathcal {T}\) covering G. The total span of intervals in \(\mathcal {T}\) is n. Thus, \( S \mathopen {}\left( \mathcal {T}^*\right) \le S \mathopen {}\left( \mathcal {T}\right) = n\). This guarantees that if \(0 \in I_{v_i}\), then \(n + 2 \notin I_{v_i}\), so \(n + 2 \in I_{u_i}\). Otherwise \(\mathcal {T}^*\) contains an \((n+2)\)-length interval, this contradicts to \( S \mathopen {}\left( \mathcal {T}^*\right) \le n\). By the same argument, \(1 \notin I_{u_i}\) and so \(1 \in I_{v_i}\). In summary, if \(0 \in I_{v_i}\), then \( \sigma \mathopen {}\left( I_{v_i}\right) = 1\). This implies that if \( S \mathopen {}\left( \mathcal {T}^*\right) \le \ell \), then we have at most \(\ell \) active vertices at 0. Let U be the set of those vertices. Since \(\mathcal {T}^*\) is a timeline covering G, then U is a vertex cover for H. \(\square \)

Fig. 15
figure 15

Example of a temporal network used in proof of Proposition 3. Circled nodes are active at the corresponding time points

Proofs regarding Maximal

Proof of Proposition 4

We first show that a maximal dual solution is a feasible timeline. Let \(e_i = (u, v, t)\) be a temporal edge. If \( p \mathopen {}\left( i; v\right) \cap X_v = \emptyset \) and \( p \mathopen {}\left( i; u\right) \cap X_u = \emptyset \), then we can increase the value of \(\alpha _i\) without violating the dual constraints, so the solution is not maximal. Thus \(t \in I_{v} \cup I_{u}\), making \(\mathcal {T}\) a feasible timeline.

Next we show that the resulting solution \(\mathcal {T}\) is a 2-approximation to Coalesce. Write \(x_v = \min (X_v)\) and \(y_v = \max (X_v)\). Let \(\mathcal {T}^*\) be the optimal solution. Then

$$\begin{aligned} \begin{aligned} S \mathopen {}\left( \mathcal {T}\right)&= \sum _{v \in V} | t \mathopen {}\left( x_v\right) - m_v| + | t \mathopen {}\left( y_v\right) - m_v| \\&= \sum _{v \in V} h \mathopen {}\left( v; x_v\right) + h \mathopen {}\left( v; y_v\right) \\&\le \sum _{v \in V} \sum _{e \in E \mathopen {}\left( v\right) } \alpha _e = 2\sum _{e \in E} \alpha _e \le 2 S \mathopen {}\left( \mathcal {T}^*\right) , \\ \end{aligned} \end{aligned}$$

where the second equality follows from the definition of \(X_v\), the first inequality follows from the fact that \(\alpha _e \ge 0\), and the last inequality follows from primal-dual theory.\(\square \)

Proof of Lemma 1

We will prove this result by showing that \(\alpha _j \le z(v)\) if and only if all dual constraints related to v are valid. Since the same holds also for u the lemma follows. We consider two cases.

First case: \( t \mathopen {}\left( j\right) < m_v\). In this case we have

$$\begin{aligned} z(v) = \min \left\{ m_w - t, b[w]\right\} = \min _{i \le j} \left\{ t \mathopen {}\left( i\right) - m_v - h \mathopen {}\left( v; i\right) \right\} , \end{aligned}$$

before increasing \(\alpha _j\). This guarantees that if \(\alpha _j \le z(v)\), then \( h \mathopen {}\left( v; i\right) \le | t \mathopen {}\left( i\right) - m_v|\), for every \(i \le j\). Moreover, when \(\alpha _j = z(v)\) one of these constraints becomes tight. Since these are the only constraints containing \(\alpha _j\), we have proven the first case.

Second case: \( t \mathopen {}\left( j\right) \ge m_v\). If \(i < j\), the sum \( h \mathopen {}\left( v; i\right) \) does not contain \(\alpha _j\), so the corresponding constraint remains valid. If \(i \ge j\), then the corresponding constraint is valid if and only if \( h \mathopen {}\left( v; j\right) \le | t \mathopen {}\left( j\right) - m_v|\). This is because \(\alpha _\ell = 0\) for all \(\ell > j\). But \(z(v) = t - m_v - a[v]\) corresponds exactly to the amount we can increase \(\alpha _j\) so that \( h \mathopen {}\left( v; j\right) = |t - m_v|\).\(\square \)

Proofs regarding k-Maximal

Proof of Proposition 5

Let us first prove the feasibility of \(\mathcal {T}\), that is, show that every edge is covered. Let \(e_i = (u, v, t)\) be an edge, and let \(\ell \) be an index such that \(i \in S_{v\ell }\).

At least, one of the four cases of left-maximality must hold. Assume Case (i), that is, \( rh \mathopen {}\left( v; j\right) = \eta _{vj}\) for some \(j \in lp \mathopen {}\left( i; v\right) \). Then \( t \mathopen {}\left( j\right) \in X_{v\ell }\) and \(x_{v\ell } \le t \mathopen {}\left( i\right) \le m_{v\ell }\), so \(e_i\) is covered. Case (ii) is similar.

Assume that Cases (i) and (ii) do not hold. Then Case (iii) or Case (iv) holds. Assume Case (iii). If \(x_{v\ell } \le t \mathopen {}\left( i\right) \), then \(e_i\) is covered. Assume that \( t \mathopen {}\left( i\right) < x_{v\ell }\). Then, by definition, \( t \mathopen {}\left( i\right) \in Y_{v\ell }\). Thus \(m_{v(\ell - 1)} < t \mathopen {}\left( i\right) \le y_{v\ell }\), so \(e_i\) is covered. Case (iv) is similar.

We have shown that \(\mathcal {T}\) is feasible. Next we show that the resulting solution \(\mathcal {T}\) is a 2-approximation to k-Coalesce. Let \(\mathcal {T}^*\) be the optimal solution. Then

$$\begin{aligned} \begin{aligned} S \mathopen {}\left( \mathcal {T}\right)&= \sum _{\ell = 1}^k \sum _{v \in V} | t \mathopen {}\left( x_{v\ell }\right) - m_{v\ell }| + | t \mathopen {}\left( y_{v(\ell + 1)}\right) - m_{v\ell }| \\&= \sum _{\ell = 1}^k \sum _{v \in V} rh \mathopen {}\left( v; x_{v\ell }\right) + lh \mathopen {}\left( v; y_{v\ell }\right) \\&\le \sum _{v \in V} \sum _{e \in E \mathopen {}\left( v\right) } \alpha _e = 2\sum _{e \in E} \alpha _e \le 2 S \mathopen {}\left( \mathcal {T}^*\right) , \\ \end{aligned} \end{aligned}$$

where the second equality follows from the definition of \(X_{vi}\) and \(Y_{vi}\), the first inequality follows from the fact that \(\alpha _e \ge 0\) and the intervals in \(\mathcal {T}\) do not intersect, and the last inequality follows from primal-dual theory.

\(\square \)

We will prove Proposition 6 in a sequence of lemmas.

Let us write \(a_i[v, \ell ]\) and \(b_i[v, \ell ]\) to be the values of these counters at the beginning of the ith iteration. We maintain the following invariants. The first counter, \(a_{i + 1}[v, \ell ]\) matches \( lh \mathopen {}\left( v; i\right) \). The second counter, \(b_i[v, \ell ]\) is how much we can afford to increase \(\alpha _i\) without violating \( rh \mathopen {}\left( v; s\right) \le \eta _{vs}\), where \(s \le i\). This is formalized in the next lemma.

Lemma 2

Let v be a vertex and \(\ell = 1, \ldots , k + 1\) be an integer. Shorten \(S = S_{v\ell }\) and write \(f(j, i) = \sum _{o \in S, j \le o \le i} \alpha _o\).

Then for any \(i \in S\),

$$\begin{aligned} a_{i + 1}[v, \ell ] = lh \mathopen {}\left( v; i\right) \end{aligned}$$
(2)

and

$$\begin{aligned} b_{i + 1}[v, \ell ] = \min _{j \in S, j \le i} \left( \eta _{vj} - f(j, i) \right) . \end{aligned}$$
(3)

Proof

We prove this claim by induction. Let i be the first index in S. Then \(a_i[v, \ell ] = 0\) and \(a_{i + 1}[v, \ell ] = \alpha _i\), and \(b_i[v, \ell ] = \infty \) and \(b_{i + 1}[v, \ell ] = \eta _{vi} - \alpha _{vi}\).

Assume that i is not the first index in S, and let \(j \in S\) be the previous index. Since \(a_i[v, \ell ] = a_{j + 1}[v, \ell ]\), we have

$$\begin{aligned} a_{i + 1}[v, \ell ] = a_{i}[v, \ell ] + \alpha _i = lh \mathopen {}\left( v; j\right) + \alpha _i = lh \mathopen {}\left( v; i\right) . \end{aligned}$$

Also, \(b_i[v, \ell ] = b_{j + 1}[v, \ell ]\), and

$$\begin{aligned} \begin{aligned} b_{i + 1}[v, \ell ]&= \min ( b_i[v, \ell ] - \alpha _i, \eta _{vi} - \alpha _i ) \\&= \min \left( \min _{s \in S, s \le j} \left( \eta _{vs} - f(s, i)\right) , \eta _{vi} - \alpha _i \right) \\&= \min _{s \in S, s \le i} \left( \eta _{vs} - f(s, i)\right) , \end{aligned} \end{aligned}$$

proving the lemma.\(\square \)

Our next step is to prove the feasibility of the output of k-Maximal. In order to do that, we first bound the counters.

Lemma 3

For each vertex v, index \(\ell = 1, \ldots , k\), and \(i \in S_{v\ell }\),

$$\begin{aligned} a_{i + 1}[v, \ell ] \le \theta _{vi} \end{aligned}$$
(4)

and

$$\begin{aligned} b_{i + 1}[v, \ell ] \ge 0. \end{aligned}$$
(5)

Proof

Since, \(\alpha _i \le \theta _{vi} - a_i[v, \ell ]\) we have

$$\begin{aligned} a_{i + 1}[v, \ell ] = a_i[v, \ell ] + \alpha _i \le \theta _{vi}. \end{aligned}$$

Also since, \(\alpha _i \le \min (b_i[v, \ell ], \eta _{vi})\) we have

$$\begin{aligned} b_{i + 1}[v, \ell ] = \min (b_{i}[v, \ell ], \eta _{vi}) - \alpha _i \ge 0. \end{aligned}$$

This proves the claim.\(\square \)

To prove the feasibility, we first show that \(\alpha _i \ge 0\).

Lemma 4

\(\alpha _i \ge 0\), for all i.

Proof

Let \((u, v, t) = e_i\), and let \(\ell \) such that \(i \in S_{v\ell }\). Let \(z_1\), and \(z_2\) be as defined by the algorithm in the ith round.

If i is the first index in \(S_{v\ell }\), then \(a_i[v, \ell ] = 0\) and \(b_i[v, \ell ] = \infty \). Then \(z_1 \ge 0\), and similarly \(z_2 \ge 0\). Consequently, \(\alpha _i \ge 0\).

Assume that i is not the first index in \(S_{v\ell }\), and let j be the previous edge index. Then Eq. 4 implies

$$\begin{aligned} a_{i}[v, \ell ] = a_{j + 1}[v, \ell ] \le \theta _{vj} \le \theta _{vi}. \end{aligned}$$

In addition, Eq. 5 implies \(b_i[v, \ell ] \ge 0\), so \(z_1 \ge 0\). Similarly, \(z_2 \ge 0\). Consequently, \(\alpha _i \ge 0\).\(\square \)

Next lemma shows that \(\left\{ \alpha _i\right\} \) satisfies the constraints, making the dual solution feasible.

Lemma 5

Let \(v \in V\), \(\ell = 1, \ldots , k + 1\), and \(i \in S_{v\ell }\). Then \( rh \mathopen {}\left( v; i\right) \le \eta _{vi}\) and \( lh \mathopen {}\left( v; i\right) \le \theta _{vi}\).

Proof

Equations 2 and 4 give us

$$\begin{aligned} lh \mathopen {}\left( v; i\right) = a_{i + 1}[v, \ell ] \le \theta _{vi}. \end{aligned}$$

Moreover, \( lh \mathopen {}\left( v; i\right) \) remains constant in the later rounds.

Let j be the last index in \(S_{v\ell }\). Then Eqs. 3 and 5 state that

$$\begin{aligned} 0 \le b_{j + 1}[v, \ell ] \le \eta _{vi} - rh \mathopen {}\left( v; i\right) . \end{aligned}$$

Moreover, the sum \( rh \mathopen {}\left( v; i\right) \) remains constant in the later rounds. Thus, \(\left\{ \alpha _i\right\} \) is a feasible dual solution.\(\square \)

Proof

Let \(e_i = (u, v, t)\) be an edge, and let \(\ell \) and r be the indices such that \(i \in S_{v\ell }\) and \(i \in S_{ur}\). We need to show that \(e_i\) is left-maximal.

If \(b_{i + 1}[v, \ell ] = 0\), then Eq. 3 states that there is \(j \in X_{v\ell }\) with \(j \le i\) such that \(\eta _{vj} = rh \mathopen {}\left( v; j\right) \). This is the definition of Case (i) of left-maximality. Similarly, \(b_{i + 1}[u, r] = 0\) leads to Case (ii).

Assume \(b_{i + 1}[v, \ell ] > 0\) and \(b_{i + 1}[u, r] > 0\). This can only happen if

$$\begin{aligned} \alpha _e< \min (b_i[v, \ell ], \eta _{vi}) \quad \text {and}\quad \alpha _e < \min (b_i[u, r], \eta _{ui}). \end{aligned}$$

Consequently, \(\alpha _i = \theta _{vi} - a_i[v, \ell ]\) or \(\alpha _i = \theta _{ui} - a_i[u, r]\). If the former, then Eq. 2 states \( lh \mathopen {}\left( v; i\right) = a_{i + 1}[v, \ell ] = \theta _{vi}\), leading to Case (iii). Similarly, the latter case leads to Case (iv). Thus, \(e_i\) is left-maximal.\(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rozenshtein, P., Tatti, N. & Gionis, A. The network-untangling problem: from interactions to activity timelines. Data Min Knowl Disc 35, 213–247 (2021). https://doi.org/10.1007/s10618-020-00717-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-020-00717-5

Keywords

Navigation