Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb (i.e., Ecosystem on the Web), an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it captures long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it provides accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed.
Available at http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
Image courtesy of xura, criminalatt, David Castillo Dominici, happykanppy at FreeDigitalPhotos.net.
There are several variations of the Lotka-Volterra model, e.g., the predator-prey/parasitism model. However, in this paper, we focus only on the simplest case, where \(a_{ij} \geq 0\) \((i \neq j)\) for all species i and j (i.e., neutralism/amensalism/competition).
For example, given N users, there are N × 24 hours/resources per day, or fewer, depending on the keyword and the demographic group it appeals to.
In this paper, we assume that P(t) is the popularity density of a keyword, i.e., \(0 \leq P(t) \leq 1\). However, our equations can also handle other settings, such as the actual number of keyword appearances.
Equivalently, the amount of user resources available to keyword i, whose popularity is bounded by a maximum size \(K_{i}\), is \( K_{i} - \sum_{j=1}^{d} a_{ij} P_{j}(t) \).
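As an illustration, the available-resource term above can be computed directly. The sketch below is not the authors' code: the function names, the growth-rate vector `r`, and the forward-Euler step are assumptions made for illustration, and the growth rule is the standard textbook competitive Lotka-Volterra form rather than anything quoted from this page.

```python
def available_resources(K, A, P):
    """Return K_i - sum_j a_ij * P_j for every keyword i.

    K: list of maximum popularity sizes K_i
    A: d x d list of lists of interaction coefficients a_ij
    P: list of current popularities P_j(t)
    """
    d = len(P)
    return [K[i] - sum(A[i][j] * P[j] for j in range(d)) for i in range(d)]


def lv_step(P, r, K, A, dt=0.01):
    """One forward-Euler step of the textbook competitive Lotka-Volterra system

        dP_i/dt = r_i * P_i * (K_i - sum_j a_ij * P_j) / K_i

    r (growth rates) and dt (step size) are hypothetical names chosen here.
    """
    free = available_resources(K, A, P)
    return [P[i] + dt * r[i] * P[i] * free[i] / K[i] for i in range(len(P))]


if __name__ == "__main__":
    # Two toy keywords competing for the same user attention.
    K = [1.0, 1.0]
    A = [[1.0, 0.5], [0.5, 1.0]]
    P = [0.2, 0.4]
    print(available_resources(K, A, P))
    for _ in range(1000):
        P = lv_step(P, [1.0, 1.0], K, A)
    print(P)
```

With a single keyword and \(a_{11} = 1\), the update reduces to ordinary logistic growth toward \(K_{1}\), which is a convenient sanity check for the sketch.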
Here, \(\log ^{*}\) is the universal code length for integers.
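For readers unfamiliar with \(\log^{*}\), the following is a minimal sketch that assumes Rissanen's usual definition of the universal code length for a positive integer n: \(\log_2 c_0 + \log_2 n + \log_2 \log_2 n + \cdots\), keeping only the positive terms, with \(c_0 \approx 2.865064\). The function name `log_star` is hypothetical, and some presentations drop the constant term.

```python
import math

C0 = 2.865064  # normalizing constant in Rissanen's universal prior for integers


def log_star(n):
    """Approximate universal code length (in bits) of a positive integer n."""
    if n < 1:
        raise ValueError("log_star is defined for integers n >= 1")
    bits = math.log2(C0)
    term = math.log2(n)
    # Add log2(n), log2(log2(n)), ... while the terms stay positive.
    while term > 0:
        bits += term
        term = math.log2(term)
    return bits


if __name__ == "__main__":
    for n in (1, 2, 16, 1000):
        print(n, round(log_star(n), 3))
```

The cost grows only slightly faster than \(\log_2 n\), which is why it is a common choice for encoding integers whose range is not known in advance.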
We digitize each floating-point number using \(c_{F} = 8\) bits.
Here, \(\mu\) and \(\sigma^{2}\) need \(2c_{F}\) bits, but we can omit them because they are constant values and independent of our modeling.
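To make the two cost notes above concrete, here is a hedged sketch of how such costs are commonly computed. It assumes that the data cost is the negative log-likelihood, in bits, of the model residuals under a Gaussian with mean \(\mu\) and variance \(\sigma^{2}\), and that every stored floating-point value costs \(c_{F}\) bits. The function names and the residual-based formulation are assumptions, not a transcription of the paper's equations.

```python
import math

C_F = 8  # bits per floating-point value, as stated in the text


def gaussian_coding_cost(residuals):
    """Bits needed to encode residuals under a Gaussian fitted to them.

    Uses the Gaussian density, so the result can be negative when the
    variance is very small.
    """
    n = len(residuals)
    mu = sum(residuals) / n
    var = sum((e - mu) ** 2 for e in residuals) / n
    if var <= 0:
        raise ValueError("residual variance must be positive")
    nats = sum(
        0.5 * math.log(2 * math.pi * var) + (e - mu) ** 2 / (2 * var)
        for e in residuals
    )
    return nats / math.log(2)  # convert nats to bits


def parameter_cost(num_floats, include_gaussian_params=False):
    """Bits for num_floats model parameters, optionally plus mu and sigma^2."""
    extra = 2 if include_gaussian_params else 0
    return (num_floats + extra) * C_F


if __name__ == "__main__":
    print(gaussian_coding_cost([-1.0, 1.0]))
    print(parameter_cost(3), parameter_cost(3, include_gaussian_params=True))
```

Because the \(2c_{F}\) bits for \(\mu\) and \(\sigma^{2}\) are the same for every candidate model, leaving them out shifts all totals by the same constant and does not change which model is selected.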
The authors would like to thank Christina Cowan for her help with interpreting the patterns of apparel companies. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research Numbers 15H02705, 26730060, and 26280112. This material is based upon work supported by the National Science Foundation under Grants No. CNS-1314632 and IIS-1408924; by the Army Research Laboratory (ARL) under Cooperative Agreement Number W911NF-09-2-0053; and by a Google Focused Research Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, ARL, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Matsubara, Y., Sakurai, Y. & Faloutsos, C. Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities. World Wide Web 20, 439–465 (2017). https://doi.org/10.1007/s11280-016-0389-x
