
Contextual bandits with hidden contexts: a focused data capture from social media streams


Abstract

This paper addresses the problem of real-time data capture from social media. Due to various limitations, it is not possible to collect all the data produced by social networks such as Twitter. Therefore, to gather enough relevant information for a predefined need, it is necessary to focus on a subset of the information sources. In this work, we consider user-centered data capture and treat each account of a social network as a source that can be followed at each iteration of a data capture process. This process, whose aim is to maximize the cumulative utility of the captured information for the specified need, is constrained at each time step by the number of users that can be monitored simultaneously. Selecting a subset of accounts to listen to over time is a sequential decision problem under constraints, which we formalize as a bandit problem with multiple selections. We propose a contextual UCB-like approach that uses the activity of each user during the current step to predict their future behavior. Beyond capturing variations in usefulness, considering contexts also improves the efficiency of the process by leveraging structure in the search space. However, existing contextual bandit approaches do not fit our setting, where most contexts are hidden from the agent. We therefore propose a new algorithm, called HiddenLinUCB, which deals with this missing information via variational inference. Experiments demonstrate the strong performance of this approach compared to existing methods on data capture tasks from social networks.
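To fix ideas, the following minimal sketch (hypothetical names, not the authors' released code) illustrates the generic loop the abstract describes: at every step, each candidate account receives an optimistic score combining a predicted reward and an exploration bonus, and the k highest-scoring accounts are listened to.

```python
import numpy as np

def select_accounts(predicted_reward, exploration_bonus, k):
    """Pick the k accounts with the highest optimistic (UCB-style) score."""
    scores = predicted_reward + exploration_bonus
    return np.argsort(scores)[-k:]

# Toy example: 5 candidate accounts, budget of k = 2 accounts per step.
rng = np.random.default_rng(0)
chosen = select_accounts(rng.random(5), 0.1 * rng.random(5), k=2)
print(chosen)
```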


Notes

  1. For instance, on Twitter, up to 7000 messages can be posted per second.

  2. In this paper, we focus on Twitter but several other media could have been considered, such as Facebook which proposes similar real-time APIs with restrictions. The full documentation of the Twitter APIs is available at: https://dev.twitter.com/streaming/public.

  3. \(\Phi ^{-1}\) corresponds to the Normal inverse cumulative distribution function.

  4. At step t, for any action i, \(q^*_{\tau _{i}}\) being a Gamma distribution with shape \(a_{i,t-1}\) and rate \(b_{i,t-1}\), the expectation \(\mathbb {E}[\tau _{i}]\) equals \(\dfrac{a_{i,t-1}}{b_{i,t-1}}\) (see the short check below).
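A quick numerical check of this expectation (a minimal sketch with arbitrary values; SciPy parameterizes the Gamma by a scale, so the rate \(b\) enters as scale=1/b):

```python
from scipy.stats import gamma

a, b = 3.0, 2.0                        # shape and rate of q*_tau
print(gamma(a, scale=1.0 / b).mean())  # 1.5, i.e. a / b
```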


Acknowledgements

This research work has been carried out in the framework of the Technological Research Institute SystemX, and has therefore been funded with public funds within the scope of the French Program "Investissements d'Avenir".

Author information


Corresponding author

Correspondence to Sylvain Lamprier.

Additional information

Responsible editor: Kristian Kersting.


Appendix

1.1 Proof for Proposition 3

Following the variational principle described above, the optimal variational distribution \(q_{\beta }^{\star }(\beta )\) for \(\beta \) satisfies:

$$\begin{aligned} \log q_{\beta }^{\star }(\beta )&=\,\mathbb {E}_{\beta ^{\backslash }}\Bigg [ \sum \limits _{i=1}^K\sum \limits _{s\in \mathcal {A}_{i,t-1}\cup \mathcal {B}_{i,t-1}} \log p(r_{i,s} | \beta , x_{i,s}) + \log p(\beta ) \Bigg ]+C\\&=\mathbb {E}_{\beta ^{\backslash }} \Bigg [ - \dfrac{1}{2} \sum \limits _{i=1}^{K} \sum \limits _{s\in \mathcal {A}_{i,t-1}\cup \mathcal {B}_{i,t-1}} (r_{i,s}-x_{i,s}^{\top }\beta )^{2} - \dfrac{1}{2} \beta ^{\top }\beta \Bigg ]+ C \\&= \mathbb {E}_{\beta ^{\backslash }} \Bigg [ - \dfrac{1}{2} \beta ^{\top }\left( I+\sum \limits _{i=1}^{K} \sum \limits _{s\in \mathcal {A}_{i,t-1}\cup \mathcal {B}_{i,t-1}}x_{i,s}x_{i,s}^{\top } \right) \beta \\&\quad + \beta ^{\top }\left( \sum \limits _{i=1}^{K} \sum \limits _{s\in \mathcal {A}_{i,t-1}\cup \mathcal {B}_{i,t-1}}x_{i,s}r_{i,s} \right) \Bigg ] +C \end{aligned}$$

where \(\mathbb {E}_{\beta ^{\backslash }}\) stands for the expectation over all terms but \(\beta \) and C denotes terms which do not depend on \(\beta \).

By considering the independence of factors as defined in the formulation (18), and the definitions of \(V_{t-1}\) and \(\hat{\beta }_{t-1}\) in the proposition, we get:

$$\begin{aligned} \log q_{\beta }^{\star }(\beta )&= -\dfrac{1}{2}(\beta -\hat{\beta }_{t-1})^{\top }V_{t-1}(\beta -\hat{\beta }_{t-1}) +C \end{aligned}$$

which corresponds to a Gaussian \(\mathcal {N}(\hat{\beta }_{t-1},V_{t-1}^{-1})\).
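Reading the precision and linear term off the derivation above, the update of \(q_{\beta }^{\star }\) has the form of a ridge-regression posterior. A minimal sketch (hypothetical names; the exact \(V_{t-1}\) and \(\hat{\beta }_{t-1}\) are those defined in Proposition 3), where expectations over hidden contexts are replaced by their current variational means:

```python
import numpy as np

def beta_posterior(contexts, rewards):
    """Variational Gaussian posterior over beta: q*_beta = N(beta_hat, V^{-1}).

    contexts: (n, d) array of (expected) context vectors x_{i,s}
    rewards:  (n,) array of rewards r_{i,s}
    """
    d = contexts.shape[1]
    V = np.eye(d) + contexts.T @ contexts                # I + sum_s x x^T
    beta_hat = np.linalg.solve(V, contexts.T @ rewards)  # V^{-1} sum_s x r
    return V, beta_hat
```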

1.2 Proof for Proposition 4

Following the variational principle described above, the optimal variational distribution \(q_{x_{i,s}}^{\star }(x_{i,s})\) for every action i and step \(s<t\) satisfies:

$$\begin{aligned} \log q_{x_{i,s}}^{\star }(x_{i,s})&=\mathbb {E}_{x_{i,s}^{\backslash }}[\log p(r_{i,s} | \beta ,x_{i,s})+\log p(x_{i,s}|\mu _{i},\tau _{i})]+C\\&=-\dfrac{1}{2}\mathbb {E}_{x_{i,s}^{\backslash }}\left[ (r_{i,s}-x_{i,s}^{\top }\beta )^{2} +\tau _{i}(x_{i,s}-\mu _{i})^{\top }(x_{i,s}-\mu _{i})\right] +C\\&=-\dfrac{1}{2}\mathbb {E}_{x_{i,s}^{\backslash }}\left[ x_{i,s}^{\top }(\beta \beta ^{\top }+\tau _{i}I)x_{i,s}-2x_{i,s}^{\top }(\beta r_{i,s}+\mu _{i}\tau _{i})\right] +C\\ \end{aligned}$$

where \(\mathbb {E}_{x_{i,s}^{\backslash }}\) stands for the expectation over all terms but \(x_{i,s}\) and C denotes terms which do not depend on \(x_{i,s}\).

By considering the independence of factors as defined in the formulation (18), and the definitions of \(W_{i,s}^{-1}\) and \(\hat{x}_{i,s}\) in the proposition, we get:

$$\begin{aligned} \log q_{x_{i,s}}^{\star }(x_{i,s})&=-\dfrac{1}{2}(x_{i,s}-\hat{x}_{i,s})^{\top }W_{i,s}(x_{i,s}-\hat{x}_{i,s}) +C \end{aligned}$$

which corresponds to a Gaussian \(\mathcal {N}(\hat{x}_{i,s},W_{i,s}^{-1})\).
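The quadratic and linear terms in \(x_{i,s}\) above directly give the precision and mean of \(q_{x_{i,s}}^{\star }\). A minimal sketch (hypothetical names), assuming the expectations are taken under the current variational factors, with \(\mathbb {E}[\beta \beta ^{\top }]=V_{t-1}^{-1}+\hat{\beta }_{t-1}\hat{\beta }_{t-1}^{\top }\):

```python
import numpy as np

def hidden_context_posterior(beta_hat, V, mu_hat, tau_mean, reward):
    """Variational Gaussian posterior over a hidden context x_{i,s}.

    Precision and mean read off the derivation above:
      W     = E[beta beta^T] + E[tau_i] I
      x_hat = W^{-1} (E[beta] r_{i,s} + E[tau_i] E[mu_i])
    """
    d = beta_hat.shape[0]
    E_bbT = np.linalg.inv(V) + np.outer(beta_hat, beta_hat)
    W = E_bbT + tau_mean * np.eye(d)
    x_hat = np.linalg.solve(W, beta_hat * reward + tau_mean * mu_hat)
    return W, x_hat
```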

1.3 Proof for Proposition 5

Following the variational principle described above, the optimal variational distribution \(q_{\mu _{i}}^{\star }(\mu _{i})\) for every action i at step t satisfies:

$$\begin{aligned} \log q_{\mu _{i}}^{\star }(\mu _{i})&=\mathbb {E}_{\mu _{i}^{\backslash }}\Bigg [\log p((x_{i,s})_{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}}|\mu _{i},\tau _{i}) +\log p(\mu _{i}|\tau _{i}) \Bigg ]+C\\&=\,\mathbb {E}_{\mu _{i}^{\backslash }} \Bigg [- \dfrac{\tau _{i}}{2} \left( \sum \limits _{\begin{array}{c} s\in \mathcal {B}_{i,t-1} \\ \cup \mathcal {C}_{i,t-1} \end{array}}(x_{i,s}-\mu _{i})^{\top }(x_{i,s}-\mu _{i}) + \mu _{i}^{\top }\mu _{i} \right) \Bigg ] +C\\&=\mathbb {E}_{\mu _{i}^{\backslash }}\Bigg [- \dfrac{\tau _i}{2} \mu _{i}^{\top }\mu _{i}(1+n_{i,t-1}) + \tau _i \mu _{i}^{\top } \sum \limits _{\begin{array}{c} s\in \mathcal {B}_{i,t-1} \\ \cup \mathcal {C}_{i,t-1} \end{array}}x_{i,s} \Bigg ]+C \end{aligned}$$

where \(\mathbb {E}_{\mu _{i}^{\backslash }}\) stands for the expectation over all terms but \(\mu _{i}\), C denotes terms which do not depend on \(\mu _{i}\) and \(n_{i,t-1}=|\mathcal {B}_{i,t-1}|+|\mathcal {C}_{i,t-1}|\).

By considering the independence of factors as defined in the formulation (18), and the definitions of \(\Sigma _{i,t-1}^{-1}\) and \(\hat{\mu }_{i,t-1}\) in the proposition, we get:

$$\begin{aligned} \log q_{\mu _{i}}^{\star }(\mu _{i})&= -\dfrac{1}{2}(\mu _{i}-\hat{\mu }_{i,t-1})^{\top }\Sigma _{i,t-1}(\mu _{i}-\hat{\mu }_{i,t-1}) +C \end{aligned}$$

which corresponds to a Gaussian \(\mathcal {N}(\hat{\mu }_{i,t-1},\Sigma _{i,t-1}^{-1})\).
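From the quadratic and linear terms in \(\mu _{i}\) above, the posterior precision is \((1+n_{i,t-1})\mathbb {E}[\tau _{i}]I\) and the mean is the sum of the (expected) contexts shrunk by the prior. A minimal sketch (hypothetical names):

```python
import numpy as np

def mu_posterior(x_hats, tau_mean):
    """Variational Gaussian posterior over mu_i: q*_mu = N(mu_hat, Sigma^{-1}).

    x_hats: (n, d) array of expected contexts E[x_{i,s}], s in B ∪ C, so that
      Sigma  = (1 + n) E[tau_i] I
      mu_hat = sum_s E[x_{i,s}] / (1 + n)
    """
    n, d = x_hats.shape
    Sigma = (1 + n) * tau_mean * np.eye(d)
    mu_hat = x_hats.sum(axis=0) / (1 + n)
    return Sigma, mu_hat
```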

1.4 Proof for Proposition 6

Following the variational principle described above, the optimal variational distribution \(q_{\tau _{i}}^{\star }(\tau _{i})\) for every action i at step t satisfies:

$$\begin{aligned} \log q_{\tau _{i}}^{\star }(\tau _{i})&=\mathbb {E}_{\tau _{i}^{\backslash }} \Bigg [ \log p((x_{i,s})_{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}}|\mu _{i},\tau _{i})+ \log p(\mu _{i}|\tau _{i})+\log p(\tau _{i}) \Bigg ] +C \end{aligned}$$

where \(\mathbb {E}_{\tau _{i}^{\backslash }}\) stands for the expectation over all terms but \(\tau _{i}\) and C denotes terms which do not depend on \(\tau _{i}\).

Recall that \(p(\tau _{i})\) corresponds to a Gamma distribution with parameters \(a_{0}\) and \(b_{0}\), whose density is given by: \(p(\tau _{i})=\frac{b_{0}^{a_{0}}\tau _{i}^{a_{0}-1}e^{-\tau _{i}b_{0}}}{\Gamma (a_{0})}\), where \(\Gamma \) stands for the Gamma function. Also, since the density of a Gaussian of mean m and covariance matrix \(\tau ^{-1} I\) is given by \(f(x;m,\tau ^{-1} I)=\frac{\tau ^{d/2}}{(2\pi )^{d/2}}e^{-\tau /2(x-m)^{\top }(x-m)}\), we get:

$$\begin{aligned} \log q_{\tau _{i}}^{\star }(\tau _{i})&= \mathbb {E}_{\tau _{i}^{\backslash }} \Bigg [ -\dfrac{1}{2}\sum \limits _{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}}\tau _{i}(x_{i,s}-\mu _{i})^{\top }(x_{i,s}-\mu _{i}) \\&\quad -\dfrac{1}{2}\tau _{i}\mu _{i}^{\top }\mu _{i} +\sum \limits _{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}} \dfrac{d}{2}\log \tau _{i} \\&\quad + \dfrac{d}{2}\log \tau _{i} +(a_{0}-1)\log \tau _{i} -b_{0}\tau _{i} \Bigg ] +C \\&= \mathbb {E}_{\tau _{i}^{\backslash }} \Bigg [ -\dfrac{\tau _{i}}{2} (n_{i,t-1}+1) \mu _{i}^{\top }\mu _{i} + \tau _{i} \mu _{i}^{\top }\sum \limits _{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}}x_{i,s} \\&\quad -\dfrac{\tau _{i}}{2} \sum \limits _{s\in \mathcal {B}_{i,t-1} \cup \mathcal {C}_{i,t-1}}x_{i,s}^{\top }x_{i,s} \\&\quad + (a_{0}-1)\log \tau _{i} -b_{0}\tau _{i}+\dfrac{d(n_{i,t-1}+1)}{2} \log \tau _{i} \Bigg ] + C \end{aligned}$$

By considering the independence of factors as defined in the formulation (18), and the definitions of \(a_{i,t-1}\) and \(b_{i,t-1}\) given in the proposition, we get:

$$\begin{aligned} \log q_{\tau _{i}}^{\star }(\tau _{i})&= (a_{i,t-1}-1 )\log \tau _{i}-b_{i,t-1}\tau _{i}+C \end{aligned}$$

which corresponds to a Gamma distribution with parameters \(a_{i,t-1}\) and \(b_{i,t-1}\).
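Collecting the coefficients of \(\log \tau _{i}\) and \(-\tau _{i}\) above gives the Gamma parameters. A minimal sketch (hypothetical names; the exact \(a_{i,t-1}\), \(b_{i,t-1}\) are those of Proposition 6), assuming second moments are expanded under the current variational factors:

```python
import numpy as np

def tau_posterior(a0, b0, mu_hat, Sigma, x_hats, x_trace_covs):
    """Variational Gamma posterior over tau_i, from the coefficients above:
      a = a0 + d (n + 1) / 2
      b = b0 + 0.5 (n + 1) E[mu^T mu] - E[mu]^T sum_s E[x_{i,s}]
             + 0.5 sum_s E[x_{i,s}^T x_{i,s}]

    x_hats:       (n, d) expected contexts E[x_{i,s}], s in B ∪ C
    x_trace_covs: (n,) traces of the context covariances (0 for observed contexts)
    """
    n, d = x_hats.shape
    E_mu_sq = mu_hat @ mu_hat + np.trace(np.linalg.inv(Sigma))
    E_x_sq = (x_hats ** 2).sum() + x_trace_covs.sum()
    a = a0 + d * (n + 1) / 2.0
    b = b0 + 0.5 * (n + 1) * E_mu_sq - mu_hat @ x_hats.sum(axis=0) + 0.5 * E_x_sq
    return a, b  # E[tau_i] = a / b (cf. Note 4)
```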

1.5 Proof for Proposition 8

From Proposition 3, at step t, \(\beta \) follows a Gaussian distribution with mean \(\hat{\beta }_{t-1}\) and covariance matrix \(V_{t-1}^{-1}\). Therefore, \(\beta ^{\top }\hat{\mu }_{i,t-1}\) follows a Gaussian distribution with mean \(\hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1}\) and variance \(\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}\), hence we directly have:  \(\mathbb {P}\Big (|\beta ^{\top }\hat{\mu }_{i,t-1}-\hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1} | \le \alpha _{1} \sqrt{\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}}\Big )= 1-\delta _{1}\).
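The half-width of this confidence interval is easy to compute in practice. A minimal sketch, assuming (consistently with Note 3) that \(\alpha _{1}=\Phi ^{-1}(1-\delta _{1}/2)\); the exact constant is the one defined in the paper:

```python
import numpy as np
from scipy.stats import norm

def beta_mu_confidence_width(mu_hat, V, delta1):
    """Half-width of the (1 - delta1) interval around beta_hat^T mu_hat.

    Assumes alpha_1 = Phi^{-1}(1 - delta1 / 2), Phi^{-1} being the Normal
    inverse CDF (cf. Note 3).
    """
    alpha1 = norm.ppf(1.0 - delta1 / 2.0)
    return alpha1 * np.sqrt(mu_hat @ np.linalg.solve(V, mu_hat))
```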

1.6 Proof for Proposition 9

From Proposition 5, at any step t, \(\mu _{i}\) follows a Gaussian law with mean \(\hat{\mu }_{i,t-1}\) and covariance matrix \(\Sigma _{i,t-1}^{-1}\). Thus, the random variable \(\mu _{i}-\hat{\mu }_{i,t-1}\) follows a Gaussian law with zero mean and covariance matrix \(\Sigma _{i,t-1}^{-1}\). We also know from this proposition that \(\Sigma _{i,t-1}^{-1}=((1+n_{i,t-1})E[\tau _{i}])^{-1}I\), so \(\Sigma _{i,t-1}^{-1}\) is diagonal with equal components on the diagonal. Let us temporarily denote \(\Sigma _{i,t-1}^{-1}\) as \(\sigma _{i,t}^{2} I\). Then, each component of the vector \(\dfrac{\mu _{i}-\hat{\mu }_{i,t-1}}{\sigma _{i,t}}\) follows a standard Gaussian \(\mathcal {N}(0,1)\), and therefore \(\dfrac{||\mu _{i}-\hat{\mu }_{i,t-1}||^{2}}{\sigma _{i,t}^2}\) follows a \(\chi ^{2}\) law with d degrees of freedom. Let \(\Psi \) denote the cumulative distribution function of the \(\chi ^{2}\) law with d degrees of freedom. We have:

$$\begin{aligned} \mathbb {P}\left( \dfrac{||\mu _{i}-\hat{\mu }_{i,t-1}||^{2}}{\sigma _{i,t}^{2}} \le \eta \right) = \Psi (\eta ) \end{aligned}$$

Equivalently, we have:

$$\begin{aligned} \mathbb {P}\left( ||\mu _{i}-\hat{\mu }_{i,t-1}|| \le \sqrt{\eta }\sigma _{i,t} \right) = \Psi (\eta ) \end{aligned}$$

Now, by the Cauchy–Schwarz inequality, \(|\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1}) | \le ||\beta || \, ||\mu _{i}-\hat{\mu }_{i,t-1}|| \le S||\mu _{i}-\hat{\mu }_{i,t-1}||\), since S is an upper bound on \(||\beta ||\). Therefore, we get:

$$\begin{aligned} \mathbb {P}\left( |\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1}) | \le S\sqrt{\eta }\sigma _{i,t} \right) \ge \Psi (\eta ) \end{aligned}$$

Setting \(\Psi (\eta )=1-\delta _{2}\), i.e., \(\eta =\Psi ^{-1}(1-\delta _{2})\), this becomes:

$$\begin{aligned} \mathbb {P}\left( |\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1}) | \le S\sigma _{i,t}\sqrt{\Psi ^{-1}(1-\delta _{2})} \right) \ge 1-\delta _{2} \end{aligned}$$

where \(\Psi ^{-1}\) stands for the inverse cumulative distribution function of a \(\chi ^{2}\) law with d degrees of freedom. Also, from Proposition 6, we know that \(\sigma _{i,t}=\sqrt{\dfrac{1}{(1+n_{i,t-1})E[\tau _{i}]}}=\sqrt{\dfrac{b_{i,t-1}}{a_{i,t-1}(1+n_{i,t-1})}}\), which proves the proposition.
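The resulting deviation bound is straightforward to evaluate. A minimal sketch of the quantity \(S\sigma _{i,t}\sqrt{\Psi ^{-1}(1-\delta _{2})}\) derived above (function name hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def mu_deviation_bound(S, a, b, n, d, delta2):
    """Bound on |beta^T (mu_i - mu_hat_i)| holding with probability >= 1 - delta2.

    Uses sigma_{i,t} = sqrt(b / (a (1 + n))) and the inverse CDF of a
    chi-squared law with d degrees of freedom, as in the proof above.
    """
    sigma = np.sqrt(b / (a * (1.0 + n)))
    return S * sigma * np.sqrt(chi2.ppf(1.0 - delta2, df=d))
```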

1.7 Proof for Theorem 2

From Propositions 8 and 9, Boole's inequality (the union bound) and the fact that \(\mathbb {E}[r_{i,t}]=\beta ^{\top }\mu _{i}\), we get:

$$\begin{aligned}&P\left( |\mathbb {E}[r_{i,t}]-\hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1}|> \alpha _{1} \sqrt{\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}} + \alpha _{2}\sqrt{\dfrac{b_{i,t-1}}{a_{i,t-1}(1+n_{i,t-1})}} \right) \\&\quad = P\left( |\beta ^{\top }\hat{\mu }_{i,t-1} +\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1})-\hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1}| \right. \\&\qquad \left.>\alpha _{1} \sqrt{\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}} + \alpha _{2}\sqrt{\dfrac{b_{i,t-1}}{a_{i,t-1}(1+n_{i,t-1})}} \right) \\&\quad \le \mathbb {P}\left( |\beta ^{\top }\hat{\mu }_{i,t-1} - \hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1} |+|\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1}) | \right. \\&\qquad \left.> \alpha _{1} \sqrt{\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}}+\alpha _{2}\sqrt{\dfrac{b_{i,t-1}}{a_{i,t-1}(1+n_{i,t-1})}}\right) \\&\quad \le \mathbb {P}\left( \left\{ |\beta ^{\top }\hat{\mu }_{i,t-1} - \hat{\beta }_{t-1}^{\top }\hat{\mu }_{i,t-1} |> \alpha _{1} \sqrt{\hat{\mu }_{i,t-1}^{\top }V_{t-1}^{-1}\hat{\mu }_{i,t-1}} \right\} \text { } \right. \\&\qquad \left. \cup \text { } \left\{ |\beta ^{\top }(\mu _{i}-\hat{\mu }_{i,t-1}) | >\alpha _{2}\sqrt{\dfrac{b_{i,t-1}}{a_{i,t-1}(1+n_{i,t-1})}}\right\} \right) \\&\quad \le \delta _{1}+\delta _{2} \end{aligned}$$

By considering the complementary event, one obtains the announced result.
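In practice, Theorem 2 yields the optimistic index that can drive the selection at each step: the estimated reward plus the two confidence terms. A minimal self-contained sketch (hypothetical names; \(\alpha _{1}\) and \(\alpha _{2}\) instantiated with the quantiles suggested by the proofs of Propositions 8 and 9, an assumption on this excerpt):

```python
import numpy as np
from scipy.stats import chi2, norm

def optimistic_score(beta_hat, V, mu_hat, a, b, n, S, delta1, delta2):
    """Upper confidence index on E[r_{i,t}], following Theorem 2.

    With probability at least 1 - delta1 - delta2,
      E[r_{i,t}] <= beta_hat^T mu_hat
                    + alpha_1 * sqrt(mu_hat^T V^{-1} mu_hat)
                    + alpha_2 * sqrt(b / (a * (1 + n))).
    """
    d = mu_hat.shape[0]
    alpha1 = norm.ppf(1.0 - delta1 / 2.0)
    alpha2 = S * np.sqrt(chi2.ppf(1.0 - delta2, df=d))
    mean = beta_hat @ mu_hat
    width1 = alpha1 * np.sqrt(mu_hat @ np.linalg.solve(V, mu_hat))
    width2 = alpha2 * np.sqrt(b / (a * (1.0 + n)))
    return mean + width1 + width2
```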


Cite this article

Lamprier, S., Gisselbrecht, T. & Gallinari, P. Contextual bandits with hidden contexts: a focused data capture from social media streams. Data Min Knowl Disc 33, 1853–1893 (2019). https://doi.org/10.1007/s10618-019-00648-w

