Follow the LIBRA: Guiding Fair Policy for Unified Impression Allocation via Adversarial Rewarding

Published: 04 March 2024


The diverse demands of advertisers (brand effects or immediate outcomes) lead to distinct selling patterns (pre-agreed volumes with an under-delivery penalty, or competition per auction) and pricing patterns (fixed prices or varying bids) in guaranteed delivery (GD) and real-time bidding (RTB) advertising. Fair impression allocation is therefore needed to unify the two markets and promote both ad content diversity and overall revenue. Existing approaches often deprive RTB ads of equal exposure opportunities by prioritizing GD ads, and coarse-grained methods are hindered by three challenges: 1) ambiguous rewards, because GD fulfillment and RTB utility have different objectives and constraints, which makes it hard to measure each allocation's contribution to the global interests; 2) intensified competition, because the coexistence of GD and RTB ads complicates their mutual relationships; and 3) policy degradation, because evolving user traffic and bid landscapes require adaptivity to distribution shifts.
We propose LIBRA, a generative-adversarial framework that unifies GD and RTB ads through request-level modeling. To guide the generative allocator, we solve a convex optimization on historical data to derive hindsight-optimal allocations that balance fairness and utility. We then train a discriminator to distinguish the generated actions from the demonstrations of this solved latent expert policy, providing an integrated reward that aligns LIBRA with the optimal fair policy. LIBRA employs a self-attention encoder to capture the competitive relations among the varying number of candidate ads in each allocation. Further, it enhances the discriminator with an information-bottleneck-based summarizer that guards against overfitting to irrelevant distractors in the ad environment. LIBRA adopts a decoupled structure: the offline discriminator is continuously fine-tuned on newly arriving allocations and periodically guides the updates of the online allocation policy to accommodate online dynamics. LIBRA has been deployed on the Tencent advertising system for over four months, and extensive experiments have been conducted. Online A/B tests show significant lifts in ad income (3.17%), overall click-through rate (1.56%), and cost-per-mille (3.20%), contributing a daily revenue increase of hundreds of thousands of RMB.
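The abstract gives no equations, so the following is only a minimal NumPy sketch, under stated assumptions, of two ingredients it names: a self-attention encoder over a variable number of candidate ads (handled here by zero-padding plus a boolean mask) and a discriminator whose output is turned into a reward for the allocator. The helper names, the feature summary `disc_features`, the linear logistic discriminator, and the reward form -log(1 - D) (one common GAIL-style choice) are hypothetical illustrations, not LIBRA's actual architecture. The "expert" action stands in for a hindsight-optimal allocation and is simply hard-coded rather than solved from a convex program.

```python
import numpy as np


def masked_self_attention(x, mask, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over padded candidate ads.

    x: (n_max, d) candidate-ad features, zero-padded to n_max rows.
    mask: (n_max,) bool array, True where the row is a real candidate.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(mask[None, :], scores, -1e9)  # padded ads receive no attention
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v) * mask[:, None]  # zero the outputs of padded rows


def disc_features(enc, mask, action):
    """Hypothetical (state, action) summary: chosen ad's encoding + mean candidate context."""
    context = enc[mask].mean(axis=0)
    return np.concatenate([enc[action], context])


def discriminator(phi, theta):
    """Logistic discriminator: probability that (state, action) came from the expert."""
    return 1.0 / (1.0 + np.exp(-(phi @ theta)))


def adversarial_reward(d, eps=1e-8):
    """GAIL-style reward -log(1 - D): larger when the action looks expert-like."""
    return -np.log(1.0 - d + eps)


def discriminator_step(theta, phi_expert, phi_gen, lr=0.1):
    """One gradient step on binary cross-entropy with expert = 1 and generated = 0."""
    grad = (discriminator(phi_expert, theta) - 1.0) * phi_expert \
        + discriminator(phi_gen, theta) * phi_gen
    return theta - lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    x = np.zeros((5, d))
    x[:3] = rng.normal(size=(3, d))  # 3 real candidate ads, 2 padding rows
    mask = np.array([True, True, True, False, False])
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    enc = masked_self_attention(x, mask, w_q, w_k, w_v)

    phi_expert = disc_features(enc, mask, action=0)  # stand-in for a hindsight-optimal choice
    phi_gen = disc_features(enc, mask, action=2)     # stand-in for the allocator's choice
    theta = np.zeros(2 * d)
    for _ in range(50):
        theta = discriminator_step(theta, phi_expert, phi_gen)

    print("reward(expert-like):", adversarial_reward(discriminator(phi_expert, theta)))
    print("reward(generated):  ", adversarial_reward(discriminator(phi_gen, theta)))
```

Masking keeps the encoder's outputs for real ads independent of how many padding rows a request carries, which is one simple way to handle a varying number of candidates per allocation. In an adversarial setup of this kind, the allocator would then be updated with a policy-gradient method to maximize the discriminator-derived reward, while the discriminator keeps training on fresh expert and generated pairs.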





      WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
      March 2024


      Author Tags

      display advertising
      generative adversarial network
      imitation learning
      reinforcement learning


