Sustainable ℓ2-regularized actor-critic based on recursive least-squares temporal difference learning | IEEE Conference Publication | IEEE Xplore