Power Optimization in Device-to-Device Communications: A Deep Reinforcement Learning Approach With Dynamic Reward | IEEE Journals & Magazine | IEEE Xplore