Reinforcement learning for penalty avoiding policy making | IEEE Conference Publication