Dyna-like reinforcement learning based on accumulative and average rewards | IEEE Conference Publication | IEEE Xplore