A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning | IEEE Conference Publication | IEEE Xplore