Improving Generalization Performance of Adaptive Learning Rate by Switching from Block Diagonal Matrix Preconditioning to SGD | IEEE Conference Publication | IEEE Xplore