Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA | IEEE Conference Publication | IEEE Xplore