Semi-supervised Grounding Alignment for Multi-modal Feature Learning | IEEE Conference Publication | IEEE Xplore