Cross-Modal Fusing Vision-Language Network for Referring Image Segmentation | IEEE Conference Publication | IEEE Xplore