Learning Cross-Modal Representations for Language-Based Image Manipulation | IEEE Conference Publication | IEEE Xplore