SLVP: Self-Supervised Language-Video Pre-Training for Referring Video Object Segmentation | IEEE Conference Publication