Human-Agent Collaboration Strategies for Vision-Grounded Instruction Following | IEEE Conference Publication | IEEE Xplore