M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers | IEEE Conference Publication | IEEE Xplore