Abstract:
For scene graph generation, it is crucial to properly understand the relationships between objects within the context of the image. We design a label transformation method based on a Transformer-VAE (Variational Autoencoder) structure that, in an unsupervised manner, converts bounding box labels into auxiliary labels encoding each object's context. The model is then trained jointly on the auxiliary labels, bounding box labels, and relation labels in a multi-task manner. Our approach requires no external datasets or language priors and is applicable to any scene graph generation model that infers relationships between pairs of objects. We validate the effectiveness and scalability of our method with state-of-the-art scene graph generation models on the VRD and VG datasets.
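As a rough illustration of the two kinds of objective the abstract mentions, the following Python sketch shows a generic VAE-style loss (reconstruction cross-entropy plus the standard Gaussian KL term) and a weighted multi-task loss over object, relation, and auxiliary labels. It is not the authors' implementation: the function names, the `beta` factor, the per-task weights `w_obj`, `w_rel`, `w_aux`, and the use of plain softmax cross-entropy for every task are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_objective(recon_logits: Sequence[float], label: int,
                  mu: Sequence[float], logvar: Sequence[float],
                  beta: float = 1.0) -> float:
    """Hypothetical VAE loss: reconstruct the input label, regularize the latent."""
    return cross_entropy(recon_logits, label) + beta * gaussian_kl(mu, logvar)


def multitask_loss(obj_logits: Sequence[float], obj_label: int,
                   rel_logits: Sequence[float], rel_label: int,
                   aux_logits: Sequence[float], aux_label: int,
                   w_obj: float = 1.0, w_rel: float = 1.0,
                   w_aux: float = 1.0) -> float:
    """Hypothetical weighted sum of per-task cross-entropy losses."""
    return (w_obj * cross_entropy(obj_logits, obj_label)
            + w_rel * cross_entropy(rel_logits, rel_label)
            + w_aux * cross_entropy(aux_logits, aux_label))


if __name__ == "__main__":
    # Uniform logits over two classes give log(2) per task.
    print(multitask_loss([0.0, 0.0], 0, [0.0, 0.0], 1, [0.0, 0.0], 0))
```

In this sketch the auxiliary labels simply act as a third supervised target alongside the box and relation labels; how the auxiliary labels are actually derived from the Transformer-VAE latent space is not specified here.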
Date of Conference: 19-22 September 2021
Date Added to IEEE Xplore: 23 August 2021