Authors:
Brandon Birmingham
and
Adrian Muscat
Affiliation:
Department of Communications and Computer Engineering, University of Malta, Msida MSD 2080, Malta
Keyword(s):
Spatial Relations, Image Understanding, Multi-label Learning, Clustering, Computer Vision, Natural Language Processing.
Related Ontology Subjects/Areas/Topics:
Informatics in Control, Automation and Robotics; Robotics and Automation; Vision, Recognition and Reconstruction
Abstract:
Detecting spatial relations between objects in an image is a core task in image understanding and grounded natural language. In cognitive linguistics, this problem has been addressed through template and computational models built from controlled experimental data using 2D or 3D synthetic diagrams. The Computer Vision (CV) and Natural Language Processing (NLP) communities have, in turn, developed machine learning models for real-world images, mostly trained on crowd-sourced data. These models treat the task as single-label classification, whereas it is inherently a multi-label problem: several prepositions can simultaneously describe the same object pair. In this paper, we learn a multi-label model based on computed spatial features. We implement the model using a clustering-based approach because, besides predicting multiple labels for a given instance, it allows us to gain deeper insights into how spatial relations are related to each other. We report results from this model and present a direct comparison with a single-label Random Forest classifier. The proposed model generally outperforms the single-label classifier, even when the latter is credited with its top four predicted prepositions.
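The contrast between the two approaches in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature values, preposition vocabulary, cluster count, and frequency threshold below are all hypothetical. The idea is that a clustering-based model can return every preposition that is frequent within an instance's cluster, while a single-label classifier such as a Random Forest must be queried for its top-k class probabilities to approximate multi-label output.

```python
# Hypothetical sketch: multi-label spatial-relation prediction via clustering,
# compared against a single-label Random Forest queried for its top-k classes.
# All data, labels, and thresholds are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
PREPOSITIONS = ["above", "below", "next to", "near"]  # toy vocabulary

# Synthetic geometric features (e.g. normalised centroid offsets, overlap),
# drawn as two loose groups so the clustering has structure to find.
X = np.vstack([rng.normal(loc=0.0, scale=0.3, size=(100, 3)),
               rng.normal(loc=1.0, scale=0.3, size=(100, 3))])
# One "ground-truth" preposition per instance, as in crowd-sourced data.
y = np.array([0] * 100 + [1] * 100)

# --- Clustering-based multi-label model ------------------------------------
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
# Per-cluster label distribution: how often each preposition was chosen
# for the instances that fall in that cluster.
dist = np.zeros((4, len(PREPOSITIONS)))
for c, label in zip(km.labels_, y):
    dist[c, label] += 1
dist /= dist.sum(axis=1, keepdims=True)

def predict_multilabel(x, threshold=0.2):
    """Return every preposition whose in-cluster frequency exceeds threshold."""
    c = km.predict(x.reshape(1, -1))[0]
    return [p for p, f in zip(PREPOSITIONS, dist[c]) if f >= threshold]

# --- Single-label baseline: Random Forest, top-k by class probability ------
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def rf_top_k(x, k=4):
    """Approximate multi-label output by taking the k most probable classes."""
    probs = rf.predict_proba(x.reshape(1, -1))[0]
    order = np.argsort(probs)[::-1][:k]
    return [PREPOSITIONS[rf.classes_[i]] for i in order]
```

A side benefit of the clustering route, as the abstract notes, is interpretability: the per-cluster distributions `dist` show directly which prepositions co-occur in the same spatial configurations.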