D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers | IEEE Conference Publication | IEEE Xplore