Abstract:
Visual patterns are basic elements in images and represent the discernible regularity in the visual world. Thus, mining visual patterns is a fundamental task in computer ...Show MoreMetadata
Abstract:
Visual patterns are basic elements in images and represent the discernible regularity in the visual world. Thus, mining visual patterns is a fundamental task in computer vision. Most previous studies consider that only one visual pattern exists in a category, and then builds up a one-to-one mapping using category label. In reality, however, many categories include multiple patterns, which are many-to-one mappings. Without knowing the information of patterns, few existing pattern mining methods can discover and distinguish varied patterns in a category. To tackle this problem, we propose a novel framework, PaclMap, which learns medium-grained features to represent patterns. It includes an unsupervised pattern-level contrastive learning and a multipattern activation map. Their joint optimization encourages the network to mine both discriminative and frequent patterns in a category. Extensive experiments conducted on four benchmark datasets (Place-20, imagenet large scale visual recognition challenge (ILSVRC)-20, visual object classes (VOC), and Travel) demonstrate that PaclMap outperforms six state-of-the-art methods with average improvements of 2.9% on accuracy and 12.3% on frequency, respectively.
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 35, Issue: 7, July 2024)