Abstract
We present a new dataset of images of pinned insects from museum collections along with a ground truth phylogeny (a graph representing the relative evolutionary distance between species). The images include segmentations, and can be used for clustering and deep hierarchical metric learning. As far as we know, this is the first dataset released specifically for generating phylogenetic trees. We provide several benchmarks for deep metric learning using a selection of state-of-the-art methods.
Notes
1. "To genus-level" means that each species within a genus is considered unresolved, i.e. equally likely to be related to any other species within that genus.
2. The taxonomy represents how an organism is classified, i.e. which class, order and family it belongs to, and is a non-binary tree. The phylogeny represents how closely related different species are to each other, and would ideally be a binary tree. In an ideal world the taxonomy would be congruent with the phylogeny, but in reality they tend to diverge, as taxonomic revisions take longer.
3. It is, however, different from the edit distance popular in computer science.
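The distinction drawn in notes 1 and 2 can be illustrated with a minimal sketch. The species names, the nested-tuple tree encoding and the helper functions below are hypothetical and are not taken from the dataset or its tooling. The sketch only shows that a genus resolved "to genus-level" forms a non-binary node whose species are all equidistant, whereas a fully resolved binary tree distinguishes between them.

```python
# Hypothetical toy trees written as nested tuples; leaves are strings.
# A genus resolved only "to genus-level" is one node with all of its
# species as direct children (a polytomy), so the tree is non-binary.
genus_level = (("sp_a1", "sp_a2", "sp_a3"), ("sp_b1", "sp_b2"))

# One possible fully resolved (binary) tree over the same species.
resolved = ((("sp_a1", "sp_a2"), "sp_a3"), ("sp_b1", "sp_b2"))


def is_binary(tree):
    """Return True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def leaf_paths(tree, path=()):
    """Map each leaf name to the tuple of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def edge_distance(tree, a, b):
    """Count the edges on the path between leaves a and b."""
    paths = leaf_paths(tree)
    path_a, path_b = paths[a], paths[b]
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


if __name__ == "__main__":
    print(is_binary(genus_level), is_binary(resolved))
    print(edge_distance(genus_level, "sp_a1", "sp_a2"),
          edge_distance(genus_level, "sp_a1", "sp_a3"))
    print(edge_distance(resolved, "sp_a1", "sp_a2"),
          edge_distance(resolved, "sp_a1", "sp_a3"))
```

Edge counts are used here only as a stand-in for evolutionary distance; a real phylogeny would typically carry branch lengths.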
Acknowledgements
Many people have been involved in this project. First, we would like to thank David Gutschenreiter, Søren Bech and André Fastrup, who took the photos of the unit trays and completed the initial segmentations of the images as part of their theses. Next, we would like to thank Alexey Solodovnikov of the Natural History Museum of Denmark for providing the specimens, the ground truth phylogeny and guidance on all things entomological. We also thank Francois Lauze and the entire Phylorama team for their input to the project.
Ethics declarations
Ethical Concerns
Models similar to those described, if applied to images of faces, could be used to generate family trees for humans. Public images could then be used to infer familial relationships, which could have a negative societal impact. The authors strongly discourage this form of misuse of the proposed methods.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hunt, R., Pedersen, K.S. (2023). Rove-Tree-11: The Not-so-Wild Rover a Hierarchically Structured Image Dataset for Deep Metric Learning Research. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_25
DOI: https://doi.org/10.1007/978-3-031-26348-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26347-7
Online ISBN: 978-3-031-26348-4
eBook Packages: Computer Science, Computer Science (R0)