Rove-Tree-11: The Not-so-Wild Rover a Hierarchically Structured Image Dataset for Deep Metric Learning Research

Hunt, Roberta; Pedersen, Kim Steenstrup

doi:10.1007/978-3-031-26348-4_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13845)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

We present a new dataset of images of pinned insects from museum collections along with a ground truth phylogeny (a graph representing the relative evolutionary distance between species). The images include segmentations, and can be used for clustering and deep hierarchical metric learning. As far as we know, this is the first dataset released specifically for generating phylogenetic trees. We provide several benchmarks for deep metric learning using a selection of state-of-the-art methods.

Notes

1. To genus-level means that each species within a genus is considered unresolved, or equally likely to be related to any other species within that genus.
2. The taxonomy represents how the organism is classified, i.e. which class, order, and family the organism belongs to, and is a non-binary tree. The phylogeny represents how closely related different species are to each other, and would ideally be a binary tree. In an ideal world the taxonomy would be congruent with the phylogeny, but in reality they tend to diverge because taxonomic revisions take longer.
3. It is, however, different from the edit distance popular in computer science.
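
The distinction drawn in notes 1 and 2 can be illustrated with a minimal sketch. It is not taken from the paper: the nested-tuple tree encoding, the function names and the placeholder species labels (A_sp1, B_sp1, ...) are all assumptions made for illustration. A tree resolved only to genus level contains a node with more than two children (the unresolved species of one genus), whereas a fully resolved phylogeny is binary.

```python
from typing import List, Tuple, Union

# Assumed toy encoding: a tree is a leaf label (str) or a tuple of subtrees.
Tree = Union[str, Tuple]


def leaves(node: Tree) -> List[str]:
    """Collect all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    out: List[str] = []
    for child in node:
        out.extend(leaves(child))
    return out


def is_binary(node: Tree) -> bool:
    """True if every internal node has exactly two children."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_binary(child) for child in node)


def unresolved_groups(node: Tree) -> List[List[str]]:
    """Leaf sets of internal nodes with more than two children."""
    if isinstance(node, str):
        return []
    found: List[List[str]] = []
    if len(node) > 2:
        found.append(sorted(leaves(node)))
    for child in node:
        found.extend(unresolved_groups(child))
    return found


# Hypothetical placeholder trees, not data from the paper.
# Resolved to genus level: the three species of genus A are left unresolved.
genus_level = (("A_sp1", "A_sp2", "A_sp3"), ("B_sp1", "B_sp2"))
# Fully resolved: every internal node splits into exactly two subtrees.
fully_resolved = ((("A_sp1", "A_sp2"), "A_sp3"), ("B_sp1", "B_sp2"))

print(is_binary(genus_level), unresolved_groups(genus_level))
print(is_binary(fully_resolved), unresolved_groups(fully_resolved))
```

In this encoding, a taxonomy would typically look like the first tree (several species under one genus node), while an ideal phylogeny would look like the second.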

Acknowledgements

Many people have been involved in this project. First, we would like to thank David Gutschenreiter, Søren Bech and André Fastrup, who took the photos of the unit trays and completed the initial segmentations of the images as part of their theses. Next, we would like to thank Alexey Solodovnikov of the Natural History Museum of Denmark for providing the specimens, the ground truth phylogeny and guidance on all things entomological. We also thank Francois Lauze and the entire Phylorama team for their input to the project.

Author information

Authors and Affiliations

Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100, Copenhagen, Denmark
Roberta Hunt & Kim Steenstrup Pedersen
Natural History Museum of Denmark, Øster Voldgade 5 - 7, 1350, Copenhagen, Denmark
Kim Steenstrup Pedersen

Corresponding author

Correspondence to Roberta Hunt.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

Ethics declarations

Ethical Concerns

Models similar to those described, if applied to images of faces, could be used to generate family trees for humans. This could result in public images being used to infer familial relationships which could have a negative societal impact. The authors strongly discourage this form of misuse of the proposed methods.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 715 KB)

Cite this paper

Hunt, R., Pedersen, K.S. (2023). Rove-Tree-11: The Not-so-Wild Rover a Hierarchically Structured Image Dataset for Deep Metric Learning Research. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_25

DOI: https://doi.org/10.1007/978-3-031-26348-4_25
Published: 09 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26347-7
Online ISBN: 978-3-031-26348-4
eBook Packages: Computer Science, Computer Science (R0)
