A bioinformatics approach to 2D shape classification☆
Introduction
Research in Computational Biology and Bioinformatics has experienced unprecedented growth in recent years, mainly due to its fruitful interaction with many disciplines and fields of computer science. Among others, Pattern Recognition/Machine Learning techniques have been successfully exploited in this context [1], for many different reasons: it is possible to “learn from examples”, derive quantitative models, handle non-vectorial data, and deal with many classification, clustering and detection problems commonly encountered in the life sciences. In many cases the particular Pattern Recognition model has not been applied “as is”, but has been adapted and modified to take into account biological constraints and needs. Sometimes this has produced approaches that are very different from the original methodology; a clear example is the profile HMM [2].
To some extent, it can be stated that this tight interaction has been mainly unidirectional, with biology and the life sciences gaining the largest benefit. In this paper, we explore an alternative direction, trying to answer the following question: can we reverse the typical direction of interaction between Pattern Recognition and Bioinformatics? In other words, can we exploit advanced bioinformatics models and solutions to solve pattern recognition tasks?
To the best of our knowledge, this perspective is rather new in the literature (the only relevant example is the video-genome project [4]), and it seems a promising direction for two reasons. First, if we are able to encode the Pattern Recognition problem in biological terms, then we can exploit the huge range of effective, optimized, and interpretable bioinformatics tools developed over more than 40 years of research. These tools rely heavily on the solution of general pattern recognition tasks such as matching, classification, retrieval, clustering, distance computation and so on. For example, in the video-genome project [4], the authors established an analogy between biological sequences and videos, defining the so-called “video-DNA”, a way to map features extracted from video frames into nucleotide sequences. Having encoded the problem in biological terms, the authors were then able to address the video retrieval task by using the well-known BLAST [5], an extremely fast and effective heuristic-driven algorithm for biological sequence retrieval. Second, and more importantly, the main goal of bioinformatics research is to derive knowledge from biological data: therefore, the interpretability of methods and solutions is a key feature, and many visualization, inspection and interpretation tools are available in the literature. These tools may also be very useful in Pattern Recognition scenarios, to better understand the different aspects of the data for a given problem: indeed, in recent years interpretability has become a pressing need in Pattern Recognition [6].
This paper takes another step in this direction, providing further evidence of the effectiveness and interpretability of bioinformatics approaches for Pattern Recognition (PR) problems. In particular, we propose and discuss a bioinformatics approach to 2D shape classification. The analysis of 2D shapes is an important and vibrant research area (often paving the way for 3D object classification). Many approaches have appeared in the literature (see for example the reviews [7], [8]): very often, the 2D shape is encoded by its contour, which has proved to be an effective and natural choice in many applications. Here we propose some methods to encode the shape contour as a biological sequence, employing tailored bioinformatics tools to perform classification. In the huge literature on 2D shape analysis, many approaches exploit sequence alignment tools to perform shape matching ([9], [10], [11], [12], [13], just to cite a few); some sequence matching-based approaches that start from shape skeletons have also been proposed [14], [15], [16]. Focusing on our main target, i.e. the use of biological sequence alignment tools, it should be noted that only a few approaches employ techniques developed for biological sequences to perform shape classification or matching [17], [18]. Nevertheless, these approaches take a very different perspective from ours (and from that of the video-genome project), where the main goal is to encode the PR problem in biological terms, hence exploiting tools developed for biological sequence analysis. In other words, to exploit Bioinformatics tools for Pattern Recognition, one can consider two main steps: (i) encoding the PR problem in biological terms; (ii) applying bioinformatics tools to solve the problem.
From this point of view, the approaches in [17], [18] are rather limited: they employ one particular technique for one particular purpose, and do not consider a biological encoding that would allow the use of a wide class of algorithms for sequence analysis.
In this paper we explicitly consider this aspect: first, we establish an analogy between 2D shapes and biological sequences, which motivates the employment of bioinformatics tools. Then we propose three ways of transforming a silhouette, encoded with the 8-directional chain code [19], into an amino acid sequence; given that, we can compute the similarity between shapes by using established biological sequence alignment tools. This similarity is then exploited for classification in a K-nearest-neighbor setting. Finally, we show that other biological tools and concepts (such as multiple sequence alignment, conserved domains, and locality and quality of alignment) can be used for a deeper analysis of the results. We performed experiments on five standard shape datasets; on the one hand, we show that the classification results are very competitive with the state of the art. On the other hand, we show that the poor results we obtained in a retrieval case can be analysed in greater depth by exploiting other biological sequence mining tools.
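As an illustration of the encoding step, the following minimal sketch computes an 8-directional chain code from an ordered closed contour and maps each direction to a letter. The direction convention (0 = east, counter-clockwise, y pointing up), the function names and the `AMINO` letter table are assumptions made for this example; the mapping is a simple one-to-one stand-in and is not claimed to be any of the three encodings proposed in the paper.

```python
# Offsets (dx, dy) between consecutive contour pixels and their chain-code symbol.
# Assumed convention: 0 = east, increasing counter-clockwise, y axis pointing up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

# Hypothetical one-to-one table from the 8 directions to amino acid letters.
AMINO = "ACDEFGHI"


def chain_code(contour):
    """Return the 8-directional chain code of a closed, 8-connected contour.

    `contour` is an ordered list of (x, y) pixel coordinates; the last point
    is linked back to the first one to close the shape.
    """
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


def to_sequence(codes):
    """Map chain-code symbols to letters using the hypothetical AMINO table."""
    return "".join(AMINO[c] for c in codes)


# A small square traversed counter-clockwise, two pixels per side.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(chain_code(square))
print(to_sequence(chain_code(square)))
```

Once a contour is expressed as a string over an amino acid alphabet, any tool that accepts protein sequences can in principle be applied to it, which is the point of the encoding step.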
Section snippets
Background
This section briefly summarizes the bioinformatics tools exploited in our analysis. First, we present a preliminary overview of biological sequence alignment, so as to clarify notation and terminology. Then, we present the tools employed for pairwise sequence alignment and multiple sequence alignment, highlighting the specific aspects that are useful for our task.
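To make the notion of a pairwise alignment score concrete, here is a minimal sketch of global alignment scoring by dynamic programming, in the style of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; this snippet does not state which alignment tool or scoring scheme the authors use, and practical tools typically rely on substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment between strings a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the table in memory.
    """
    # Row 0: aligning the empty prefix of `a` with prefixes of `b` costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` with the empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("AADDFFHH", "AADDFFHH"))
print(global_alignment_score("AAD", "AD"))
```

A higher score indicates that two sequences can be brought into correspondence with fewer mismatches and gaps, so the score can serve as a similarity measure between the objects the sequences encode.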
The proposed approach
In this section we present our approach: in particular, we first link 2D shapes and biological sequences, which may motivate the employment of bioinformatics tools in this context. Then we introduce the three methods used to encode shapes into biological sequences; finally, we detail how to transform alignments into a classification scheme.
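The last step, turning pairwise similarities into a classifier, can be sketched as a K-nearest-neighbor vote over the most similar training sequences. The names `knn_classify` and `toy_similarity`, the value of `k` and the toy data are all assumptions for this example; `toy_similarity` is only a stand-in for an alignment score produced by a real sequence alignment tool.

```python
from collections import Counter


def toy_similarity(a, b):
    """Stand-in similarity: matching positions minus the length difference.

    In an actual pipeline this would be replaced by an alignment score.
    """
    return sum(x == y for x, y in zip(a, b)) - abs(len(a) - len(b))


def knn_classify(query, training, similarity, k=3):
    """Label `query` by majority vote among its k most similar training items.

    `training` is a list of (sequence, label) pairs; `similarity` is any
    function returning a larger value for more similar sequences.
    """
    ranked = sorted(training, key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set of encoded contours (hypothetical sequences and labels).
training = [
    ("AADDFFHH", "square"),
    ("AADDFFHI", "square"),
    ("ACEGACEG", "zigzag"),
    ("ACEGACEH", "zigzag"),
]
print(knn_classify("AADDFFHG", training, toy_similarity, k=3))
```

Because the classifier only needs a similarity function, the alignment tool can be swapped without changing the classification code.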
Classification results and discussion
In this section we evaluate the proposed framework in the context of shape classification. In particular, we first describe the datasets we used and the corresponding evaluation protocols; then we provide some details on the parameters of the proposed framework; finally we present and discuss our classification results, putting them in perspective with respect to the state of the art.
Deeper analysis
In this part we provide an example of how the large number of available bioinformatics tools can be exploited to gain a deeper understanding of the results. To do that, we evaluated our framework on a slightly different task (the retrieval task), using bioinformatics tools and concepts to better understand results that were not satisfactory. Even if related, the retrieval task is slightly different from classification: given a testing object, the goal is to retrieve as many shapes as
Conclusions
In this paper we explored the possibility of exploiting bioinformatics concepts, tools and solutions to address the 2D shape classification problem. In our framework, the contour of a 2D shape is encoded using the chain code, and then transformed into biological sequences through three encoding strategies. We then employ biological sequence alignment tools to compute a similarity measure between sequences/shapes, and we use a KNN classification approach. We also proposed some tailoring of the
Acknowledgments
The authors would like to thank Nebojsa Jojic and Alessandro Farinelli for helpful discussions and suggestions. The authors are also grateful to the anonymous reviewers for their valuable comments.
References (70)
- et al. Basic local alignment search tool. J. Mol. Biol. (1990)
- A survey of shape analysis techniques. Pattern Recogn. (1998)
- et al. Review of shape representation and description techniques. Pattern Recogn. (2004)
- et al. Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching. Pattern Recogn. (2005)
- et al. A skeletal measure of 2D shape similarity. Comput. Vis. Image Underst. (2004)
- et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. (1970)
- et al. Identification of common molecular subsequences. J. Mol. Biol. (1981)
- et al. Bag of contour fragments for robust shape classification. Pattern Recogn. (2014)
- et al. A novel contour descriptor for 2D shape matching and its application to image retrieval. Image Vis. Comput. (2011)
- et al. Edit distance-based kernel functions for structural pattern classification. Pattern Recogn. (2006)
- Component-based discriminative classification for hidden Markov models. Pattern Recogn.
- Classification of silhouettes using contour fragments. Comput. Vis. Image Underst.
- Shape recognition based on kernel-edit distance. Comput. Vis. Image Underst.
- Bioinformatics: The Machine Learning Approach
- Profile hidden Markov models. Bioinformatics
- Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinf.
- Reading tea leaves: How humans interpret topic models. NIPS
- Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell.
- Shape matching and classification using height functions. Pattern Recogn. Lett.
- Robust symbolic representation for shape recognition and retrieval. Pattern Recogn.
- Hierarchical matching of deformable shapes. Proceedings of the International Conference on Computer Vision and Pattern Recognition
- Discovering shape classes using tree edit-distance and pairwise clustering. Int. J. Comput. Vis.
- Path similarity skeleton graph matching. IEEE Trans. Pattern Anal. Mach. Intell.
- Efficient partial shape matching using Smith-Waterman algorithm. CVPR Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment
- A profile hidden Markov model framework for modeling and analysis of shape. Proceedings of the International Conference on Image Processing
- Digital Image Processing
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure
- Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci.
- Bioinformatics and Functional Genomics
- Clustal W and Clustal X version 2.0. Bioinformatics
- 3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment. Nucl. Acids Res.
- Confind: a robust tool for conserved sequence identification. Bioinformatics
- 2D shape recognition using information theoretic kernels. Proceedings of the International Conference on Pattern Recognition
☆ This paper has been recommended for acceptance by Sven Dickinson.