Abstract
We propose a system able to automatically synthesize a classification model and a set of interpretable decision rules defined over a set of symbols, corresponding to frequent substructures of the input dataset. Given a preprocessing procedure that maps every input element into a fully labeled graph, the system solves the classification problem in the graph domain. The extracted rules are then able to semantically characterize the classes of the problem at hand. The structured data that we consider in this paper are images coming from classification datasets: they represent an effective proving ground for studying the ability of the system to extract interpretable classification rules. For this particular input domain, the preprocessing procedure is based on a flexible segmentation algorithm whose behavior is defined by a set of parameters. The core inference engine uses a parametric graph edit dissimilarity measure. A genetic algorithm is in charge of selecting suitable values for the parameters, in order to synthesize a classification model based on interpretable rules which maximize the generalization capability of the model. Decision rules are defined over a set of information granules in the graph domain, identified by a frequent substructure miner. We compare the system with two other state-of-the-art graph classifiers, highlighting both its main strengths and limitations.

References
Agarwal B, Poria S, Mittal N, Gelbukh A, Hussain A. Concept-level sentiment analysis with dependency-based semantic parsing: a novel approach. Cogn Comput. 2015;7(4):487–99.
Alves R, Rodriguez-Baena DS, Aguilar-Ruiz JS. Gene association analysis: a survey of frequent pattern mining from gene expression data. Brief Bioinform. 2010;11(2):210–24.
Antonini M, Barlaud M, Mathieu P, Daubechies I. Image coding using wavelet transform. IEEE Trans Image Process. 1992;1(2):205–20.
Bargiela A, Pedrycz W. Granular computing: an introduction. Springer Science & Business Media; 2012.
Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798–828.
Bianchi FM, Livi L, Rizzi A. Two density-based k-means initialization algorithms for non-metric data clustering. Pattern Anal Appl. 2015. doi:10.1007/s10044-014-0440-4.
Bianchi FM, Maiorino E, Livi L, Rizzi A, Sadeghian A. An agent-based algorithm exploiting multiple local dissimilarities for clusters mining and knowledge discovery. Soft Comput. 2015. doi:10.1007/s00500-015-1876-1.
Bianchi FM, Scardapane S, Livi L, Uncini A, Rizzi A. An interpretable graph-based image classifier. In: 2014 International Joint Conference on Neural Networks (IJCNN), p. 2339–2346. IEEE (2014).
Bianchi FM, Livi L, Rizzi A, Sadeghian A. A granular computing approach to the design of optimized graph classification systems. Soft Comput. 2014;18(2):393–412. doi:10.1007/s00500-013-1065-z.
Borgelt C. Canonical forms for frequent graph mining. In: Advances in data analysis. Studies in classification, data analysis, and knowledge organization. Berlin Heidelberg: Springer; 2007. p. 337–349. doi:10.1007/978-3-540-70981-7_38.
Borgwardt KM, Ong CS, Schönauer S, Vishwanathan SVN, Smola AJ, Kriegel HP. Protein function prediction via graph kernels. Bioinformatics. 2005;21:47–56.
Boussaïd I, Lepagnot J, Siarry P. A survey on optimization metaheuristics. Inf Sci. 2013;237:82–117.
Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–7.
Del Vescovo G, Livi L, Frattale Mascioli FM, Rizzi A. On the problem of modeling structured data with the MinSOD representative. Int J Comput Theory Eng. 2014;6(1):9–14.
Del Vescovo G, Rizzi A. Automatic classification of graphs by symbolic histograms. In: 2007 IEEE International Conference on Granular Computing (GRC 2007), p. 410–410. IEEE (2007).
Del Vescovo G, Rizzi A. Online Handwriting Recognition by the Symbolic Histograms Approach. In: Proceedings of the 2007 IEEE International Conference on Granular Computing., GRC ’07, p. 686–700. IEEE Computer Society, Washington, DC (2007).
Eichinger F, Bohm K. Software-bug localization with graph mining. In: Managing and mining graph data. Springer; 2010. vol. 40, p. 515–546. doi:10.1007/978-1-4419-6045-0_17.
Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3:1157–82.
Han J, Cheng H, Xin D, Yan X. Frequent pattern mining: current status and future directions. Data Min Knowl Discov. 2007;15(1):55–86.
Han D, Hu Y, Ai S, Wang G. Uncertain graph classification based on extreme learning machine. Cogn Comput. 2015;7(3):346–58.
Hanbury A. A survey of methods for image annotation. J Vis Lang Comput. 2008;19(5):617–27.
Huan J, Wang W, Prins J. Efficient mining of frequent subgraphs in the presence of isomorphism. In: 2003 Third IEEE International Conference on Data Mining (ICDM’03), p. 549–552. IEEE (2003).
Ketkar NS, Holder LB, Cook DJ. Mining in the Proximity of Subgraphs. In: ACM KDD Workshop on Link Analysis: Dynamics and Statics of Large Networks (2006).
Lange J, von der Malsburg C, et al. Distortion invariant object recognition by matching hierarchically labeled graphs. In: 1989 International Joint Conference on Neural Networks (IJCNN’89), p. 155–159. IEEE (1989).
Li LJ, Su H, Fei-Fei L, Xing EP. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, editors. Advances in neural information processing systems 23. Curran Associates, Inc., 2010. p. 1378–86.
Livi L, Del Vescovo G, Rizzi A. Combining graph seriation and substructures mining for graph recognition. In: Pattern recognition - applications and methods. Advances in intelligent systems and computing. Berlin Heidelberg: Springer; 2013. vol. 204, p. 79–91. doi:10.1007/978-3-642-36530-0_7.
Livi L, Del Vescovo G, Rizzi A, Frattale Mascioli FM. Building Pattern Recognition Applications with the SPARE Library. ArXiv preprint arXiv:1410.5263 (2014).
Livi L, Rizzi A. The graph matching problem. Pattern Anal Appl. 2013;16(3):253–83. doi:10.1007/s10044-012-0284-8.
Lu D, Weng Q. A survey of image classification methods and techniques for improving classification performance. Int J Remote Sens. 2007;28(5):823–70.
Mukundan R, Ramakrishnan KR. Moment functions in image analysis: theory and applications. Singapore: World Scientific; 1998.
Neuhaus M, Bunke H. Bridging the gap between graph edit distance and kernel machines. Series in machine perception and artificial intelligence. London: World Scientific; 2007.
Nijssen S, Kok JN. A quickstart in frequent structure mining can make a difference. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, p. 647–652. ACM (2004).
Pavlidis T. Representation of figures by labeled graphs. Pattern Recognit. 1972;4(1):5–17.
Rizzi A, Panella M, Frattale Mascioli FM. Adaptive resolution min-max classifiers. IEEE Trans Neural Netw. 2002;13(2):402–14.
Rizzi A, Del Vescovo G. A symbolic approach to the solution of F-classification problems. In: 2005 Proceedings of the IEEE International Joint Conference on Neural Networks, 2005, vol. 3, p. 1953–1958. IEEE (2005).
Rizzi A, Del Vescovo G. Automatic Image Classification by a Granular Computing Approach. In: Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, p. 33–38. IEEE (2006).
Roerdink JB, Meijster A. The watershed transform: definitions, algorithms and parallelization strategies. Fundam Inform. 2000;41(1):187–228.
Scardapane S, Wang D, Panella M, Uncini A. Distributed learning for random vector functional-link networks. Inf Sci. 2015;301(0):271–84.
SPImR2: A set of 24 Instances of Synthetic and Photographic Image Classification problems. 2014. http://infocom.uniroma1.it/~rizzi/index.htm.
Theodoridis S, Koutroumbas K. Pattern recognition. Elsevier: Academic Press; 2006.
Tun K, Dhar P, Palumbo M, Giuliani A. Metabolic pathways variability and sequence/networks comparisons. BMC Bioinform. 2006;7(1):24.
Wang JZ, Li J, Wiederhold G. SIMPLIcity: semantics-sensitive integrated matching for picture libraries. IEEE Trans Pattern Anal Mach Intell. 2001;23(9):947–63.
Weng CH. Mining fuzzy specific rare itemsets for education data. Knowl-Based Syst. 2011;24(5):697–708.
Wiskott L, Fellous JM, Kuiger N, Von Der Malsburg C. Face recognition by elastic bunch graph matching. IEEE Trans Pattern Anal Mach Intell. 1997;19(7):775–9.
Yan X, Han J. gSpan: graph-based substructure pattern mining. In: 2002 IEEE International Conference on Data Mining (ICDM’02), p. 721–724. IEEE (2002).
Yun U, Ryu KH. Approximate weighted frequent pattern mining with/without noisy environments. Knowl-Based Syst. 2011;24(1):73–82.
Zhang J, Zhan ZH, Lin Y, Chen N, Gong YJ, Zhong JH, Chung HS, Li Y, Shi YH. Evolutionary computation meets machine learning: a survey. IEEE Comput Intell Mag. 2011;6(4):68–75.
Zhang S, He B, Nian R, Wang J, Han B, Lendasse A, Yuan G. Fast image recognition based on independent component analysis and extreme learning machine. Cogn Comput. 2014;6(3):405–22.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
Filippo Maria Bianchi, Simone Scardapane, Antonello Rizzi, Aurelio Uncini, and Alireza Sadeghian declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Appendices
Appendix 1: The Image Segmentation Procedure
In this appendix, we describe an implementation of the watershed image segmentation procedure.
At the beginning, the image is scanned with a sampling step of z pixels; for each position, a vector representing the signature of the pixel is extracted. The feature vector \(\mathbf s\), called pixel signature, consists of the following sections:
The components CrCb (which are organized in a real-valued vector of \(\mathbb {R}^2\)) and Brg contain chromatic and brightness information about the pixel. Cr and Cb are the two chrominance values in the YCrCb color space, while Brg is the brightness value Y from the same color space. The wavelet vector Wlet, which contains the components \(W_{HL}\), \(W_{LH}\), and \(W_{HH}\), carries information about the texture characterizing the neighborhood of the pixel. In order to calculate these values, a 2D version of the well-known Daubechies wavelet transform [3] is applied to the Brg channel of the image, using a \(B\times B\) window (where B is usually chosen to be 4 or 8) around the pixel of interest. The squares of the values contained in the HL, LH, and HH sections of the resulting matrix (see Fig. 17) are averaged, yielding, respectively, the \(W_{HL}\), \(W_{LH}\), and \(W_{HH}\) coefficients [42].
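The texture part of the signature can be sketched as follows. This is a minimal illustration, assuming a single-level Haar (db1) decomposition in place of the higher-order Daubechies filter used in the paper, and assuming the window lies fully inside the image; the function name `pixel_signature` and the channel layout of the input array are our own conventions, not the authors'.

```python
import numpy as np

def pixel_signature(ycrcb, i, j, B=4):
    """Sketch of the signature s = (Cr, Cb, Brg, W_HL, W_LH, W_HH).

    `ycrcb` is an H x W x 3 float array with channels (Y, Cr, Cb).
    A one-level Haar transform stands in for the Daubechies filter.
    """
    Y = ycrcb[..., 0]
    h = B // 2
    win = Y[i - h:i + h, j - h:j + h]        # B x B window around (i, j)

    # One-level 2D Haar decomposition of the brightness window.
    a = (win[0::2, :] + win[1::2, :]) / 2.0  # rows: low-pass
    d = (win[0::2, :] - win[1::2, :]) / 2.0  # rows: high-pass
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0     # detail along columns
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0     # detail along rows
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0     # diagonal detail

    # Average of squared detail coefficients -> texture components.
    w_hl, w_lh, w_hh = np.mean(HL**2), np.mean(LH**2), np.mean(HH**2)
    return np.array([ycrcb[i, j, 1], ycrcb[i, j, 2], Y[i, j],
                     w_hl, w_lh, w_hh])
```

With a brightness ramp along the columns, only the column-detail component is nonzero, as expected for a purely horizontal texture.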
As a first segmentation step, a single channel image is obtained calculating the following value for each pixel at position (i, j):
where \(\mathbf s (i,j)\) is the signature of the pixel in the i-th row and j-th column. Signatures of pixels are compared using the following weighted Manhattan distance:
which depends on 3 weight parameters \(\nu _0\), \(\nu _1\) and \(\nu _2\), defined as follows:
where \(\omega _{\text {Brg}}\in \left[ 0,1\right]\) and \(\omega _{\text {bin}} \in \left[ 0,1\right]\) are two parameters of the segmentation procedure.
Note that the entry L(i, j) represents a measure of how similar (on average) the pixel at position (i, j) is to its closest neighbors, according to the definition of pixel signature and the dissimilarity measure defined in the signature domain. After the computation of the L channel, an affine normalization into the range \(\left[ 0,1\right]\) is performed. Subsequently, the following steps are executed in order to decompose the image into connected regions (or segments). Note that we always refer to path-connection, meaning that between two points of a region there must exist at least one path joining them, formed exclusively by points (pixels in our case) belonging to the region itself.
1. A discrete version of the L map is obtained by a two-value (1 bit) quantization with the threshold given by the parameter \(\tau _{\text {bin}}\).
2. After step 1, the image is subdivided into two generally non-connected parts: the “stable” one, corresponding to \(L(i,j)<\tau _{\text {bin}}\), and the “unstable” one, corresponding to \(L(i, j)\ge \tau _{\text {bin}}\). Each of the two parts is then decomposed into its connected components. Each obtained component is called a region and marked as “stable” if originating from the stable part, “unstable” otherwise.
3. In order to reduce the total area of the unstable regions, their pixels are absorbed by the adjacent stable regions. A pixel is absorbed if its distance from the centroid of the nearest adjacent stable region is less than an absorption threshold \(\tau _{\text {abs}}\). The remaining parts of the unstable regions are then marked as stable regions, so that after this step there are no more unstable regions. As can be observed, there is no guarantee that the newly obtained stable regions are connected, but this is not an issue at this stage.
4. In order to merge very similar stable regions, the BSAS clustering algorithm [26, 40] is used. The clustering threshold used in this step is the segmentation parameter \(\tau _{\text {fus}}\). The distance between two regions is calculated as the distance between their centroids. In general, the regions obtained after steps 3 and 4 are non-connected, so the decomposition into connected components is applied again to each region. The regions obtained at the end of this stage are certainly connected.
In both steps 3 and 4, the following weights are used in the signature dissimilarity measure (see Eq. (28)):
where \(\omega _{\text {W}} \in [0,1]\) is an additional weighting parameter that replaces \(\omega _{\text {bin}}\), introduced to enhance the flexibility of the algorithm.
Thus the set of the segmentation parameters \(\mathcal {H}\) is the following: \(\mathcal {H} = \{ \tau _{\text {bin}}, \tau _{\text {abs}}, \tau _{\text {fus}}; \omega _{\text {Brg}}, \omega _{\text {bin}}, \omega _{\text {W}} \}\).
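The first two steps of the decomposition above can be sketched as follows. This is an illustration only: it binarizes the normalized L map and extracts 4-connected components for both parts; the function names, the choice of 4-connectivity, and the omission of the absorption (step 3) and BSAS fusion (step 4) stages are simplifications of ours.

```python
import numpy as np
from collections import deque

def connected_components(mask):
    """Label the 4-connected components of a boolean mask via BFS."""
    labels = -np.ones(mask.shape, dtype=int)
    count = 0
    for si, sj in zip(*np.nonzero(mask)):
        if labels[si, sj] != -1:
            continue                          # pixel already labeled
        labels[si, sj] = count
        q = deque([(si, sj)])
        while q:
            i, j = q.popleft()
            for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and labels[ni, nj] == -1):
                    labels[ni, nj] = count
                    q.append((ni, nj))
        count += 1
    return labels, count

def initial_regions(L, tau_bin):
    """Steps 1-2: binarize the normalized L map with tau_bin, then split
    both the stable and the unstable part into connected regions."""
    stable = L < tau_bin
    s_lab, n_s = connected_components(stable)
    u_lab, n_u = connected_components(~stable)
    return s_lab, n_s, u_lab, n_u
```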
Appendix 2: Features Extraction from Image Segments
In this appendix, we describe how to compute the 12 features used for describing a segment (node label feature space) and the 3 features used for representing the mutual spatial displacement in the image between a pair of segments (edge label feature space).
With reference to the notation described in Eq. (1), let V be a generic segment in the image. The features Xc and Yc are, respectively, the horizontal and vertical position of the center of mass \(\mathbf C\) of the segment. The components CrCb and Wlet are the average values over V of the corresponding quantities of the pixel signature used in the segmentation process. The additional color feature, Sat, is calculated from CrCb and Brg and represents the saturation value from the well-known HSB color space.
The Area, Symm (symmetry), Rnd (roundness), and Cmp (compactness) values carry information about the size and shape of the segment; they are designed to be invariant to scaling and rotation. The “area” feature is given by: \(\texttt {Area}=P/P_{tot}\), where P is the number of pixels of the segment V and \(P_{tot}\) is the total number of pixels in the image.
Let \(\mathbf p =(p_i, p_j)\) be a generic pixel of V described by an \(\mathbb {R}^2\) vector, where \(p_i, p_j\) are its coordinates in the image, and let \(\mathbf C =(\texttt {Xc}, \texttt {Yc})\) be the center of mass of the segment. Moreover, let \(\mathbf p '\) be the position of the pixel symmetric to \(\mathbf p\) with respect to \(\mathbf C\). A measure of the symmetry degree of V with respect to \(\mathbf C\) is computed by the following expression:
Now, let \(\mathbf r\) be the generalized radius of V, computed as follows:
where \(d_E(\mathbf p , \mathbf C )\) is the Euclidean distance between a given pixel \(\mathbf p\) of V and the center of mass \(\mathbf C\). The equivalent circle \({\varGamma }\) is defined as the circle centered at \(\mathbf C\) whose area is equal to the previously defined area feature of V. As can easily be seen, the radius \(\mathbf r '\) of \({\varGamma }\) is \(\mathbf{r }' = \sqrt{{\texttt {Area}} / {\pi }}\). The roundness of V can now be defined as \(\texttt {Rnd}=\mathbf r '/\mathbf r\).
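The area and roundness definitions above can be sketched as follows. We assume, as a convention of ours, that pixel coordinates have been normalized by the image size, so that the generalized radius and the equivalent-circle radius are expressed in the same (dimensionless) units.

```python
import numpy as np

def shape_features(pixels, total_pixels):
    """Area and Rnd of a segment V, per the definitions in the text.

    `pixels` is a (P, 2) array of pixel coordinates, assumed already
    normalized by the image size (an assumption of this sketch).
    """
    P = len(pixels)
    area = P / total_pixels                              # Area = P / P_tot
    C = pixels.mean(axis=0)                              # center of mass
    r = np.mean(np.linalg.norm(pixels - C, axis=1))      # generalized radius
    r_eq = np.sqrt(area / np.pi)                         # radius of Gamma
    rnd = r_eq / r                                       # Rnd = r' / r
    return area, rnd
```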
Finally, let us introduce the function \(\mathbf t (\mathbf p )\) defined as follows:
We define the compactness of V as:
Figure 18 helps to understand the procedure for computing the shape features of a sample region.
When a segment has a fairly linear (elongated) shape, it is useful to characterize its direction and orientation. Since the previous features do not hold this information, we introduce four additional measurements in the following. In order to calculate these values, a Principal Component Analysis (PCA) is performed on the \(P\times 2\) pixel matrix \(\mathbf {M}\):
As a result of the PCA, we get the \(2\times 2\) unitary rotation matrix R holding the principal directions of the pixel cluster, as well as the standard deviations of the two components in the rotated space. Let us define a local reference system with origin coincident with \(\mathbf C\) and axes given by the two eigenvectors \(\mathbf x '\) and \(\mathbf y '\) in R (where \(\mathbf x '\) is the principal direction); we force the orientation of \(\mathbf x '\) so that it lies in the upper half-plane. Let \(x'\) and \(y'\) be the coordinates in the reference system \(<\mathbf C , \mathbf x ',\mathbf y '>\) and \(\sigma _{x'}\), \(\sigma _{y'}\) the respective standard deviations. The angle between \(\mathbf x '\) and the x axis of the main reference system is the direction of V. The feature Dir is a complex number defined as:
where its magnitude \(S_{DN}\) represents the significance (i.e., reliability) of the Dir value, and it is defined as follows:
The \(S_{DN}\) value is a measure of eccentricity with suitable normalization properties (it ranges from 0 to 1), and it worked well for our application. The phase, \(D_N\), is the normalized direction and is defined as follows:
where \(\theta\) is the angle between \(\mathbf x\) and \(\mathbf x ^{'}\) (according to a counterclockwise rotation). The value of \(D_N\) ranges from \(0^ \circ\) to \(180^ \circ\), and it is meaningful only in the case of an elongated-shape segment, i.e., when \(S_{DN}\) assumes values close to 1. Dir represents a unique feature, which is invariant with respect to a \(180^ \circ\) rotation (orientation flipping).
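A sketch of the PCA-based direction feature is given below. Since the exact normalization of \(S_{DN}\) is defined by an equation not reproduced here, \(1 - \sigma_{y'}/\sigma_{x'}\) is used as a plausible eccentricity measure in [0, 1]; treat it as our assumption rather than the authors' exact formula.

```python
import numpy as np

def direction_feature(pixels):
    """Direction of a segment V via PCA (sketch).

    `pixels` is a (P, 2) array of (x, y) coordinates.
    Returns (S_DN, D_N) with D_N in degrees in [0, 180).
    """
    M = pixels - pixels.mean(axis=0)      # center on the mass center C
    cov = M.T @ M / len(M)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    vals = np.clip(vals, 0.0, None)       # guard against tiny negatives
    x_p = vecs[:, 1]                      # principal direction x'
    if x_p[1] < 0:                        # force x' into the upper half-plane
        x_p = -x_p
    sx, sy = np.sqrt(vals[1]), np.sqrt(vals[0])
    s_dn = 1.0 - sy / sx if sx > 0 else 0.0   # assumed eccentricity in [0, 1]
    d_n = np.degrees(np.arctan2(x_p[1], x_p[0])) % 180.0
    return s_dn, d_n
```

For a perfectly horizontal row of pixels the eccentricity is 1 and the normalized direction is 0 degrees, consistent with the invariance of Dir under a 180-degree flip.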
In order to introduce an orientation discerning feature, we define a new angular quantity, which is related to the mass distribution of the pixels in V. First, we perform the following affine normalization:
in order to ensure that every pixel coordinate ranges between 0 and 1. Let us now introduce the values:
As can easily be seen, the absolute value of \(md_\mathbf{x }\) is a measure of the displacement of the normalized center of mass of the segment along the \(\mathbf x '\) direction, with respect to the geometric center point 0.5. On the other hand, the sign of \(md_\mathbf{x }\) is positive when the vector \(\mathbf x\) “points toward the side where most of the mass accumulates” and negative in the opposite case. Let us define \(\mathbf x ^{''}\) as:
where \(\text {sign}(\cdot )\) is the common sign function. Let \(\theta ^{'}\) be the angle between \(\mathbf x\) and \(\mathbf x ^{''}\) (according to a counterclockwise rotation), which allows us to introduce the following quantity (the normalized orientation):
We can now introduce the following complex number:
where
is the magnitude of the complex number Or and represents the reliability of the phase value \(O_N\). \(g(\cdot )\) is a nonlinear monotonic function that performs a dynamic correction of the displacement of the normalized center of mass with respect to the geometric center (see Eq. (38)), in order to make it closer to human perception. In particular, we use a function \(g(\cdot )\) that performs a soft clipping, since a region whose \(md_\mathbf{x }\) value is above a certain threshold must be completely characterized by an orientation (provided that the value \(S_{DN}\) is sufficiently high).
Figure 19 shows the Dir and Or magnitude of two image segments characterized by a pronounced direction and orientation.
It is worth highlighting the following remarks:
1. The value \(O_{N}\) ranges from \(-180^\circ\) to \(+180^\circ\); it always represents the orientation that “points toward the side where most of the mass accumulates”. Hence, its definition is coherent for all possible segments.
2. The value \(S_{ON}\) is significantly greater than zero only when the segment has both an elongated shape and an asymmetric mass distribution with respect to the geometric center \(\mathbf G\), while the value \(S_{DN}\) only requires that the segment is characterized by an elongated shape.
Concerning the three features that describe the spatial displacement between two segments, \(\delta \texttt {Xc}\) and \(\delta \texttt {Yc}\) are, respectively, the differences between the horizontal and vertical coordinates of the centers of mass of the two segments. The value \(\delta \texttt {B}\) is the minimum Euclidean distance between two pixels belonging to the boundaries of the two segments (boundary distance).
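The three edge-label features can be sketched as follows. For brevity the minimum distance is computed over all pixel pairs rather than boundary pixels only; for disjoint segments the minimum is attained on boundary pixels, so the value is the same. The function name and argument layout are our own conventions.

```python
import numpy as np

def edge_features(seg_a, seg_b):
    """Spatial displacement between two segments (edge label sketch).

    seg_a, seg_b: (P, 2) arrays of pixel coordinates.
    Returns (delta_Xc, delta_Yc, delta_B).
    """
    # Differences between the coordinates of the two centers of mass.
    dXc, dYc = seg_a.mean(axis=0) - seg_b.mean(axis=0)
    # Minimum pairwise Euclidean distance between the two pixel sets.
    diff = seg_a[:, None, :] - seg_b[None, :, :]
    dB = np.sqrt((diff ** 2).sum(axis=-1)).min()
    return dXc, dYc, dB
```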
Cite this article
Bianchi, F.M., Scardapane, S., Rizzi, A. et al. Granular Computing Techniques for Classification and Semantic Characterization of Structured Data. Cogn Comput 8, 442–461 (2016). https://doi.org/10.1007/s12559-015-9369-1