Neurocomputing

Volume 127, 15 March 2014, Pages 88-97

Determining structural identifiability of parameter learning machines

https://doi.org/10.1016/j.neucom.2013.08.039

Abstract

This paper reports an extension of our previous study on determining the structural identifiability of generalized constraint (GC) models, which are considered to be parameter learning machines. Identifiability refers to a uniqueness property of the model parameters. This property is particularly important for the physically interpretable parameters in GC models. We derive identifiability criteria according to the types of models. First, by taking the models as a family of deterministic nonlinear transformations from an input space to an output space, we provide a criterion for examining the identifiability of Multiple-input Multiple-output (MIMO) models. This result generalizes our previous ones for Single-input Single-output (SISO) and Multiple-input Single-output (MISO) models. Second, considering the models as the mean functions of input-dependent conditional distributions within a stochastic framework, we derive an identifiability criterion by means of the Kullback–Leibler divergence (KLD) and the regular summary. Third, time-variant models are studied based on the exhaustive summary method. The new identifiability criterion is valid for a range of differential/difference equation models whenever their exhaustive summaries can be obtained. Several model examples from the literature are presented to examine their identifiability properties.

Introduction

Mathematical models have become another sensing channel through which human beings perceive, describe, and understand natural and virtual worlds. For this reason, more and more models are being, and will be, generated for a vast variety of applications. The underlying modeling approaches naturally differ in various respects. For a quick examination of these differences, Dubois et al. [1], Solomatine and Ostfeld [2] and Todorovski and Dzeroski [3] considered two basic modeling approaches with respect to the degree of knowledge involved, namely, "knowledge-driven" and "data-driven". The knowledge-driven modeling approach is also called the "physical-based" [2] or "mechanistic-based" [3] modeling approach, because it relies mainly on given knowledge, such as first principles from physics. By contrast, the data-driven modeling approach is capable of constructing a model solely from given data, without using any prior knowledge. While Todorovski and Dzeroski [3] described the advantages and drawbacks of the two types of modeling approaches in applications, Hu et al. [4] compared them from the viewpoints of inference methodology (deduction vs. induction) and the meaning attached to parameters. Although data-driven models do have parameters of their own, such models are considered "non-parametric" because their parameters are generally unable to represent the real ones in a physical (or target) system.

In order to take advantage of each approach, studies on integrating the two types of modeling approaches have been reported [2], [3], [4], [5], [6]. Models built by such integration are called "hybrid" models [2], [3], [5]. To stress the mathematical description, another term, "generalized constraint" (GC) [4], [7], is adopted for these models. Considering the large diversity and unstructured representations of prior knowledge, one can expect that the difficulty of "hybridizing" arises mainly from imposing "knowledge constraints" on the models. Fig. 1 schematically depicts a GC model, which basically consists of two modules, namely, a knowledge-driven (KD) submodel and a data-driven (DD) submodel. For a detailed description of GC models, one can refer to [4], [8], [9].

Supposing a time-invariant model is considered, a general description of the GC model is given in the form
$$y = f(x, \theta) = f_k(x, \theta_k) \otimes f_d(x, \theta_d), \quad \theta = (\theta_k, \theta_d), \quad \theta_k \cap \theta_d = \emptyset,$$
where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are the input and output vectors, $f$ is the function describing the complete model relation between $x$ and $y$, $f_k$ and $f_d$ are the functions associated with the KD and DD submodels, respectively, $\theta$ is the parameter vector of the function $f$, and $\theta_k$ and $\theta_d$ are the parameter vectors associated with the functions $f_k$ and $f_d$, respectively. The symbol "$\otimes$" represents a coupling operation between the two submodels. Generally, the KD submodel contains physically interpretable parameters whose identifiability is of fundamental importance to the understanding of the system. However, owing to the coupling operation between the two submodels, the resulting GC model may have some unidentifiable parameters (i.e., parameters that cannot be determined uniquely) even if the parameters of each submodel are identifiable separately [4], [8]; a minimal numeric illustration of this coupling issue is sketched after the list below. Parameter identifiability is thus an important aspect reflecting the degree of transparency of a model, and hence "determining identifiability of the models should be addressed before any implementation of estimation" [8], [10], [11]. Moreover, identifiability is closely related to the convergence of a class of estimates including the maximum likelihood estimate (MLE) [8], [12]. Lack of identifiability gives no guarantee of convergence to the true parameter values and therefore usually results in severely ill-posed estimation problems [8], which is a critical issue if decisions are to be taken on the basis of the numerical values of the parameters [13]. Besides the ability to detect deficient models in advance, identifiability analysis can also bring practical benefits, such as insightfully revealing the relations among inputs, outputs and parameters, which can be very useful for model structure design and selection [4], [8]. To summarize, the usefulness and importance of identifiability analysis can be recognized in at least three respects:

  • (a)

    Statistical inference. In an unidentifiable statistical model, the standard statistical paradigm of the Cramér–Rao bound (CRB) does not hold, the MLE is no longer asymptotically Gaussian, model selection criteria such as AIC, BIC and MDL fail, and the singularity gives rise to strange behaviors in parameter estimation, hypothesis testing, Bayesian inference, model selection, etc. [14], [15]. Therefore, it is imperative to check identifiability before statistical inference.

  • (b)

    Physically interpretable (sub-)models. In these models, some or all parameters have a physically interpretable meaning [4], [13], [16], and identifying the true values of such parameters is important: their nonuniqueness not only means a nonunique description of the process but can also lead to completely erroneous or misleading results. One would not select an unidentifiable model when its parameters are of practical importance. Hence, identifiability analysis should be addressed, as part of qualitative experiment design, before any experimental data have been collected [8].

  • (c)

    Learning dynamics. In an unidentifiable parametric model, the trajectories of the learning dynamics are strongly affected by the nonidentifiability [14]. It has been shown that once parameters are attracted to singular points, the learning trajectory moves away from them very slowly. For example, [14] studied the dynamical behaviors of learning in multi-layer perceptrons (MLPs) and Gaussian mixture models (GMMs), and showed that nonidentifiability results in plateaus and slow manifolds.
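
To make the coupling issue above concrete, the following minimal sketch shows how two distinct parameter vectors of a GC model can produce identical input–output behavior. The multiplicative coupling and the toy submodels $f_k(x;a)=ax$ and $f_d(x;b)=b$ are assumptions chosen purely for illustration, not examples from the paper:

```python
# Minimal sketch of an unidentifiable GC model. Assumptions (not from the
# paper): the coupling is plain multiplication, the KD submodel is
# f_k(x; a) = a*x, and the DD submodel is f_d(x; b) = b.
import numpy as np

def gc_model(x, a, b):
    """Toy GC model: y = f_k(x; a) coupled with f_d(x; b), i.e. (a*b)*x."""
    return (a * x) * b

x = np.linspace(0.0, 1.0, 5)
y1 = gc_model(x, a=2.0, b=3.0)  # theta  = (2, 3)
y2 = gc_model(x, a=6.0, b=1.0)  # theta' = (6, 1); same product a*b = 6

# Identical outputs for distinct (a, b): the pair is structurally
# unidentifiable, and only the product a*b can be recovered from data.
print(np.allclose(y1, y2))  # True
```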

Structural identifiability is concerned with the uniqueness of the parameters determined from the input–output data. A property is said to be "structural" if it is true for all admissible parameter values [8]. In [4], [8], the authors derived identifiability results for Single-input Single-output (SISO) and Multiple-input Single-output (MISO) models. However, those theorems cannot deal with Multiple-input Multiple-output (MIMO) models. This work therefore extends [4], [8], and we further consider the problem for a wider spectrum of models. In this study, we view a model as a "parameter learning machine" if it can be parameterized by a finite-dimensional vector (Fig. 2). Special emphasis is placed on the identifiability of arbitrary nonlinear functions for parameter learning machines. The main contributions of the present work are threefold:

  • (1)

    From a partial derivative matrix (PDM), we derive a new identifiability criterion for deterministic nonlinear functions, which is applicable to MIMO models.

  • (2)

    Based on the Kullback–Leibler divergence (KLD) and the regular summary, we present a new identifiability theorem for stochastic models which can be applied to generic statistical models without being restricted to the exponential family [17].

  • (3)

    For the time-variant models, we adopt an exhaustive summary method which is valid for a wide range of differential/difference equation models whenever their exhaustive summaries can be obtained.

The remainder of this paper is organized as follows. Section 2 gives some basic definitions and views the identifiability problem from two different perspectives. Section 3 presents an identifiability criterion for deterministic MIMO models. In Section 4, we present an identifiability result for stochastic models with the help of KLD and regular summary. Section 5 gives a method for testing parameter redundancy by using exhaustive summary. Section 6 concludes with a brief summary.

Section snippets

Models and definitions

Typically, approaches for examining the structural identifiability of parameter learning machines can be categorized into two frameworks according to the nature of the modeling:

  • (1)

    Deterministic framework. In this framework, it is assumed that the model is deterministic and noise-free [8], [13], [16]. In other words, the model is viewed as a family of parameterized nonlinear mappings from an input vector $x \in \mathbb{R}^n$ to an output vector $y \in \mathbb{R}^m$, $y = f(x, \theta)$, where $\theta \in \Theta$ is a parameter vector indexing a specific mapping $\mathcal{M}(\theta)$

Identifiability criterion for deterministic models

In the deterministic framework, a model is identifiable if there exists a unique input–output behavior for each admissible parameter [8], [16]. A nonlinear model that attempts to accurately describe the underlying phenomena may be complex, with too many parameters. For example, a pair of parameters may always appear together as a product (or a sum) in the model equations, making it impossible to obtain unique estimates of both parameters. An open problem in nonlinear regression is to determine
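
To make the product-of-parameters example concrete, the following sketch checks a rank condition on a partial derivative matrix (PDM) symbolically. The toy model $y = abx + c$ and the choice of sample inputs are illustrative assumptions, not the paper's criterion verbatim; the generic column rank of the stacked gradients bounds the number of locally identifiable parameter combinations:

```python
# Hedged sketch: build a PDM by stacking the gradient of the model output
# w.r.t. theta at several inputs, then check its symbolic (generic) rank.
# Toy model y = a*b*x + c, chosen for illustration only.
import sympy as sp

x, a, b, c = sp.symbols('x a b c')
f = a * b * x + c                            # theta = (a, b, c)
grad = [sp.diff(f, t) for t in (a, b, c)]    # [b*x, a*x, 1]

# Rows: gradient evaluated at the sample inputs x = 0, 1, 2.
pdm = sp.Matrix([[g.subs(x, xi) for g in grad] for xi in (0, 1, 2)])

print(pdm.rank())  # 2 < 3: a and b enter only through the product a*b
```

Reparameterizing with $\beta = (ab, c)$ makes the corresponding matrix full rank, which matches the parameter-redundancy view discussed in Section 5.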

Identifiability criterion for stochastic models

Identifiability is a primary assumption in all classical statistical models [15], [20], [33]. However, such an assumption may be violated in a large variety of models. Unidentifiable families of probability distributions occur in many statistical modeling fields. In particular, in the study of machine learning, almost all learning machines used in information processing are unidentifiable [15]. Generally, if a model has hierarchical structures, latent variables or coupled submodels, the model
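
To connect with the stochastic view, a small numeric sketch follows. The assumptions (not from the paper): Gaussian conditionals $p(y \mid x, \theta) = N(f(x,\theta), \sigma^2)$ with a common known variance and the same toy mean function $f(x,\theta) = abx$ as above. A zero input-averaged KLD between the conditionals induced by two distinct parameter vectors signals unidentifiability in the stochastic sense:

```python
# Hedged sketch of the KLD viewpoint: the model is the mean of a Gaussian
# conditional p(y | x, theta) = N(f(x, theta), sigma^2). For equal
# variances, KL(p1 || p2) = (mu1 - mu2)^2 / (2 sigma^2) at each input.
import numpy as np

def f(x, a, b):
    return a * b * x  # toy mean function, illustration only

def avg_kld(theta1, theta2, xs, sigma=1.0):
    """Average KL( p(y|x,theta1) || p(y|x,theta2) ) over the inputs xs."""
    d = f(xs, *theta1) - f(xs, *theta2)
    return np.mean(d**2) / (2.0 * sigma**2)

xs = np.linspace(-1.0, 1.0, 101)
print(avg_kld((2.0, 3.0), (6.0, 1.0), xs))  # 0.0: distinct theta, KLD = 0
print(avg_kld((2.0, 3.0), (2.0, 1.0), xs))  # > 0: distinguishable pair
```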

Parameter redundancy

The most obvious cause of non-identifiability is parameter redundancy, in the sense that the model can be written in terms of a smaller set of parameters. Following [8], [17], we give the following definition.

Definition 4

(Parameter redundancy). A model $\mathcal{M}(\theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, is said to be parameter redundant if it can be expressed in terms of a smaller parameter vector $\beta = \beta(\theta)$, where $\dim \beta < k$. Models which are not parameter redundant are said to be of full rank.
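
Definition 4 suggests a mechanical check in the spirit of the derivative-matrix test of Catchpole et al. [17]: differentiate an exhaustive summary with respect to $\theta$ and compare the symbolic rank with $\dim\theta$. The summary below is a hypothetical example, chosen so that every entry depends on $(a, b)$ only through the product $ab$:

```python
# Hedged sketch of a parameter-redundancy test: the model is redundant
# iff rank(ds/dtheta) < dim(theta) for an exhaustive summary s(theta).
# The summary s below is a hypothetical example, not from the paper.
import sympy as sp

a, b, c = sp.symbols('a b c', positive=True)
theta = sp.Matrix([a, b, c])

s = sp.Matrix([a * b, (a * b)**2 + c, c**3])  # depends on (a, b) via a*b only
D = s.jacobian(theta)

print(D.rank(), '< dim(theta) =', len(theta))  # 2 < 3 -> parameter redundant
```

Here the model can be rewritten in terms of $\beta = (ab, c)$ with $\dim\beta = 2 < 3$, so by Definition 4 it is parameter redundant.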

In [17], Catchpole et al. introduced the concept of

Conclusion

Identifiability becomes an essential requirement for learning machines when the models contain physically interpretable parameters. Although existing methods can handle some specific families of parametric models, structural identifiability analysis for arbitrary nonlinear models is still an open question [8], [21]. This paper is a further study on the structural identifiability of parameter learning machines. For the time-invariant models, we first present an identifiability result for

Acknowledgments

This work is supported in part by NSFC no. 61273196.


References (41)

  • A. Dasgupta et al.

    Nonidentifiable parametric probability models and reparameterization

    J. Stat. Plann. Inference

    (2007)
  • S. Watanabe

    Algebraic geometry of singular learning machines and symmetry of generalization and training errors

    Neurocomputing

    (2005)
  • D. Dubois et al.

    Knowledge-driven versus data-driven logics

    J. Logics Lang. Inf.

    (2000)
  • D.P. Solomatine et al.

    Data-driven modeling: some past experiences and new approaches

    J. Hydroinf.

    (2008)
  • D. Psichogios et al.

    A hybrid neural network – first principles approach to process modeling

    AIChE J.

    (1992)
  • R. Ben-Hamadou et al.

    Ecohydrology modeling: tools for management

  • L.A. Zadeh, The concept of a generalized constraint – a bridge from natural languages to mathematics, in: NAFIPS...
  • Y.J. Qu et al.

    Generalized constraint neural network regression model subject to linear priors

    IEEE Trans. Neural Networks

    (2011)
  • L. Ljung

    System Identification: Theory for the User

    (1999)
  • L. Wang et al.

    System Identification, Environmental Modelling and Control System Design

    (2012)

    Zhi-Yong Ran received his M.Sc. degree in Applied Mathematics from the Beijing University of Technology, Beijing, China, in 2007. Currently he is a Ph.D. candidate at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include parameter identifiability theory and machine learning.

    Bao-Gang Hu (M'94-SM'99) received the M.Sc. degree from the University of Science and Technology, Beijing, China, and the Ph.D. degree from McMaster University, Hamilton, ON, Canada, both in mechanical engineering, in 1983 and 1993, respectively. He was a Research Engineer and Senior Research Engineer at C-CORE, Memorial University of Newfoundland, St. John's, NF, Canada, from 1994 to 1997. From 2000 to 2005, he was the Chinese Director of the Chinese-French Joint Laboratory of computer science, control, and applied mathematics, National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor at NLPR. His current research interests include pattern recognition and plant growth modeling.
