Sustainable case learning for continuous domains

https://doi.org/10.1016/S1364-8152(98)00097-8

Abstract

Case-based reasoning (CBR) provides an adequate framework for continuous domains, where a large number of valuable new experiences are generated without interruption. CBR systems become more competent over time by learning new relevant experiences. The continuous nature of some domains, however, raises two central problems: the fast growth of the case library and the overhead of organising it. To overcome both problems, we propose to learn only relevant cases and to use a lazy learning algorithm for storing cases in the case library. We define a relevance measure based on L'Eixample distance and a related ontology of cases, and we describe the lazy learning algorithm. Finally, we present and discuss experimental tests on real data.

Introduction

Case-based reasoning (CBR) provides an adequate framework for continuous domains, where a large number of valuable new experiences are generated without interruption. Because of its general applicability, CBR has been applied in the environmental sciences to different areas with different goals: information retrieval from large historical meteorological databases (Jones and Roydhouse, 1995), optimisation of sequences of operations for the design of wastewater treatment systems (Krovvidy and Wee, 1993), supervisory systems for diagnosing and controlling wastewater treatment plants (Sànchez-Marrè et al., 1997a), decision support systems for planning forest fire fighting (Avesani et al., 1995), case-based prediction for rangeland pest management advisories (Branting et al., 1997), and case-based design for process engineering (Surma and Braunschweig, 1996).

The environment in which our CBR-agent has emerged is the supervision of Wastewater Treatment Plants (WWTPs). It is a clear example of a complex real-world process where AI techniques are needed, and where applications not only give fair results but also open new lines of research (e.g. Gimeno et al., 1998), as organisations and states have started to take a very active role in their interactions with the environment. In particular, we have proposed a CBR architecture to model the specific experiential knowledge about a concrete WWTP. We are currently working successfully with three WWTPs located in Catalonia (Manresa, Girona and Lloret de Mar). CBR is a flexible paradigm that supports the implementation of a dynamic learning environment. Within the frame of a CBR agent, we can model the actual operating situations of a WWTP as cases and organise all of them into the case library.

This paper is organised as follows. In the rest of Section 1 we introduce the problem, give a brief idea of what a WWTP is, and describe DAI-DEPUR, our hybrid multi-knowledge supervisory architecture (see Section 1.3). In Section 2 we explain our approach to CBR, define the learning strategies that DAI-DEPUR assumes, and introduce L'Eixample distance for similarity assessment.

In Section 3 we discuss the problem of continuous domains and present our approach to it, which is based on the ideas of relevance and utility (Section 3.1) and on the implementation of a lazy learning strategy.

In Section 4 we present the experiments and evaluate our implementation of the relevance-based and lazy learning strategies for sustained learning. Section 5 gives some conclusions and outlines future lines of work.

Wastewater Treatment Plants (WWTPs) provide a necessary buffer between the natural environment and the concentrated wastewater from urban and industrial areas. WWTPs achieve their objective of returning good-quality water to the natural environment only when they operate successfully. Controlling and operating a WWTP correctly is not an obvious task. A WWTP combines operations of many different kinds (mechanical, electrical, chemical, biological, physical, etc.), and any of them can fail and drive the plant into an undesirable operating state, that is, one with poor outflow water quality.

Classical control methods, such as feedback (Marsili-Libeli, 1982), feed-forward (Corder and Lee, 1986), adaptive (Dochain, 1991) and optimal control (Moreno et al., 1992), have been used to improve and optimise WWTP operation. However, this classical approach, based on mathematical modelling, shows some limitations when controlling the activated sludge process (the main biotechnological technique used in WWTPs), particularly when the plant is not working in its ideal state. Aspects that compromise the success of classical control methods include the following:

  • the complex, and often unknown, behaviour of the micro-organisms;

  • the lack of on-line sensors and signals;

  • the ill-structured nature of the WWTP domain;

  • the uncertainty of some variables or instruments;

  • the dynamic state of the process;

  • the use of subjective information;

  • the relevance of many qualitative variables;

  • the delay of the analytical information from the laboratory.

In spite of these circumstances, mathematical models and control algorithms do improve WWTP operation and general management. However, a deeper approach is needed to detect unforeseen situations such as mechanical faults, or to cope with a toxic shock. It is also important (a) to take advantage of the subjective knowledge that experts have accumulated through years of experience (Sànchez-Marrè et al., 1997b), (b) to use the objective information provided by years of WWTP operation (Sànchez-Marrè et al., 1997a), and (c) to use the available but incomplete information to solve a specific problem (Baeza et al., 1998).

The integrated AI framework we developed is called DAI-DEPUR (Sànchez-Marrè et al., 1996; Sànchez-Marrè, 1996), where DAI stands for Distributed And Integrated supervisory multi-level architecture. Although it was developed for the WWTP domain, it is a general framework for the supervision of complex real-world processes.

DAI-DEPUR is an integrated architecture (see Fig. 1) that joins several cognitive tasks and techniques, such as learning, reasoning, knowledge acquisition and distributed problem solving, in a single framework. From the point of view of the domain model (Steels, 1990), four levels are distinguished: data, knowledge, situations and plans. From the point of view of the supervision tasks, seven levels are considered.

A useful design property of this approach is reusability: most of the tools, depicted on the right-hand side of Fig. 1, are almost ready to be reused in other DAI applications.

The behaviour of our CBR-agent and its application to the WWTP domain can be described as problem solving plus learning. In turn, both elements can be described in terms of the goals the system must achieve, the tasks that need to be solved, the methods that accomplish those tasks, and the background knowledge of the application domain that those methods require.


Case-based reasoning

The basic reasoning cycle of a CBR agent can be summarised schematically (see Fig. 2). Aamodt and Plaza (1994) adopt the four REs schema:

  • Retrieve the most similar case(s) to the new case.

  • Reuse or Adapt the information and knowledge in that case to solve the new case. The selected best case has to be adapted when it does not perfectly match the new case.

  • Revise or Evaluate the proposed solution. A CBR-agent usually requires some feedback to know what is right and what is wrong. Usually,
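The four steps can be sketched in Python as one pass over a flat list of cases. This is a minimal illustration under assumptions that do not come from DAI-DEPUR: cases are numeric vectors with a text solution label, similarity is plain Euclidean distance (a stand-in for L'Eixample distance, which is not reproduced here), reuse is null adaptation (the retrieved solution is copied), revision is delegated to a caller-supplied `feedback` function, and Retain, the fourth step of Aamodt and Plaza's cycle, stores every solved case. All names (`Case`, `retrieve`, `reuse`, `revise`, `retain`, `cbr_cycle`) and solution labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class Case:
    description: Vector  # numeric attributes describing an operating situation
    solution: str        # hypothetical label for the action taken


def distance(a: Vector, b: Vector) -> float:
    # Euclidean distance as a stand-in similarity measure (NOT L'Eixample distance).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def retrieve(library: List[Case], query: Vector) -> Case:
    # Retrieve: the stored case closest to the new situation.
    return min(library, key=lambda c: distance(c.description, query))


def reuse(best: Case, query: Vector) -> str:
    # Reuse: null adaptation, i.e. copy the retrieved solution unchanged.
    return best.solution


def revise(proposed: str, feedback: Callable[[str], str]) -> str:
    # Revise: an external evaluator (e.g. a plant operator) confirms or corrects.
    return feedback(proposed)


def retain(library: List[Case], query: Vector, solution: str) -> None:
    # Retain: store the solved situation as a new case (learn-all policy).
    library.append(Case(query, solution))


def cbr_cycle(library: List[Case], query: Vector, feedback: Callable[[str], str]) -> str:
    best = retrieve(library, query)
    proposed = reuse(best, query)
    confirmed = revise(proposed, feedback)
    retain(library, query, confirmed)
    return confirmed


library = [Case((1.0, 0.0), "normal operation"), Case((5.0, 5.0), "bulking response")]
result = cbr_cycle(library, (4.5, 5.2), feedback=lambda s: s)
print(result, len(library))  # → bulking response 3
```

Because `retain` stores every solved case unconditionally, this sketch follows exactly the learn-all policy whose case library growth motivates learning only relevant cases.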

Continuous domains

Continuous domains add some difficulties to the process of building a CBR system (Joh, 1997). The first is deciding which elements of the domain constitute a case. One can argue that a case is a snapshot of a situation. This concept can guide the extraction of cases in discrete domains, where the boundaries of a situation (case) are clear. In continuous domains, however, these boundaries are not obvious, which creates problems.
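One simple way to turn a continuous stream into snapshot cases is to cut it into fixed, non-overlapping windows and summarise each window by the mean of every variable. This is a hypothetical illustration of the boundary problem, not the case-extraction method of DAI-DEPUR: the window length, the averaging, and the variable names `Q` (inflow) and `COD` (chemical oxygen demand) are assumptions chosen for the example.

```python
from statistics import mean
from typing import Dict, List

Reading = Dict[str, float]


def window_cases(readings: List[Reading], window: int) -> List[Reading]:
    """Cut a stream of readings into consecutive, non-overlapping windows of
    `window` readings and summarise each window by the mean of every variable.
    A trailing window shorter than `window` is discarded."""
    if window < 1:
        raise ValueError("window must be at least 1")
    cases = []
    for start in range(0, len(readings) - window + 1, window):
        chunk = readings[start:start + window]
        cases.append({var: mean(r[var] for r in chunk) for var in chunk[0]})
    return cases


readings = [
    {"Q": 10.0, "COD": 400.0}, {"Q": 12.0, "COD": 420.0},
    {"Q": 11.0, "COD": 380.0}, {"Q": 9.0, "COD": 360.0},
    {"Q": 8.0, "COD": 300.0},
]
print(window_cases(readings, 2))  # → [{'Q': 11.0, 'COD': 410.0}, {'Q': 10.0, 'COD': 370.0}]
print(len(window_cases(readings, 3)))  # → 1
```

The same five readings yield two cases with a window of 2 and a single case with a window of 3: the choice of `window` is precisely the boundary decision that continuous domains leave open.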

Most of the continuous domains are systems where planning and

Experimental testing and evaluation

Two experiments were designed to test our proposal of learning only relevant cases against the common learning policies of other CBR systems, which either learn all new cases or bound the number of cases stored in the same leaf.
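The three learning policies compared in the experiments can be contrasted with a small sketch. It is hypothetical in several respects: the case library is reduced to a dictionary from leaf labels to lists of numeric vectors, relevance is approximated by a plain distance threshold to the nearest case already stored in the same leaf (Euclidean distance stands in for the paper's L'Eixample-based relevance measure), and the lazy storage algorithm is not modelled. The names `learn_all`, `learn_bounded`, `learn_relevant`, `max_per_leaf` and `threshold` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Vector, b: Vector) -> float:
    # Stand-in distance; NOT the paper's L'Eixample distance.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Library:
    # Hypothetical flat view of the case library: leaf label -> stored cases.
    leaves: Dict[str, List[Vector]] = field(default_factory=dict)

    def size(self) -> int:
        return sum(len(cases) for cases in self.leaves.values())


def learn_all(lib: Library, leaf: str, case: Vector) -> bool:
    # Policy 1: store every new case.
    lib.leaves.setdefault(leaf, []).append(case)
    return True


def learn_bounded(lib: Library, leaf: str, case: Vector, max_per_leaf: int = 3) -> bool:
    # Policy 2: store cases until the leaf already holds max_per_leaf of them.
    cases = lib.leaves.setdefault(leaf, [])
    if len(cases) >= max_per_leaf:
        return False
    cases.append(case)
    return True


def learn_relevant(lib: Library, leaf: str, case: Vector, threshold: float = 1.0) -> bool:
    # Policy 3 (assumed relevance test): store the case only if no case already
    # in the same leaf lies closer than `threshold`.
    cases = lib.leaves.setdefault(leaf, [])
    if cases and min(euclidean(case, c) for c in cases) < threshold:
        return False
    cases.append(case)
    return True


def run(policy: Callable[..., bool], stream: List[Tuple[str, Vector]], **kwargs) -> Library:
    lib = Library()
    for leaf, case in stream:
        policy(lib, leaf, case, **kwargs)
    return lib


# Example stream: three distinct situations, two of which recur with small variations.
stream = [
    ("normal", (0.0, 0.0)), ("normal", (0.1, 0.0)), ("normal", (0.2, 0.1)),
    ("normal", (3.0, 3.0)), ("normal", (3.1, 2.9)), ("normal", (6.0, 0.0)),
]
print(run(learn_all, stream).size())  # → 6
print(run(learn_bounded, stream, max_per_leaf=3).leaves["normal"])  # → [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
print(run(learn_relevant, stream, threshold=1.0).leaves["normal"])  # → [(0.0, 0.0), (3.0, 3.0), (6.0, 0.0)]
```

On this stream, learning all cases stores six, the per-leaf bound keeps the first three near-duplicates and rejects the two distinct situations that arrive later, and the relevance threshold keeps one representative of each of the three distinct situations.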

In the first experiment, the case library was seeded with a representative set of 15 initial cases from a previous classification built by Linneo+ (Sànchez-Marrè et al., 1997b), taken from real operating data of Girona's WWTP during 1996–1997. The experts carried out

Conclusions

CBR provides an adequate framework for continuous domains, where a large number of valuable new experiences are generated without interruption. CBR systems become more competent over time by learning new relevant experiences. However, when case-based reasoning techniques are applied to major environmental domains, their continuous nature raises two central problems: the fast growth of the case library and the overhead in the case library

Acknowledgements

This research was partially supported by the Junta de Sanejament de la Generalitat de Catalunya and the Spanish CICyT projects TIC96-0878 and AMB97-0889. The authors would like to acknowledge the reviewers of this paper for their valuable comments and suggestions on the previous versions. Finally, we wish to acknowledge the co-operation of Gabriela Poch, manager of Girona's wastewater treatment plant.

References (25)

  • Jones, E., Roydhouse, A., 1995. Retrieving structured spatial information from large databases: a progress report....
  • Kolodner, J., 1993. Case-Based Reasoning. Morgan Kaufmann, San Mateo,...