Neurocomputing
Volume 72, Issues 1–3, December 2008, Pages 170-176

ART2 neural network interacting with environment

https://doi.org/10.1016/j.neucom.2008.02.026

Abstract

It is common to train a neural network with samples so that it realizes the required input–output characteristics. However, obtaining such samples is difficult or even impossible in some cases. This paper proposes the use of on-line reinforcement learning (RL) algorithms to train adaptive-resonance-theory-based (ART2) neural networks through interaction with their environment, yielding the RL-ART2 neural network. By exploiting RL's ability to adapt to a dynamic environment, ART2 classification patterns can be evaluated and selected without training samples. The connection weights are modified automatically according to an evaluation of how well the network's classification patterns perform during operation. The proposed RL-ART2 neural network is applied to the collaborative movement of mobile robots. Simulation results are presented to demonstrate the feasibility and performance of the proposed algorithm.

Introduction

Although neural networks (NNs) have been extensively studied by many researchers, adaptive-resonance-theory-based (ART2) NNs have received comparatively little attention [1], [2]. Research in this area has mainly focused on (i) pattern excursion, which is typically addressed by fractionizing and fitting [3] or by adding two bounds [4]; (ii) the subjective setting of the vigilance parameter, which is typically addressed by creating new vigilance test criteria [5], [6]; and (iii) the lack of hierarchical structure in the output, for which improved and modified ART2 NN learning algorithms have been proposed [7], [8]. Like many other types of NNs, ART2 NNs require a large set of samples for training. In some cases, obtaining such samples is very difficult or even impossible. To overcome this difficulty, this paper proposes the use of reinforcement learning (RL) to train ART2 networks without the need for samples. In other words, the learning is based on information from the system to which the pattern recognition is applied, i.e., on interaction with a dynamic environment.

In general, an ART2 NN [9] performs stable classification through competitive learning and a self-stabilization mechanism. Exploiting the pattern recognition and classification abilities of ART2 NNs has been a popular subject of study in the engineering, artificial intelligence, and machine learning communities, with applications in many research areas such as fault detection [10], video scene change detection [11], and recommender systems in a citizen web portal [12].

As one of the most commonly used RL algorithms, temporal difference (TD) learning is used in this paper to train the ART2 NN [13] via interaction with the dynamic environment instead of training samples. The state parameters of the system are fed to the ART2 NN, which generates a required pattern; this pattern is in turn used by the system. At the RL level, the TD algorithm evaluates the resulting system output and "rewards" the ART2 NN accordingly. Through this iterative process, a well-trained ART2 NN is eventually obtained.
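This interaction can be summarized by a short sketch. The names below (classify, run_system, evaluate, td_update) are illustrative assumptions rather than the paper's notation; the sketch only fixes the order of operations: state parameters are fed to the ART2 NN, the selected pattern drives the system, and a TD-style reward adjusts the evaluation of that pattern.

    # Hedged sketch of the training-by-interaction loop (all names are illustrative).
    def train_by_interaction(classify, run_system, evaluate, td_update,
                             initial_state, num_steps):
        """classify:   ART2 NN mapping state parameters to an output pattern.
        run_system: applies the pattern and returns the next state and system output.
        evaluate:   turns the system output into a scalar reward.
        td_update:  temporal-difference update of the pattern's evaluation."""
        state = initial_state
        for _ in range(num_steps):
            pattern = classify(state)                  # ART2 NN generates a pattern
            next_state, output = run_system(pattern)   # the system runs with that pattern
            reward = evaluate(output)                  # RL level "rewards" the ART2 NN
            td_update(state, pattern, reward, next_state)
            state = next_state
        return classify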

The rest of the paper is organized as follows. Section 2 gives a brief introduction to RL. In Section 3, the RL-ART2 model, its architecture, and its learning algorithm are described. Section 4 presents simulation results and error analysis for an example system, namely the collaborative movement of mobile robots, and compares them with those obtained by applying a genetic algorithm for learning. Finally, a brief conclusion is given in Section 5.

Section snippets

Reinforcement learning

RL [13] is one of the important machine learning methods. It is an asynchronous dynamic programming method based on the Markov decision process (MDP) and can be used to acquire adaptive response behavior in a dynamic environment. Because it can optimize decisions through interaction with an environment, RL is broadly applicable to complex optimization control problems.
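As a concrete, self-contained illustration of this agent–environment interaction, the following sketch runs tabular TD(0) value learning on a toy chain-shaped MDP; the environment, reward, and parameter values are assumptions made purely for illustration and are not taken from the paper.

    import random

    # Minimal TD(0) sketch on a toy chain MDP (illustrative values only).
    NUM_STATES, ALPHA, GAMMA = 5, 0.1, 0.9
    values = [0.0] * NUM_STATES          # state-value estimates V(s)

    def step(state):
        """Toy environment: drift right at random; reward 1 on reaching the last state."""
        next_state = min(state + random.choice([0, 1]), NUM_STATES - 1)
        reward = 1.0 if next_state == NUM_STATES - 1 else 0.0
        return next_state, reward

    for _ in range(1000):                # episodes of interaction with the environment
        s = 0
        while s != NUM_STATES - 1:
            s_next, r = step(s)
            # temporal-difference update: move V(s) toward r + gamma * V(s')
            values[s] += ALPHA * (r + GAMMA * values[s_next] - values[s])
            s = s_next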

A general RL model is

Reinforcement-learning-based ART2 NN (RL-ART2)

In this section, we briefly review the basic concepts and notation of ART2 NNs, as shown in Fig. 2. Subsequently, the architecture of ART2 nodes incorporating RL and the corresponding learning algorithm are outlined.
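Since this snippet only names the architecture, the following rough sketch shows the kind of ART-style category selection that an RL signal would subsequently evaluate; the cosine-similarity match rule, vigilance value, and slow weight update are simplifying assumptions and not the paper's exact learning algorithm.

    import numpy as np

    # Hedged sketch of ART2-style category selection with a vigilance test
    # (simplified; not the paper's exact equations).
    def select_category(x, prototypes, vigilance=0.9):
        """Return the index of the resonating prototype, creating a new category
        if no existing prototype passes the vigilance test; vectors are unit-normalised."""
        x = x / (np.linalg.norm(x) + 1e-12)
        matches = [float(p @ x) for p in prototypes]
        for j in np.argsort(matches)[::-1]:                 # best-matching category first
            if matches[j] >= vigilance:                     # resonance: accept category j
                prototypes[j] = 0.9 * prototypes[j] + 0.1 * x   # slow weight adaptation
                prototypes[j] /= np.linalg.norm(prototypes[j])
                return int(j)
        prototypes.append(x)                                # no resonance: create new category
        return len(prototypes) - 1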

Simulation experiment and analysis

The parameters and settings of the environment are as follows:

  • Simulator: TeamBots [16].

  • Environment: as shown in Fig. 5, the objects marked 0 and 1 are the robots R, whose movement velocity is 0.7. The square-shaped object is the recycle bin, the long pole-shaped objects are the goal objects, and the other objects are static obstacles.

  • For each control mode, about 200 episodes are run to obtain an average value. Each episode has 400 simulation steps, and each simulation step is about 0.06 s (a brief bookkeeping sketch follows this list).

  • The maximum deflection
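For reference, the episode settings above imply the simple bookkeeping sketched below; run_episode stands in for one control-mode run in the TeamBots simulator and is a placeholder, not part of the paper.

    # Averaging sketch implied by the settings above (run_episode is a placeholder).
    NUM_EPISODES, STEPS_PER_EPISODE, STEP_SECONDS = 200, 400, 0.06   # ~24 s simulated per episode

    def average_performance(run_episode):
        """Run about 200 episodes of 400 steps each and return the mean score
        for one control mode."""
        scores = [run_episode(STEPS_PER_EPISODE) for _ in range(NUM_EPISODES)]
        return sum(scores) / NUM_EPISODES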

Conclusions

ART2 NNs, like many other types of NNs, require a large set of samples for training. In some cases, obtaining such samples is very difficult or even impossible. To overcome this difficulty, we developed an on-line reinforcement learning algorithm in this paper to train an ART2 NN without the need for samples. In other words, a novel RL-based ART2 NN, RL-ART2, is developed that is capable of online learning through interaction with its environment. The proposed approach was evaluated by simulation of the collaborative movement of mobile robots.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 60774059 and 60834002, Shanghai Leading Academic Disciplines under Grant T0103, the Excellent Young Teachers Program of Shanghai Municipal Commission of Education, and Key Project of Science and Technology Commission of Shanghai Municipality under Grants 061107031 and 06ZR14131. Also, the authors would like to thank Dr. T.C. Yang and Professor H.S. Hu for their helpful suggestions in improving the quality of this paper.


References (17)


Cited by (6)

  • Seawater intrusion pattern recognition supported by unsupervised learning: A systematic review and application

    2023, Science of the Total Environment
    Citation Excerpt :

    However, other ANN present good alternatives to the Kohonen map. Adaptive resonance theory 2 is an unsupervised network that classifies samples based on their memory,which makes it possible to include new samples after training, classify existing clusters, or create new ones (Fan et al., 2008). Moreover, gas neural networks can be used to preserve data topology, such as SOM, but avoids empty neurons (Du, 2010).

  • Research on Online Classification of Road Vehicle Types Based on Electromagnetic Induction

    2019, Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences
  • An Analysis of the Optimal Customer Clusters Using Dynamic Multi-Objective Decision

    2018, International Journal of Information Technology and Decision Making
  • A novel neural network structure for motion control in joints

    2017, 2016 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques, ICEECCOT 2016

Jian Fan received his B.E. degree in artillery command and M.S. degree in computer science from PLA Artillery College in 1999 and 2002, respectively, and his Ph.D. degree in Control Theory and Control Engineering from Shanghai University in 2006. His current research interests include intelligent control, robot control, and computational intelligence. He now works as a postdoctoral researcher in the Department of Automation, Shanghai University.

Yang Song received his B.E. and Ph.D. degrees from the Department of Automation, Nanjing University of Science and Technology, in 1998 and 2006, respectively. His interests are switched systems, hybrid systems, and robust control. He now works as a postdoctoral researcher in the Department of Automation, Shanghai University.

MinRui Fei received his B.E. and M.S. degrees in Industrial Automation from Shanghai University of Technology in 1984 and 1992, respectively, and his Ph.D. degree in Control Theory and Control Engineering from Shanghai University in 1997. Since 1998, he has been a professor and doctoral supervisor at Shanghai University. His current research interests are in the areas of intelligent control, complex system modeling, networked control systems, field control systems, etc.
