Learning of Automata using Recurrent Neural Networks

Abstract:

For the problem of learning finite state automata (FSA) with recurrent neural networks (RNN), we propose a learning method that stably trains an RNN to simulate an FSA from examples.

Introduction

Given examples generated by an FSA, identifying the FSA is difficult. One approach to this problem is to infer the FSA by training an RNN to mimic the examples. When an RNN successfully learns an FSA, its orbits form clusters in the hidden state space. These clusters correspond to the states of the FSA, and transitions among the clusters make the RNN behave like the FSA. However, even when an RNN has successfully learned to simulate an FSA, its orbits often escape from the clusters on long input sequences because the transitions are unstable. The transitions then become incorrect and the RNN can no longer mimic the FSA. We found a theorem on the stability of transitions among clusters [1] and proposed a new learning method based on it [2].
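As a concrete illustration of this setting (not the authors' code), the following minimal Python/PyTorch sketch trains a small RNN on strings labeled by a 2-state FSA (even parity of 1s); the automaton, network size, and training details are illustrative assumptions only. Plotting the collected hidden states after training is one way to observe the clusters described above.

  # Illustrative sketch only: train an RNN to mimic a 2-state FSA (even parity of 1s).
  import torch
  import torch.nn as nn

  def parity_accepts(bits):
      # Target FSA: accept iff the string contains an even number of 1s.
      return sum(int(b) for b in bits) % 2 == 0

  def make_batch(n, length):
      x = torch.randint(0, 2, (n, length, 1)).float()      # random binary strings
      y = torch.tensor([float(parity_accepts(seq.flatten().tolist())) for seq in x])
      return x, y

  class FSARnn(nn.Module):
      def __init__(self, hidden=4):
          super().__init__()
          self.rnn = nn.RNN(input_size=1, hidden_size=hidden, batch_first=True)
          self.out = nn.Linear(hidden, 1)

      def forward(self, x):
          h_seq, _ = self.rnn(x)                            # hidden state at every time step
          return self.out(h_seq[:, -1, :]).squeeze(-1), h_seq

  model = FSARnn()
  opt = torch.optim.Adam(model.parameters(), lr=1e-2)
  loss_fn = nn.BCEWithLogitsLoss()
  for step in range(2000):
      x, y = make_batch(64, length=8)
      logits, _ = model(x)
      loss = loss_fn(logits, y)
      opt.zero_grad()
      loss.backward()
      opt.step()
  # After training, the hidden states in h_seq tend to form two clusters,
  # one per FSA state (even / odd number of 1s seen so far).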

Existence of the Critical Neuro Gain that Guarantees the Stability of Transitions

Any FSA can be simulated by an RNN with binary threshold neurons. A threshold function is equivalent to a sigmoid function with gain $\beta=\infty$, whereas a sigmoid with finite $\beta$ is usually used for learning. In that case, it is not guaranteed that the RNN has the same computational ability as the FSA. We have shown that if $\beta$ is greater than a critical value, then for input data of arbitrary length the RNN maintains the transitions among clusters acquired by learning from examples. The RNN therefore correctly simulates the learned FSA on unknown test data; that is, an RNN with finite $\beta$ can perfectly mimic an FSA. Moreover, when $\beta$ is sufficiently large, the behavior of any RNN is equivalent to that of some FSA.
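For concreteness, the gain $\beta$ can be taken to enter through the standard logistic sigmoid; this notation is an assumption of this note, not stated in the original abstract:

\begin{displaymath}
f_\beta(x) = \frac{1}{1 + e^{-\beta x}}, \qquad
\lim_{\beta \to \infty} f_\beta(x) =
\begin{cases} 1 & (x > 0) \\ 0 & (x < 0) \end{cases}
\end{displaymath}

Increasing $\beta$ steepens the sigmoid toward the threshold unit, and the critical gain of [1] is the value beyond which the learned cluster transitions remain stable for inputs of arbitrary length.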

Adaptive Annealed RNN Learning Method

If $\beta$ is sufficiently large, orbits are expected to move stably from cluster to cluster for data of arbitrary length. In this case, however, it is hard for an RNN to learn to mimic an FSA, because the error surface forms a steep, stair-like landscape. We propose a learning method that gradually increases $\beta$. After long training, though, $\beta$ becomes large and the error surface develops narrow, steep valleys, which makes further learning difficult. To keep the search from jumping out of such valleys, we adaptively control both the increment of $\beta$ and the learning rate, as sketched below. Experiments showed that with this method the RNN acquires stable cluster transitions faster and more reliably (Figure 1).
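A minimal sketch of the adaptive annealing idea is given below, assuming a simple error-driven adaptation rule; the function names and the concrete rule are hypothetical and are not the exact schedule of [2].

  # Illustrative sketch only: anneal beta while adapting its increment and the learning rate.
  def annealed_training(model, train_step, n_epochs=200, beta0=1.0, dbeta=0.5, lr=0.05):
      beta, prev_error = beta0, float("inf")
      for epoch in range(n_epochs):
          error = train_step(model, beta=beta, lr=lr)  # one pass of gradient learning at gain beta
          if error < prev_error:
              beta += dbeta        # learning is progressing: raise the gain
          else:
              dbeta *= 0.5         # error increased: shrink the beta increment and the
              lr *= 0.5            # learning rate so the search stays inside a narrow valley
          prev_error = error
      return model, beta

  # Usage: pass a train_step(model, beta, lr) that runs one training epoch with the
  # sigmoid gain set to beta and returns the training error.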

Contact: Kenichi Arai, Email: ken@cslab.kecl.ntt.co.jp


  
Figure 1: Comparison of learning methods: stability (left) and convergence time (right). [Images arai-1.ps and arai-2.ps not reproduced here.]

Bibliography

1
Arai, K. and Nakano, R.: Annealed RNN learning of finite state automata, Proc. of the 6th International Conference on Artificial Neural Networks (ICANN'96), pp. 519-524 (1996).

2
Arai, K. and Nakano, R.: Adaptive $\beta$ scheduling method of finite state automata by recurrent neural networks, Proc. of the 4th International Conference on Neural Information Processing (ICONIP'97), pp. 351-354 (1997).



