MACC'97 Session: Complex System and Multiagent
- How to identify the true state from a partially observable state?
- Transitionally Observable MDP under variations -
- Tomohiro Yamaguchi
- Osaka Univ.
- Contact: tomo@sys.es.osaka-u.ac.jp
- Abstract
For a reinforcement learning agent, one of the basic problems is how
to distinguish the true states from a partially observable state;
otherwise the perceptual aliasing problem occurs. To solve this
problem, my idea is to estimate the variation of the true state from
the change of the observable state.
After estimating the change point of the variation, it becomes
possible to use a model-based reinforcement learning method that
switches the model (or makes a new model) according to the estimated
variation.
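As an illustration of this model-switching idea (a minimal sketch in
Python, not the method of the paper), the learner below keeps a bank
of tabular transition-count models; when a change is detected it
switches to the stored model that best explains the recent
transitions, or makes a new model if none fits well. The class name,
the add-one smoothing, and the switching threshold are assumptions of
the sketch.

    import math

    class SwitchingModelLearner:
        """Hypothetical sketch: a bank of tabular transition models,
        each a dict mapping (s, a) -> {s': count}."""

        def __init__(self, new_model_threshold=-20.0):
            self.models = [{}]     # start with one empty model
            self.active = 0        # index of the model currently in use
            self.threshold = new_model_threshold

        def update(self, s, a, s_next):
            # count the observed transition in the active model
            counts = self.models[self.active].setdefault((s, a), {})
            counts[s_next] = counts.get(s_next, 0) + 1

        def _log_lik(self, model, recent):
            # add-one smoothed log-likelihood of recent transitions
            ll = 0.0
            for s, a, s_next in recent:
                counts = model.get((s, a), {})
                total = sum(counts.values())
                ll += math.log((counts.get(s_next, 0) + 1)
                               / (total + len(counts) + 1))
            return ll

        def on_change_detected(self, recent):
            """Switch to the stored model that best explains the
            transitions seen since the estimated change point, or
            make a new model if none of them fits well enough."""
            scores = [self._log_lik(m, recent) for m in self.models]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] < self.threshold:
                self.models.append({})
                best = len(self.models) - 1
            self.active = best

Keeping the old models instead of discarding them lets the agent
reuse one when the environment returns to an earlier mode, which is
one natural reading of "switches the model (or makes a new model)".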
As a learning model for identifying an environment under variations,
I present the Transitionally Observable MDP (TOMDP) model, which
describes transitions of the observable MDP according to intermittent
changes in the observation of the environment.
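Concretely, a TOMDP can be pictured as a sequence of observable MDPs,
each holding until the next intermittent change. The Python sketch
below simulates such a process; the (duration, transition table)
phase representation is my own illustration, not the paper's formal
definition.

    import random

    def tomdp_trajectory(phases, policy, s0=0, seed=0):
        """Simulate a TOMDP as a sequence of observable MDPs.
        phases: list of (duration, P) pairs, where P[s][a] is a dict
        {s': prob} that holds until the next intermittent change."""
        rng = random.Random(seed)
        s, trace = s0, []
        for duration, P in phases:
            for _ in range(duration):
                a = policy(s)
                nexts = list(P[s][a].keys())
                weights = list(P[s][a].values())
                s_next = rng.choices(nexts, weights=weights)[0]
                trace.append((s, a, s_next))
                s = s_next
        return trace

    # Two-state, one-action example: the environment "jumps" after
    # 100 steps, reversing its transition probabilities.
    P1 = {0: {0: {0: 0.9, 1: 0.1}}, 1: {0: {0: 0.1, 1: 0.9}}}
    P2 = {0: {0: {0: 0.1, 1: 0.9}}, 1: {0: {0: 0.9, 1: 0.1}}}
    trace = tomdp_trajectory([(100, P1), (100, P2)], policy=lambda s: 0)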
In order to estimate the timing of an intermittent change, I
formalize it as the "Change Point Problem" of the state transition
probabilities of the observable MDP.
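In the simplest single-event instance, where one transition
probability is watched as a 0/1 time series, the problem can be
written as follows (the notation is mine, not necessarily the
paper's):

    \[
      x_t \sim \mathrm{Bernoulli}(p_t), \qquad
      p_t =
      \begin{cases}
        p_1 & (t \le \tau) \\
        p_2 & (t > \tau)
      \end{cases}
    \]
    \[
      (\hat{\tau}, \hat{p}_1, \hat{p}_2)
        = \arg\max_{\tau,\, p_1,\, p_2}
          \Bigl[ \sum_{t \le \tau} \log f(x_t; p_1)
               + \sum_{t > \tau} \log f(x_t; p_2) \Bigr],
      \qquad f(x; p) = p^{x} (1-p)^{1-x}.
    \]

Here the change point \(\tau\) is the unknown time at which the true
probability jumps from \(p_1\) to \(p_2\).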
Then I illustrate this problem with "an irregular dice playing
problem" and show a simple experiment that estimates the intermittent
change of the true probability of an event using only the maximum
likelihood probability computed from the frequency of the observed
time series of the event. Finally, I discuss an incremental algorithm
for solving the Change Point Problem.
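The experiment can be reproduced in outline with the Python sketch
below: a die whose probability of showing a six jumps at an unknown
time, a moving maximum likelihood probability over a sliding window,
and an AIC comparison between the no-change and one-change models.
The window size, the jump magnitudes, and the parameter counts in the
AIC terms are assumptions of the sketch, not values from the paper.

    import math
    import random

    def simulate_die(p_before, p_after, tau, T, seed=0):
        """0/1 record of one event (e.g. the die shows a six); its
        true probability jumps from p_before to p_after at time tau."""
        rng = random.Random(seed)
        return [int(rng.random() < (p_before if t < tau else p_after))
                for t in range(T)]

    def moving_ml_probability(xs, window=50):
        """Maximum likelihood estimate (relative frequency) of the
        event probability over a sliding window of observations."""
        return [sum(xs[max(0, t - window + 1):t + 1]) / min(window, t + 1)
                for t in range(len(xs))]

    def log_lik(xs):
        """Bernoulli log-likelihood of xs at its own MLE p = mean(xs)."""
        n, k = len(xs), sum(xs)
        if k in (0, n):
            return 0.0   # degenerate MLE (p = 0 or 1): likelihood 1
        p = k / n
        return k * math.log(p) + (n - k) * math.log(1 - p)

    def estimate_change_point(xs):
        """Best two-segment split by likelihood, accepted only if its
        AIC beats the no-change model (counting tau as a parameter)."""
        best_tau = max(range(1, len(xs)),
                       key=lambda tau: log_lik(xs[:tau]) + log_lik(xs[tau:]))
        ll_change = log_lik(xs[:best_tau]) + log_lik(xs[best_tau:])
        aic_change = 2 * 3 - 2 * ll_change        # parameters: p1, p2, tau
        aic_no_change = 2 * 1 - 2 * log_lik(xs)   # parameter: p
        return best_tau if aic_change < aic_no_change else None

    xs = simulate_die(p_before=1/6, p_after=1/2, tau=200, T=400)
    print(estimate_change_point(xs))   # prints an estimate near 200

An incremental variant would update running counts on each side of
every candidate split as observations arrive, instead of rescanning
the whole series at every step.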
- Keywords
partial observation, MDP, reinforcement learning, Transitionally
Observable MDP, Jump, Change Point Problem, moving maximum likelihood
probability, model selection, AIC
- PS file (+gzip) (in Japanese)