Reinforcement learning is a kind of machine learning that adapts a system to a given environment according to rewards. We consider profit sharing, a representative reinforcement learning method. The sequence of rules applied between one reward and the next is called an episode, and profit sharing reinforces the rules of each episode. A function that shares the reward among the rules of an episode is called a reinforcement function. Conventional work has used ad hoc functions; this paper analyzes reinforcement functions theoretically. First, we examine which reinforcement functions are locally reasonable. We call a rule ineffective if and only if it is on a detour in every episode. It is locally reasonable for ineffective rules to be suppressed relative to every effective rule. We derive the necessary and sufficient condition for suppressing every ineffective rule as the following inequality: $L \sum_{j=i}^{W} f_j < f_{i-1}$ $(i = 1, \ldots, W)$, where $L$ is the maximum number of conflicting effective rules, $W$ is the maximum length of an episode, and $f_j$ is the reinforcement value for the $j$-th previous rule applied before the reward. We call this the ineffective rule suppression theorem. We demonstrate that profit sharing can learn ineffective rules when the condition is violated. Second, we examine whether reinforcement functions satisfying the condition are globally reasonable. We call a collection of effective rules a rule selection plan if and only if it selects at most one effective rule per state. It is globally reasonable for a plan to gain reward continuously. We show that the same condition is also necessary and sufficient for learning a rewardful plan. We call this the rewardful plan acquisition theorem. We also demonstrate that profit sharing can learn rewardless plans when the condition is violated.
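The suppression condition can be checked mechanically for a candidate reinforcement function. The following is a minimal sketch, not the authors' implementation. The function names, the list layout (f[0] is taken to be the value for the rule applied immediately before the reward, so the list has W + 1 entries), and the dictionary of rule weights are assumptions made for illustration. The geometric function with ratio 1/(L + 1) is one example that satisfies the inequality by direct arithmetic, since the tail sum of such a series, multiplied by L, stays strictly below the preceding term.

```python
from fractions import Fraction


def satisfies_suppression_condition(f, L):
    """Return True if L * sum_{j=i}^{W} f_j < f_{i-1} holds for i = 1..W.

    Assumption: f is a list of W + 1 values indexed 0..W.
    """
    W = len(f) - 1
    tail = 0  # running sum f_i + ... + f_W, accumulated from the back
    for i in range(W, 0, -1):
        tail += f[i]
        if not L * tail < f[i - 1]:
            return False
    return True


def geometric_function(f0, L, W):
    """Hypothetical helper: f_j = f0 * (1 / (L + 1)) ** j for j = 0..W."""
    ratio = Fraction(1, L + 1)
    return [f0 * ratio ** j for j in range(W + 1)]


def share_reward(weights, episode, f):
    """Add f[j] to the weight of the rule applied j steps before the reward.

    Assumption: episode is a list of hashable rule identifiers in the
    order they were applied, and len(episode) <= len(f).
    """
    for j, rule in enumerate(reversed(episode)):
        weights[rule] = weights.get(rule, 0) + f[j]
    return weights


if __name__ == "__main__":
    f = geometric_function(Fraction(1), L=2, W=5)
    print(satisfies_suppression_condition(f, L=2))        # geometric, ratio 1/3
    print(satisfies_suppression_condition([1] * 6, L=1))  # constant function
    print(share_reward({}, ["a", "b", "c"], f))
```

Exact fractions are used instead of floats so that the strict inequality is not affected by rounding. A constant function, or a geometric function whose ratio is 1/L rather than 1/(L + 1), fails the check.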