Application of recurrent neural networks to reinforcement learning under incomplete perception
Bibliographic Details
- Title: Application of recurrent neural networks to reinforcement learning under incomplete perception
- Title (Japanese): リカレントニューラルネットワークの不完全知覚下での強化学習への応用
- Author: Ahmet Onat
- Author (Japanese): アーメト オナト
- Degree-granting university: Kyoto University (京都大学)
- Degree: Doctor of Engineering (博士(工学))
- Dissertation number: 甲第7842号
- Date of degree conferral: 1999-03-23
Notes and Abstract
Doctoral dissertation
Table of Contents
- List of Papers
- Contents / p1
- 1 Introduction / p1
- 2 Reinforcement Learning / p5
- 2.1 Overview / p5
- 2.2 Background / p6
- 2.3 Mathematical definition of Reinforcement Learning / p7
- 2.4 Markovian decision processes / p11
- 2.5 Dynamic programming / p12
- 2.6 Exploration methods / p14
- 2.7 Reinforcement learning algorithms / p16
- 3 Reinforcement Learning under Incomplete Perception / p25
- 3.1 Overview / p25
- 3.2 Observation based solution methods / p27
- 3.3 Solution methods with a dynamic environment model / p28
- 4 Recurrent Neural Networks / p35
- 4.1 Overview / p35
- 4.2 The neuron model / p37
- 4.3 Recurrent neural network architecture / p37
- 4.4 Supervised training algorithms / p42
- 5 Q-learning with Recurrent Neural Networks / p48
- 5.1 Overview / p48
- 5.2 Learning agent structure / p49
- 5.3 The learning procedure / p54
- 5.4 Implementation of the learning agent / p55
- 5.5 Differences between the proposed structure and Recurrent-Q / p56
- 6 Q-learning with Recurrent Neural Networks in Symbolic Environments / p58
- 6.1 Overview / p58
- 6.2 The symbolic environments / p59
- 6.3 Results for the house environment / p63
- 6.4 Propagation of the Q values / p86
- 6.5 Summary / p90
- 7 Q-learning with Recurrent Neural Networks in a Numeric Control Problem / p93
- 7.1 Overview / p93
- 7.2 The inverted pendulum problem / p94
- 7.3 Results for controlling the inverted pendulum / p97
- 7.4 Summary / p103
- 8 Stochastic Gradient Ascent with Recurrent Neural Networks / p106
- 8.1 Overview / p106
- 8.2 The architecture / p108
- 8.3 Details of the learning algorithm / p110
- 8.4 The simulation environments / p112
- 8.5 Results of simulations / p114
- 8.6 Summary / p126
- 9 Conclusion / p129