
drqn_mazeworld


Deep Recurrent Q-Network

This is a demo of DRQN, using MazeWorld as the environment.

  • MazeWorld (no "mei zi", i.e. girls, here). A minimal environment sketch follows this list.

  1. Basic maze:

    • IN: (0, 0), OUT: (2, 5)

    • state: (row, col)

    • reward: -5 for each step before reaching OUT, 100 for reaching OUT.

    • action: "UP", "DOWN", "Left", "Right", encoded as 0, 1, 2, 3.

  2. Bonus maze:

    • IN: (0, 0), OUT: (2, 5)

    • state: (row, col, bonus_bit). bonus_bit indicates whether the bonus has been collected. The agent can only observe (row, col). The bonus is at a fixed cell, (3, 4).

    • reward: -5 for each step before reaching OUT, 100 for reaching OUT, 100 for collecting the bonus.

    • action: "UP", "DOWN", "Left", "Right", encoded as 0, 1, 2, 3.

  3. Partial-info bonus maze:

    • IN: (0, 0), OUT: (2, 5)

    • state: (row, col, bonus_bit). bonus_bit indicates whether the bonus has been collected. The agent can only observe the row. The bonus is at a fixed cell, (3, 4).

    • reward: -5 for each step before reaching OUT, 100 for reaching OUT, 100 for collecting the bonus.

    • action: "UP", "DOWN", "Left", "Right", encoded as 0, 1, 2, 3.

  • Algorithms

  1. Deep Q-Network (DQN)

  2. Deep Recurrent Q-Network (DRQN)

    • The network structure was proposed by Matthew Hausknecht & Peter Stone.

    • The network uses an RNN (a basic RNN, GRU, or LSTM) to "remember" or "forget" the history of states the agent has visited, and new actions are chosen based on this "memory".

  3. Deep Recurrent Q-Network with actions (DRQNA, named by myself)

    • The network is a revision of DRQN. In optimal control we make decisions based on the information set I_k; sometimes the history of states S_k alone is not enough, so the past actions a_k are also fed into the RNN. A sketch of both networks is given after this list.
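
As a rough illustration of the difference between DRQN and DRQNA, the sketch below builds both Q-networks in Keras (the hidden size, the choice of an LSTM cell, and the function names are assumptions; the repo may use a different RNN or framework). DRQNA simply appends the one-hot previous action to each observation before it enters the RNN.

```python
from tensorflow.keras import layers, Model

def build_drqn(obs_dim, n_actions, hidden=64):
    """DRQN: Q-values computed from the history of observations only."""
    obs_seq = layers.Input(shape=(None, obs_dim))      # (time, obs_dim) sequence
    h = layers.LSTM(hidden)(obs_seq)                   # RNN summarizes the history
    q = layers.Dense(n_actions)(h)                     # one Q-value per action
    return Model(obs_seq, q)

def build_drqna(obs_dim, n_actions, hidden=64):
    """DRQNA: the previous actions a_k are fed into the RNN together with the
    observations, so the memory is built from the information set I_k."""
    obs_seq = layers.Input(shape=(None, obs_dim))
    act_seq = layers.Input(shape=(None, n_actions))    # one-hot previous actions
    x = layers.Concatenate(axis=-1)([obs_seq, act_seq])
    h = layers.LSTM(hidden)(x)
    q = layers.Dense(n_actions)(h)
    return Model([obs_seq, act_seq], q)
```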

  • Results

  1. Basic maze

    • All the networks work.

  2. Bonus maze

    • For a DQN agent, it can only choose to go "RIGHT", for the policy has to be DETERMINISTIC.

    • For a DRQN agent, it can learn act "UP" at the first time reaching (3, 4), and then go "RIGHT".

    • For a DRQN agent, the same as DRQN.

  3. Partial-info bonus maze

    • For a DQN agent, it can't converge (it never finds OUT).

    • For a DRQN agent, it can't converge either.

    • For a DRQNA agent, it can find the optimal path.
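
To make the memory argument above concrete, here is a rough greedy-rollout sketch. It assumes the `BonusMaze` and `build_drqn` sketches from earlier and an already-trained network; it is not the repo's training or evaluation code. The point is that the Q-values at a cell depend on the whole observation history, so the agent can act "UP" at (3, 4) on the first visit and act differently there later.

```python
import numpy as np

def greedy_episode(env, drqn, max_steps=50):
    """Greedy rollout: re-feed the whole observation history at every step,
    so the chosen action at a cell can differ depending on what happened before."""
    obs = env.reset()
    history = [obs]                                    # growing observation sequence
    total_reward = 0
    for _ in range(max_steps):
        seq = np.array(history, dtype=np.float32)[None]    # shape (1, t, obs_dim)
        q = drqn.predict(seq, verbose=0)[0]                # Q-values given the history
        obs, reward, done = env.step(int(np.argmax(q)))
        history.append(obs)
        total_reward += reward
        if done:
            break
    return total_reward
```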

