Markov decision process: the P computed when converting the MDP to an MRP appears to be wrong #77

zyy777 · 2024-05-08T03:31:51Z

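import numpy as np

gamma = 0.5
# Transition matrix of the MRP obtained from the MDP (one row per state)
P_from_mdp_to_mrp = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.2, 0.2, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
P_from_mdp_to_mrp = np.array(P_from_mdp_to_mrp)
R_from_mdp_to_mrp = [-0.5, -1.5, -1.0, 5.5, 0]

# compute() is not defined in this snippet; it comes from the code this issue refers to
V = compute(P_from_mdp_to_mrp, R_from_mdp_to_mrp, gamma, 5)
print("State values in the MDP:\n", V)

Shouldn't P[4,4] be 0 here?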

Crane-YU · 2024-07-06T02:39:46Z

The terminal state has a self-transition with probability 1; by convention, this self-loop is not drawn in the state transition diagram.
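A quick check may help here. The sketch below assumes the values are obtained from the closed-form Bellman equation V = (I - gamma * P)^(-1) R; `solve_mrp_values` is a hypothetical stand-in written for this illustration, not necessarily what the original `compute` does. With P[4,4] = 1 every row of P sums to 1, so P is a valid transition matrix, and because the terminal state's reward is 0, setting P[4,4] to 0 should give the same values.

```python
import numpy as np

gamma = 0.5
P = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.2, 0.2, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],  # terminal state: self-loop with probability 1
])
R = np.array([-0.5, -1.5, -1.0, 5.5, 0.0])


def solve_mrp_values(P, R, gamma):
    """Hypothetical helper: solve (I - gamma * P) V = R for V."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)


# Every row of a transition matrix must sum to 1.
rows_sum_to_one = np.allclose(P.sum(axis=1), 1.0)

V_with_loop = solve_mrp_values(P, R, gamma)

# Same matrix, but with the terminal self-loop removed (row 4 becomes all zeros).
P_zero = P.copy()
P_zero[4, 4] = 0.0
V_without_loop = solve_mrp_values(P_zero, R, gamma)

print("rows sum to 1:", rows_sum_to_one)
print("values with self-loop:", V_with_loop)
print("same values without self-loop:", np.allclose(V_with_loop, V_without_loop))
```

In both cases the terminal state's value is 0, so the other states are unaffected; the self-loop only keeps P a proper stochastic matrix.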
