Application of Markov decision processes (MDPs) as in
Todorov, E. (2009). Efficient computation of optimal actions. PNAS 106 (28), pp. 11478–11483:
finding the shortest path (with regularization) to sets of nodes in a random graph.
When tradeoff = 10, the normalization test of the controlled transition probabilities fails, probably just because of summation errors (about 5% error in the norm). Even so, the problem is solved with high accuracy.
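
For orientation, here is a minimal sketch of the linearly-solvable MDP formulation from the cited paper, applied to a first-exit shortest-path problem. It is not the code described above. The function name solve_lmdp_shortest_path, the uniform random-walk passive dynamics, and the reading of tradeoff as a constant per-step state cost are all assumptions made for illustration. With z = exp(-v), the interior nodes satisfy the linear system z = exp(-q) P z, with z = 1 on the target nodes, and the controlled transition probabilities are u(j|i) proportional to p(j|i) z(j).

```python
import numpy as np

def solve_lmdp_shortest_path(adjacency, targets, tradeoff=1.0):
    """Hypothetical helper: first-exit linearly-solvable MDP on a graph.

    adjacency : (n, n) array, nonzero where an edge exists; every node
                needs at least one neighbor.
    targets   : indices of the absorbing goal nodes (state cost 0).
    tradeoff  : assumed here to be the constant state cost per step
                at non-target nodes.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]

    # Passive dynamics: uniform random walk over neighbors (an assumption).
    P = adjacency / adjacency.sum(axis=1, keepdims=True)

    is_target = np.zeros(n, dtype=bool)
    is_target[list(targets)] = True
    interior = ~is_target

    # Desirability z = exp(-v); z = 1 on targets.
    # Interior nodes: z_N = exp(-q) * (P_NN z_N + P_NT * 1).
    G = np.exp(-tradeoff) * P[interior]
    A = np.eye(int(interior.sum())) - G[:, interior]
    b = G[:, is_target].sum(axis=1)

    z = np.ones(n)
    z[interior] = np.linalg.solve(A, b)

    # Controlled transitions: reweight passive dynamics by z, then
    # renormalize each row. Target rows are filled in too, although
    # the episode ends there.
    U = P * z[None, :]
    U = U / U.sum(axis=1, keepdims=True)

    return -np.log(z), U


# Small example: a path graph 0 - 1 - 2 - 3 with node 3 as the target.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
v, U = solve_lmdp_shortest_path(adj, targets=[3], tradeoff=1.0)

# Normalization test: each row of U should sum to 1 up to rounding.
norm_error = np.abs(U.sum(axis=1) - 1.0).max()
print("cost-to-go:", v)
print("max normalization error:", norm_error)
```

In this sketch the rows are renormalized explicitly, so the normalization test passes by construction. A test that instead divides by the solved z(i) exp(q) would inherit any error in the linear solve. Under the assumptions here, a larger tradeoff makes z shrink exponentially with distance to the targets, which is one way small summation errors could become visible.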