From e527d42b048bbe3cb086c0fe360ec78ba8c14b47 Mon Sep 17 00:00:00 2001
From: antoine_galataud
Date: Wed, 29 May 2024 21:03:29 +0200
Subject: [PATCH] Title underline too short

---
 doc/source/overview/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/overview/index.rst b/doc/source/overview/index.rst
index bd80a42..e0e164b 100644
--- a/doc/source/overview/index.rst
+++ b/doc/source/overview/index.rst
@@ -2,7 +2,7 @@ Hopes: finding the best policy
 ==============================
 
 What's off-policy (policy) evaluation?
--------------------------------------
+--------------------------------------
 
 In reinforcement learning, the goal is to find the best policy that maximizes the expected sum of rewards over time.
 However, in practice, it's often difficult to evaluate the value of a policy, especially when the policy is stochastic or